If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits and Cohere AI would receive the title of most hallucinations, and most confident wrong answers.
That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.
The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.
It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of … provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.
AI hallucinations occur when large language models, or LLMs, fabricate information entirely, behaving as if they are spouting facts. One example: In June, news broke that ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions.
In one experiment, the Arthur AI researchers tested the AI models in categories such as combinatorial mathematics, U.S. presidents and Moroccan political leaders, asking questions “designed to contain a key ingredient that gets LLMs to blunder: they demand multiple steps of reasoning about information,” the researchers wrote.
Overall, OpenAI’s GPT-4 performed the best of all models tested, and researchers found it hallucinated less than its prior version, GPT-3.5. On math questions, for example, it hallucinated between 33% and 50% less, depending on the category.
Meta’s Llama 2, on the other hand, hallucinates more overall than GPT-4 and Anthropic’s Claude 2, researchers found.
In the math category, GPT-4 came in first place, followed closely by Claude 2, but in U.S. presidents, Claude 2 took the first-place spot for accuracy, bumping GPT-4 to second place. When asked about Moroccan politics, GPT-4 came in first again, and Claude 2 and Llama 2 almost entirely chose not to answer.
In a second experiment, the researchers tested how much the AI models would hedge their answers with warning phrases to avoid risk (think: “As an AI model, I cannot provide opinions”).
When it comes to hedging, GPT-4 had a 50% relative increase compared to GPT-3.5, which “quantifies anecdotal evidence from users that GPT-4 is more frustrating to use,” the researchers wrote. Cohere’s AI model, on the other hand, did not hedge at all in any of its responses, according to the report. Claude 2 was most reliable in terms of “self-awareness,” the research showed, meaning accurately gauging what it does and doesn’t know, and answering only questions it had training data to support.
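One way to picture the hedging measurement is as a simple phrase-matching rate over a batch of responses. The sketch below is only an illustration: the article does not describe Arthur's scoring code, so the phrase list, the function names (`is_hedged`, `hedge_rate`, `relative_increase`) and the sample responses are all assumptions.

```python
# Hypothetical hedge markers; the report's actual criteria are not given here.
HEDGE_PHRASES = (
    "as an ai",
    "i cannot provide",
    "i'm not able to",
)


def is_hedged(response: str) -> bool:
    """Return True if the response contains any of the assumed hedge phrases."""
    text = response.lower()
    return any(phrase in text for phrase in HEDGE_PHRASES)


def hedge_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as hedged (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(is_hedged(r) for r in responses) / len(responses)


def relative_increase(old_rate: float, new_rate: float) -> float:
    """Relative change from old_rate to new_rate; 0.5 means 50% higher."""
    return (new_rate - old_rate) / old_rate


# Made-up responses, purely for illustration.
sample = [
    "As an AI model, I cannot provide opinions.",
    "The answer is 42.",
    "There are 10 ways to choose the pair.",
    "I'm not able to verify that claim.",
]
print(f"hedge rate: {hedge_rate(sample):.2f}")
```

Running the same function over responses from two model versions and passing both rates to `relative_increase` would yield the kind of relative figure the researchers quote, though a real evaluation would likely need a more careful definition of hedging than substring matching.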
The most important takeaway for users and businesses, Wenchel said, was to “test on your exact workload,” later adding, “It’s important to understand how it performs for what you’re trying to accomplish.”
“A lot of the benchmarks are just looking at some measure of the LLM by itself, but that’s not actually the way it’s getting used in the real world,” Wenchel said. “Making sure you really understand the way the LLM performs for the way it’s actually getting used is the key.”
Sundar Pichai, CEO of Alphabet Inc., during Stanford’s 2024 Business, Government, and Society forum in Stanford, California, April 3, 2024.
Justin Sullivan | Getty Images
Google is going to spend $10 billion more this year than it previously expected due to the growing demand for cloud services, which has created a backlog, executives said Wednesday.
As part of its second quarter earnings, the company increased its forecast for capital expenditures in 2025 to $85 billion due to “strong and growing demand for our Cloud products and services” as it continues to expand infrastructure to power more AI services that use its cloud technology. That’s up from the $75 billion projection that Google provided in February, which was already above the $58.84 billion that Wall Street expected at the time.
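For readers who want the size of the revision in relative terms, the arithmetic can be sketched as follows. The dollar figures come from the paragraph above; the helper name `pct_change` and the variable names are assumptions made for illustration.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Figures in billions of dollars, taken from the article.
FEB_FORECAST = 75.0      # Google's February capex projection for 2025
NEW_FORECAST = 85.0      # updated projection given with Q2 earnings
STREET_AT_FEB = 58.84    # Wall Street expectation at the time of the February forecast

raise_abs = NEW_FORECAST - FEB_FORECAST
raise_pct = pct_change(FEB_FORECAST, NEW_FORECAST)
feb_vs_street_pct = pct_change(STREET_AT_FEB, FEB_FORECAST)

print(f"Forecast raised by ${raise_abs:.0f}B ({raise_pct:.1f}%)")
print(f"February forecast was {feb_vs_street_pct:.1f}% above the Street estimate")
```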
The increased forecast comes as demand for cloud services surges across the tech industry as AI services increase in popularity. As a result, companies are doubling down on infrastructure to keep pace with demand and are planning multi‑year buildouts of data centers.
In its second quarter earnings, Google reported that cloud revenues increased by 32% to $13.6 billion in the period. The demand is so high for Google’s cloud services that it now amounts to a $106 billion backlog, Alphabet finance chief Anat Ashkenazi said during the company’s post-earnings conference call.
“It’s a tight supply environment,” she said.
The vast majority of Alphabet’s capital spend was invested in technical infrastructure during the second quarter, with approximately two-thirds of investments going to servers and one-third in data center and networking equipment, Ashkenazi said.
She added that the updated outlook reflects additional investment in servers, the timing of delivery of servers and “an acceleration in the pace of data center construction, primarily to meet Cloud customer demand.”
Ashkenazi said that despite the company’s “improved” pace of getting servers up and running, investors should expect a further increase in capital spend in 2026 “due to the demand as well as growth opportunities across the company.” She didn’t specify what those opportunities are but said the company will provide more details on a future earnings call.
“We’re increasing capacity with every quarter that goes by,” Ashkenazi said.
Due to the increased spend, Google will have to record more expenses over time, which will make profits look smaller, she said.
“Obviously, we’re working hard to bring more capacity online,” Ashkenazi said.
The SK Hynix Inc. logo is displayed on a glass door at the company’s office in Seoul, South Korea, on Monday, Jan. 27, 2014. SK Hynix aims to select a U.S. site for its advanced chip packaging plant and break ground there around the first quarter of next year.
SeongJoon Cho | Bloomberg | Getty Images
South Korea’s SK Hynix on Thursday posted record operating profit and revenue in the second quarter on sustained demand for its high bandwidth memory technology used in generative AI chipsets.
Here are SK Hynix’s second-quarter results compared with LSEG SmartEstimates, which are weighted toward forecasts from analysts who are more consistently accurate:
Revenue: 22.23 trillion won ($16.17 billion) vs. 20.56 trillion won
Operating profit: 9.21 trillion won vs. 9 trillion won
Revenue rose about 35% in the June quarter compared with the same period a year earlier, while operating profit rose nearly 69%, year on year.
On a quarter-on-quarter basis, revenue rose 26%, while operating profit jumped 24%.
The company said in a statement that it enjoyed strong demand and favorable pricing conditions in the first half of the year. SK Hynix added that there was a low likelihood of sharp demand corrections for the rest of 2025, due to stable customer inventory levels and expected demand from new product launches.
SK Hynix is a leading supplier of dynamic random access memory — a type of semiconductor memory commonly found in PCs, workstations and servers that is used to store data and program code.
Much of the company’s recent success can be credited to its business in high bandwidth memory, or HBM — a type of DRAM used in artificial intelligence servers.
SK Hynix has established itself as the global leader in HBM, supplying clients such as U.S. AI darling Nvidia. In the first quarter, that lead helped the company overtake rival Samsung Electronics in the global DRAM market for the first time, according to Counterpoint Research.
A report from Counterpoint Research earlier this month estimated that SK Hynix had tied Samsung’s combined DRAM and NAND revenues in the second quarter, with both vying for the top position in the global memory market. NAND is a type of flash memory that is commonly used in storage devices.
Samsung and U.S.-based memory maker Micron Technology are both seeking to catch up to SK Hynix in the HBM space. However, analysts expect SK Hynix’s dominance to persist in the short term.
“As of now, I believe SK Hynix still holds its leadership in the HBM race … despite Samsung’s and Micron’s catch‑up efforts,” said Ray Wang, research director of semiconductors, supply chain and emerging technology at The Futurum Group.
“I expect this edge to persist through the rest of 2025 and extend into 2026,” he added.
IBM CEO Arvind Krishna appears at the World Economic Forum in Davos, Switzerland, on Jan. 16, 2024.
Stefan Wermuth | Bloomberg | Getty Images
IBM shares fell as much as 5% in extended trading on Wednesday after the tech conglomerate issued second-quarter results that topped Wall Street projections.
Here’s how the company did in comparison with LSEG consensus:
Earnings per share: $2.80 adjusted vs. $2.64 expected
Revenue: $16.98 billion vs. $16.59 billion
IBM’s revenue increased nearly 8% year over year in the quarter, according to a statement. Growth in the first quarter was below 1%. Net income, which includes costs related to acquisitions, rose to $2.19 billion, or $2.31 per share, from $1.83 billion, or $1.96 per share, a year ago.
Software revenue climbed about 10% to $7.39 billion, just below the $7.43 billion consensus among analysts surveyed by StreetAccount. Hybrid cloud revenue, including Red Hat, showed 16% growth. The software unit’s gross margin of 83.9% was barely narrower than StreetAccount’s 84.0% consensus.
Revenue from consulting rose almost 3% to $5.31 billion, higher than StreetAccount’s $5.16 billion consensus. Infrastructure revenue went up 14% to $4.14 billion, above the $3.75 billion StreetAccount average estimate.
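The segment comparisons against consensus can be checked with a few lines of arithmetic. This is a minimal sketch using only the revenue figures reported above; the names `SEGMENTS`, `surprise_pct` and `verdict` are illustrative assumptions, not part of any StreetAccount or IBM tooling.

```python
# (actual, consensus) revenue in billions of dollars, as reported in the article.
SEGMENTS = {
    "Software": (7.39, 7.43),
    "Consulting": (5.31, 5.16),
    "Infrastructure": (4.14, 3.75),
}


def surprise_pct(actual: float, consensus: float) -> float:
    """Percentage by which actual revenue differs from consensus."""
    return (actual - consensus) / consensus * 100


def verdict(actual: float, consensus: float) -> str:
    """Classify a result as a beat, a miss, or in line with consensus."""
    if actual > consensus:
        return "beat"
    if actual < consensus:
        return "miss"
    return "in line"


for name, (actual, consensus) in SEGMENTS.items():
    label = verdict(actual, consensus)
    print(f"{name}: {label} ({surprise_pct(actual, consensus):+.1f}%)")
```

On these figures, software is the only one of the three segments that comes in under its consensus estimate, and only by a fraction of a percent.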
During the quarter, IBM announced the next-generation z17 mainframe computer and the acquisition of data and artificial intelligence consulting firm Hakkoda.
IBM called for over $13.5 billion in 2025 free cash flow, similar to a projection from April. The company still sees at least 5% revenue growth at constant currency for the year.
As of Wednesday’s close, IBM shares were up 28% so far in 2025, while the S&P 500 index had gained around 8% in the same period.
Executives will discuss the results with analysts on a conference call starting at 5 p.m. ET.