If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits and Cohere AI would receive the title of most hallucinations and most confident wrong answers.
That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.
The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.
It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of … provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.
AI hallucinations occur when large language models, or LLMs, fabricate information entirely, behaving as if they are spouting facts. One example: In June, news broke that ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions.
In one experiment, the Arthur AI researchers tested the AI models in categories such as combinatorial mathematics, U.S. presidents and Moroccan political leaders, asking questions “designed to contain a key ingredient that gets LLMs to blunder: they demand multiple steps of reasoning about information,” the researchers wrote.
Overall, OpenAI’s GPT-4 performed the best of all models tested, and researchers found it hallucinated less than its prior version, GPT-3.5. On math questions, for example, it hallucinated between 33% and 50% less, depending on the category.
Meta’s Llama 2, on the other hand, hallucinates more overall than GPT-4 and Anthropic’s Claude 2, researchers found.
In the math category, GPT-4 came in first place, followed closely by Claude 2, but in U.S. presidents, Claude 2 took the first place spot for accuracy, bumping GPT-4 to second place. When asked about Moroccan politics, GPT-4 came in first again, and Claude 2 and Llama 2 almost entirely chose not to answer.
In a second experiment, the researchers tested how much the AI models would hedge their answers with warning phrases to avoid risk (think: “As an AI model, I cannot provide opinions”).
When it comes to hedging, GPT-4 had a 50% relative increase compared to GPT-3.5, which “quantifies anecdotal evidence from users that GPT-4 is more frustrating to use,” the researchers wrote. Cohere’s AI model, on the other hand, did not hedge at all in any of its responses, according to the report. Claude 2 was most reliable in terms of “self-awareness,” the research showed, meaning accurately gauging what it does and doesn’t know, and answering only questions it had training data to support.
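The report's exact method for measuring hedging is not described here, so the following is only a minimal sketch of how a hedging rate and a relative increase between two models could be computed. The phrase list, the function names and the sample rates (2% and 3%) are illustrative assumptions, not figures from Arthur's research.

```python
# Hypothetical hedge markers; a real study would need a much fuller list.
HEDGE_PHRASES = ("as an ai model", "i cannot provide")


def hedge_rate(responses):
    """Return the fraction of responses containing at least one hedge phrase."""
    if not responses:
        return 0.0
    hedged = sum(
        any(phrase in response.lower() for phrase in HEDGE_PHRASES)
        for response in responses
    )
    return hedged / len(responses)


def relative_increase(old_rate, new_rate):
    """Return the relative change from old_rate to new_rate (0.5 means +50%)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (new_rate - old_rate) / old_rate


# Made-up sample: one hedged response, one direct answer.
sample = ["As an AI model, I cannot provide opinions.", "The answer is 12."]
print(hedge_rate(sample))

# Made-up rates: moving from 2% to 3% of responses hedged.
print(relative_increase(0.02, 0.03))
```

A relative increase describes the change against the older model's own rate, so a 50% relative increase can still mean a small absolute share of responses.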
The most important takeaway for users and businesses, Wenchel said, was to “test on your exact workload,” later adding, “It’s important to understand how it performs for what you’re trying to accomplish.”
“A lot of the benchmarks are just looking at some measure of the LLM by itself, but that’s not actually the way it’s getting used in the real world,” Wenchel said. “Making sure you really understand the way the LLM performs for the way it’s actually getting used is the key.”
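One way to act on that advice is to score a model against prompts drawn from your own workload. The sketch below is an assumption-laden illustration, not Arthur's methodology: `grade`, `evaluate`, the abstention markers and the stub model are all hypothetical names, and the substring check is a deliberately crude stand-in for real answer grading.

```python
from collections import Counter

# Hypothetical markers for a model declining to answer.
ABSTAIN_MARKERS = ("i don't know", "i cannot", "i'm not sure")


def grade(response, expected):
    """Label one response as 'correct', 'abstained' or 'wrong'."""
    text = response.lower()
    if expected.lower() in text:
        return "correct"
    if any(marker in text for marker in ABSTAIN_MARKERS):
        return "abstained"
    return "wrong"


def evaluate(model, cases):
    """Run model (a callable taking a prompt) over (prompt, expected) pairs."""
    counts = Counter(grade(model(prompt), expected) for prompt, expected in cases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("correct", "abstained", "wrong")}


# Stub standing in for a real LLM call; replace with your own client.
canned = {
    "Capital of France?": "Paris is the capital.",
    "2 + 2?": "The answer is 4.",
    "Who led the council?": "I don't know.",
    "7 * 8?": "The answer is 54.",
}


def stub_model(prompt):
    return canned[prompt]


cases = [
    ("Capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("Who led the council?", "A. Example"),
    ("7 * 8?", "56"),
]
results = evaluate(stub_model, cases)
print(results)
```

Separating abstentions from wrong answers matters here because the report treats declining to answer differently from hallucinating.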
Tim Cook, chief executive officer of Apple Inc., during the Apple Worldwide Developers Conference (WWDC) at Apple Park campus in Cupertino, California, US, on Monday, June 9, 2025.
Apple said the redesigned blood oxygen feature is coming to some Apple Watch Series 9, Series 10 and Apple Watch Ultra 2 users on Thursday. The update was possible because of a recent U.S. Customs ruling, the company said.
In 2023, the International Trade Commission found that Apple’s blood oxygen sensors infringed on intellectual property from Masimo, a medical technology company. Apple paused the sale of some of its watches and began selling modified versions of the wearables without the blood oxygen feature.
“Apple’s teams work tirelessly to create products and services that empower users with industry-leading health, wellness, and safety features that are grounded in science and have privacy at the core,” the company said in a release announcing the feature rollout.
Bitcoin hit a new record late Wednesday as ether climbed even closer to its all-time high.
The flagship cryptocurrency rose as high as $124,496, surpassing its July record of $123,193.63, according to Coin Metrics. Ether rose to $4,791.19 overnight, edging closer to its 2021 record of $4,866.01.
Both coins took a hit Thursday, however, after July’s wholesale inflation data came in much hotter than expected. Bitcoin was lower by 3% at $118,481.00 while ether fell 2% to $4,629.20.
Bitcoin hit a new record overnight, surpassing its July all-time high
The initial gains were sparked by Tuesday’s cooler-than-expected July inflation report, which had lifted investor optimism for rate cuts from the Federal Reserve at the end of its September policy meeting. The coins rallied with the stock market for two days. On Wednesday, the S&P 500 and Nasdaq also scaled new records.
For the week, bitcoin is on pace for a nearly 2% gain, while ether has rallied more than 14%. Ether flipped bitcoin as the crypto market leader in June, gaining 85% since then thanks to heavy institutional buying, tightening supply and adoption from corporate accumulators, all against the backdrop of a friendlier regulatory environment for the crypto industry. Jake Kennis, analyst at Nansen, said the rally likely has more room to run given that flows remain strong.
“Bitcoin hitting a fresh all time high and ETH being on the verge of doing so means we’ve moved from speculative mania to a phase where institutional adoption, real-world integration, and global liquidity are driving price discovery,” said Ben Kurland, CEO at crypto research and trading platform DYOR.
“The fact that both assets are on the verge of breaking records in tandem signals broad market conviction, not just a single-asset rally,” he added. “Momentum this strong rarely burns out instantly, but it also tends to draw in latecomers who can fuel volatility. Right now the story is less about euphoria and more about validation. Crypto is graduating from ‘alternative’ to ‘essential’ in the global portfolio mix.”
Foxconn Hon Hai Technology Group signage during the Nvidia GPU Technology Conference (GTC) in San Jose, California, US, on Thursday, March 20, 2025.
Taiwan’s Foxconn, the world’s largest contract electronics maker, reported Thursday that its second-quarter operating profit rose 27% year over year, on the strength of its growing artificial intelligence server business.
Here’s how Foxconn did in the second quarter of 2025 compared with LSEG SmartEstimates, which are weighted toward forecasts from analysts who are more consistently accurate:
Revenue: 1.79 trillion New Taiwan dollars ($59.73 billion) vs. NT$1.79 trillion expected
Operating profit: NT$56.596 billion vs. NT$49.767 billion expected
Second quarter revenue grew 16% from last year, coming in line with LSEG’s SmartEstimates. The company’s net profit for the second quarter came in at NT$44.36 billion, beating expectations of NT$38.81 billion.
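As a rough check on the size of those beats, the gap between a reported figure and its estimate can be expressed as a percentage of the estimate. The short sketch below uses the operating profit and net profit figures quoted above; the function name `surprise_pct` is a hypothetical label for this illustration, not an LSEG term.

```python
def surprise_pct(actual, estimate):
    """Return how far actual exceeds estimate, as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("estimate must be non-zero")
    return (actual - estimate) / estimate * 100


# Figures in NT$ billions, as reported in the results above.
operating = surprise_pct(56.596, 49.767)
net = surprise_pct(44.36, 38.81)

print(f"Operating profit vs. estimate: {operating:.1f}%")
print(f"Net profit vs. estimate: {net:.1f}%")
```

On these numbers, operating profit came in roughly 13.7% above the estimate and net profit roughly 14.3% above.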
Foxconn, formally called Hon Hai Precision Industry, is the world’s largest manufacturer of Apple’s iPhones, and has been looking to replicate its success in consumer electronics in the world of AI.
The firm manufactures server racks designed for AI workloads and has become a key partner to American AI chip darling Nvidia.
Sales of Foxconn’s server products made up the lion’s share of revenues in the second quarter at 41%, surpassing for the first time its smart consumer electronics products, which accounted for 35%.
In its earnings report, the company forecast that its AI server business would continue to drive growth into the current quarter, with revenue expected to increase by over 170% year over year.
Foxconn said earlier this month that it expected overall revenue to grow further in the third quarter, but noted that the impact of “evolving global political and economic conditions” would be closely monitored.
At the end of July, Foxconn announced that it was taking a stake in industrial motor maker TECO Electric & Machinery in a strategic partnership to build more AI data centers.
The company has also shown its willingness to expand into new areas, including the assembly of electric vehicles and the manufacturing of semiconductors.
However, U.S. President Donald Trump’s global tariffs could impact Foxconn’s outlook this year. In response to Trump’s tariff threats, the company has already moved most of its final production of made-for-the-U.S. iPhones to India.
Taiwan has been hit with a 20% “temporary tariff” from the U.S., with trade negotiations said to be ongoing.
Last week, Trump also said he would impose a 100% tariff on imports of semiconductors and chips, but not on companies that are “building in the United States.”
While the details of these tariffs remain unclear, Foxconn Technology Co, a metal casing supplier owned by Hon Hai Precision Industry, announced plans to invest $1 billion in the U.S. over the next ten years as part of its North American expansion strategy, according to local media reports.