OpenAI CEO Sam Altman speaks during a keynote address announcing ChatGPT integration for Bing at Microsoft in Redmond, Washington, on February 7, 2023.
Jason Redmond | AFP | Getty Images
Before OpenAI’s ChatGPT emerged and captured the world’s attention for its ability to create compelling sentences, a small startup called Latitude was wowing consumers with its AI Dungeon game that let them use artificial intelligence to create fantastical tales based on their prompts.
But as AI Dungeon became more popular, Latitude CEO Nick Walton recalled that the cost to maintain the text-based role-playing game began to skyrocket. Powering AI Dungeon’s text-generation software was the GPT language technology offered by the Microsoft-backed artificial intelligence research lab OpenAI. The more people played AI Dungeon, the bigger the bill Latitude had to pay OpenAI.
Compounding the predicament was that Walton also discovered content marketers were using AI Dungeon to generate promotional copy, a use for AI Dungeon that his team never foresaw, but that ended up adding to the company’s AI bill.
At its peak in 2021, Walton estimates Latitude was spending nearly $200,000 a month on OpenAI’s so-called generative AI software and Amazon Web Services in order to keep up with the millions of user queries it needed to process each day.
“We joked that we had human employees and we had AI employees, and we spent about as much on each of them,” Walton said. “We spent hundreds of thousands of dollars a month on AI and we are not a big startup, so it was a very massive cost.”
By the end of 2021, Latitude switched from using OpenAI’s GPT software to a cheaper but still capable language software offered by startup AI21 Labs, Walton said, adding that the startup also incorporated open source and free language models into its service to lower the cost. Latitude’s generative AI bills have dropped to under $100,000 a month, Walton said, and the startup charges players a monthly subscription for more advanced AI features to help reduce the cost.
Latitude’s pricey AI bills underscore an unpleasant truth behind the recent boom in generative AI technologies: The cost to develop and maintain the software can be extraordinarily high, both for the firms that develop the underlying technologies, generally referred to as large language models or foundation models, and those that use the AI to power their own software.
The high cost of machine learning is an uncomfortable reality in the industry as venture capitalists eye companies that could potentially be worth trillions, and big companies such as Microsoft, Meta, and Google use their considerable capital to develop a lead in the technology that smaller challengers can’t catch up to.
But if the margin for AI applications is permanently smaller than previous software-as-a-service margins, because of the high cost of computing, it could put a damper on the current boom.
The high cost of training and “inference” — actually running — large language models is a structural cost that differs from previous computing booms. Even when the software is built, or trained, it still requires a huge amount of computing power to run large language models because they do billions of calculations every time they return a response to a prompt. By comparison, serving web apps or pages requires much less calculation.
These calculations also require specialized hardware. While traditional computer processors can run machine learning models, they’re slow. Most training and inference now takes place on graphics processors, or GPUs, which were initially intended for 3D gaming, but have become the standard for AI applications because they can do many simple calculations simultaneously.
Nvidia makes most of the GPUs for the AI industry, and its primary data center workhorse chip costs $10,000. Scientists who build these models often joke that they “melt GPUs.”
Training models
Nvidia A100 processor
Nvidia
Analysts and technologists estimate that the critical process of training a large language model such as GPT-3 could cost more than $4 million. More advanced language models could cost over “the high-single digit-millions” to train, said Rowan Curran, a Forrester analyst who focuses on AI and machine learning.
Meta’s largest LLaMA model, released last month, used 2,048 Nvidia A100 GPUs to train on 1.4 trillion tokens (750 words is about 1,000 tokens), taking about 21 days, the company said.
It took about 1 million GPU hours to train. At dedicated prices from AWS, that would cost over $2.4 million. And at 65 billion parameters, it’s smaller than OpenAI’s current GPT models, such as GPT-3, which has 175 billion parameters.
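The back-of-the-envelope math behind those figures is straightforward. A minimal sketch, assuming a roughly $2.40-per-hour dedicated A100 rate (the per-hour price is an assumed figure, not an official AWS quote):

```python
# Rough estimate of LLaMA's training cost from the reported figures:
# 2,048 A100 GPUs running for about 21 days.
gpus = 2048            # Nvidia A100s used for training
days = 21              # reported training time

gpu_hours = gpus * days * 24
print(f"GPU-hours: {gpu_hours:,}")         # ~1.03 million

price_per_gpu_hour = 2.40                  # assumed dedicated rate, USD
cost = gpu_hours * price_per_gpu_hour
print(f"Estimated cost: ${cost:,.0f}")     # a bit over $2.4 million
```

The result lines up with the "about 1 million GPU hours" and "over $2.4 million" figures cited above.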
Clement Delangue, the CEO of AI startup Hugging Face, said the process of training the company’s Bloom large language model took more than two-and-a-half months and required access to a supercomputer that was “something like the equivalent of 500 GPUs.”
Organizations that build large language models must be cautious when they retrain the software, which helps the software improve its abilities, because it costs so much, he said.
“It’s important to realize that these models are not trained all the time, like every day,” Delangue said, noting that’s why some models, like ChatGPT, don’t have knowledge of recent events. ChatGPT’s knowledge stops in 2021, he said.
“We are actually doing a training right now for the version two of Bloom and it’s gonna cost no more than $10 million to retrain,” Delangue said. “So that’s the kind of thing that we don’t want to do every week.”
Inference and who pays for it
Bing with Chat
Jordan Novet | CNBC
To use a trained machine learning model to make predictions or generate text, engineers use the model in a process called “inference,” which can be much more expensive than training because it might need to run millions of times for a popular product.
For a product as popular as ChatGPT — which investment firm UBS estimates to have reached 100 million monthly active users in January — Curran believes that it could have cost OpenAI $40 million to process the millions of prompts people fed into the software that month.
Costs skyrocket when these tools are used billions of times a day. Financial analysts estimate Microsoft’s Bing AI chatbot, which is powered by an OpenAI ChatGPT model, needs at least $4 billion of infrastructure to serve responses to all Bing users.
In the case of Latitude, for instance, while the startup didn’t have to pay to train the underlying OpenAI language model it was accessing, it had to account for inference costs that were something akin to “half-a-cent per call” on “a couple million requests per day,” a Latitude spokesperson said.
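Plugging those rough figures together shows the scale of the bill (both inputs are the approximations quoted above, so the output is an order-of-magnitude estimate only):

```python
# Monthly inference bill implied by the figures Latitude cited.
cost_per_call = 0.005          # USD, "half-a-cent per call"
requests_per_day = 2_000_000   # "a couple million requests per day"

daily = cost_per_call * requests_per_day
monthly = daily * 30
print(f"${daily:,.0f}/day, ${monthly:,.0f}/month")  # $10,000/day, $300,000/month
```

That is consistent with the hundreds of thousands of dollars a month Walton described at the game’s peak.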
“And I was being relatively conservative,” Curran said of his calculations.
In order to sow the seeds of the current AI boom, venture capitalists and tech giants have been investing billions of dollars into startups that specialize in generative AI technologies. Microsoft, for instance, invested as much as $10 billion into OpenAI, the creator of GPT, according to media reports in January. Salesforce’s venture capital arm, Salesforce Ventures, recently debuted a $250 million fund that caters to generative AI startups.
As investor Semil Shah of the VC firms Haystack and Lightspeed Venture Partners described on Twitter, “VC dollars shifted from subsidizing your taxi ride and burrito delivery to LLMs and generative AI compute.”
Many entrepreneurs see risks in relying on potentially subsidized AI models that they don’t control and merely pay for on a per-use basis.
“When I talk to my AI friends at the startup conferences, this is what I tell them: Do not solely depend on OpenAI, ChatGPT or any other large language models,” said Suman Kanuganti, founder of personal.ai, a chatbot currently in beta mode. “Because businesses shift, they are all owned by big tech companies, right? If they cut access, you’re gone.”
Companies such as enterprise tech firm Conversica are exploring how they can use the tech through Microsoft’s Azure cloud service at its currently discounted price.
While Conversica CEO Jim Kaskade declined to comment about how much the startup is paying, he conceded that the subsidized cost is welcome as it explores how language models can be used effectively.
“If they were truly trying to break even, they’d be charging a hell of a lot more,” Kaskade said.
How it could change
It’s unclear if AI computation will stay expensive as the industry develops. Companies making the foundation models, semiconductor makers and startups all see business opportunities in reducing the price of running AI software.
Nvidia, which has about 95% of the market for AI chips, continues to develop more powerful versions designed specifically for machine learning, but improvements in total chip power across the industry have slowed in recent years.
Still, Nvidia CEO Jensen Huang believes that in 10 years, AI will be “a million times” more efficient because of improvements not only in chips, but also in software and other computer parts.
“Moore’s Law, in its best days, would have delivered 100x in a decade,” Huang said last month on an earnings call. “By coming up with new processors, new systems, new interconnects, new frameworks and algorithms, and working with data scientists, AI researchers on new models, across that entire span, we’ve made large language model processing a million times faster.”
Some startups have focused on the high cost of AI as a business opportunity.
“Nobody was saying ‘You should build something that was purpose-built for inference.’ What would that look like?” said Sid Sheth, founder of D-Matrix, a startup building a system to save money on inference by doing more processing in the computer’s memory, as opposed to on a GPU.
“People are using GPUs today, NVIDIA GPUs, to do most of their inference. They buy the DGX systems that NVIDIA sells that cost a ton of money. The problem with inference is if the workload spikes very rapidly, which is what happened to ChatGPT, it went to like a million users in five days. There is no way your GPU capacity can keep up with that because it was not built for that. It was built for training, for graphics acceleration,” he said.
Delangue, the Hugging Face CEO, believes more companies would be better served focusing on smaller, specific models that are cheaper to train and run, instead of the large language models that are garnering most of the attention.
Meanwhile, OpenAI announced last month that it’s lowering the cost for companies to access its GPT models. It now charges one-fifth of one cent for about 750 words of output.
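At that rate, the cost of a given volume of output is easy to work out. A quick illustration using the figures above (the word count is a hypothetical example):

```python
# OpenAI's lowered rate: one-fifth of one cent per ~750 words of output.
price_per_750_words = 0.002    # USD, i.e. $0.002 per roughly 1,000 tokens

# Hypothetical example: cost of generating one million words.
words = 1_000_000
cost = words / 750 * price_per_750_words
print(f"${cost:.2f}")          # about $2.67 for a million words of output
```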
OpenAI’s lower prices have caught the attention of AI Dungeon-maker Latitude.
“I think it’s fair to say that it’s definitely a huge change we’re excited to see happen in the industry and we’re constantly evaluating how we can deliver the best experience to users,” a Latitude spokesperson said. “Latitude is going to continue to evaluate all AI models to be sure we have the best game out there.”
TikTok creators gather before a press conference to voice their opposition to the “Protecting Americans from Foreign Adversary Controlled Applications Act,” pending crackdown legislation on TikTok in the House of Representatives, on Capitol Hill in Washington, U.S., March 12, 2024.
Craig Hudson | Reuters
The Supreme Court on Friday will hear oral arguments in the case over the future of TikTok in the U.S., where the popular app could be banned as soon as next week.
The justices will consider whether the Protecting Americans from Foreign Adversary Controlled Applications Act, the law that would ban TikTok and imposes harsh civil penalties on app “entities” that continue to carry the service after Jan. 19, violates the U.S. Constitution’s free speech protections.
It’s unclear when the court will hand down a decision, and if China’s ByteDance continues to refuse to divest TikTok to an American company, it faces a complete ban nationwide.
What will change about the user experience?
The roughly 115 million U.S. TikTok monthly active users could face a range of scenarios depending on when the Supreme Court hands down a decision.
If no word comes before the law takes effect on Jan. 19 and the ban goes through, it’s possible that users would still be able to post or engage with the app if they already have it downloaded. However, those users would likely be unable to update or redownload the app after that date, multiple legal experts said.
Thousands of short-form video creators who generate income from TikTok through ad revenue, paid partnerships, merchandise and more will likely need to transition their businesses to other platforms, like YouTube or Instagram.
“Shutting down TikTok, even for a single day, would be a big deal, not just for people who create content on TikTok, but everyone who shares or views content,” said George Wang, a staff attorney at the Knight First Amendment Institute who helped write the institute’s amicus briefs on the case.
“It sets a really dangerous precedent for how we regulate speech online,” Wang said.
Who supports and opposes the ban?
Dozens of high-profile amicus briefs from organizations, members of Congress and President-elect Donald Trump were filed supporting both the government and ByteDance.
The government, led by Attorney General Merrick Garland, alleges that until ByteDance divests TikTok, the app remains a “powerful tool for espionage” and a “potent weapon for covert influence operations.”
Trump’s brief did not voice support for either side, but it did ask the court to oppose banning the platform and allow him to find a political resolution that allows the service to continue while addressing national security concerns.
The short-form video app played a notable role in both Trump and Democratic nominee Kamala Harris’ presidential campaigns in 2024, and it’s one of the most common news sources for younger voters.
In a September Truth Social post, Trump wrote in all caps that Americans who want to save TikTok should vote for him. The post was quoted in his amicus brief.
What comes next?
It’s unclear when the Supreme Court will issue its ruling, but the case’s expedited hearing has some predicting that the court could issue a quick ruling.
The case will have “enormous implications” since TikTok’s user base in the U.S. is so large, said Erwin Chemerinsky, dean of Berkeley Law.
“It’s unprecedented for the government to prohibit platforms for speech, especially one so many people use,” Chemerinsky said. “Ultimately, this is a tension between free speech issues on the one hand and claims of national security on the other.”
Nvidia CEO Jensen Huang speaks about the Project Digits personal AI supercomputer for researchers and students during a keynote address at the Consumer Electronics Show (CES) in Las Vegas, Nevada, on January 6, 2025.
Patrick T. Fallon | Afp | Getty Images
Nvidia CEO Jensen Huang was greeted as a rock star this week at CES in Las Vegas, following an artificial intelligence boom that’s made the chipmaker the second most-valuable company in the world.
At his nearly two-hour keynote on Monday kicking off the annual conference, Huang packed a 12,000-seat arena, drawing comparisons to the way Steve Jobs would reveal products at Apple events.
Huang concluded with an Apple-like trick: a surprise product reveal. He presented one of Nvidia’s server racks and, using some stage magic, held up a much smaller version, which looked like a tiny cube of a computer.
“This is an AI supercomputer,” Huang said, while wearing an alligator-skin leather jacket. “It runs the entire Nvidia AI stack. All of Nvidia’s software runs on this.”
Huang said the computer is called Project Digits and runs off a relative of the Grace Blackwell graphics processing units (GPUs) that are currently powering the most advanced AI server clusters. The GPU is paired with an ARM-based Grace central processing unit (CPU). Nvidia worked with Taiwanese semiconductor company MediaTek to create the system-on-a-chip, called GB10.
Formerly known as the Consumer Electronics Show, CES is typically the spot to launch flashy and futuristic consumer gadgets. At this year’s show, which started on Tuesday and wraps up on Friday, several companies announced AI integrations with appliances, laptops and even grills. Other major announcements included a laptop from Lenovo which has a rollable screen that can expand vertically. There were also new robots, including a Roomba competitor with a robotic arm.
Unlike Nvidia’s traditional GPUs for gaming, Project Digits isn’t targeting consumers. Instead, it’s aimed at machine learning researchers, smaller companies, and universities that want to develop advanced AI but don’t have the billions of dollars to build massive data centers or buy enough cloud credits.
“There’s a gaping hole for data scientists and ML researchers who are actively working, who are actively building something,” Huang said. “Maybe you don’t need a giant cluster. You’re just developing the early versions of the model, and you’re iterating constantly. You could do it in the cloud, but it just costs a lot more money.”
The supercomputer will cost about $3,000 when it becomes available in May, Nvidia said, and will be available from the company itself as well as some of its manufacturing partners. Huang said Project Digits is a placeholder name, indicating it may change by the time the computer goes on sale.
“If you have a good name for it, reach out to us,” Huang said.
Diversifying its business
It’s a dramatically different kind of product from the GPUs that have driven Nvidia’s historic boom in the past two years. OpenAI, which launched ChatGPT in late 2022, and other AI model creators like Anthropic have joined with large cloud providers in snapping up Nvidia’s data center GPUs because of their ability to power the most intensive models and computing workloads.
Data center sales accounted for 88% of Nvidia’s $35 billion in revenue in the most recent quarter.
Wall Street is focused on Nvidia’s ability to diversify its business so that it’s less reliant on a handful of customers buying massive AI systems.
The Nvidia Project Digits supercomputer during the 2025 CES event in Las Vegas, Nevada, US, on Wednesday, Jan. 8, 2025.
Bridget Bennett | Bloomberg | Getty Images
“It was a little scary to see Nvidia come out with something so good for so little in price,” Melius Research analyst Ben Reitzes wrote in a note this week. He said Nvidia may have “stolen the show,” due to Project Digits as well as other announcements, including graphics cards for gaming, new robot chips and a deal with Toyota.
Project Digits, which runs Linux and the same Nvidia software used on the company’s GPU server clusters, represents a huge increase in capabilities for researchers and universities, said David Bader, director of the Institute for Data Science at New Jersey Institute of Technology.
Bader, who has worked on research projects with Nvidia in the past, said the computer appears to be able to handle enough data and information to train the biggest and most cutting-edge models. He told CNBC that Anthropic, Google, Amazon and others “would pay $100 million to build a supercomputer for training” to get a system with these sorts of capabilities.
For $3,000, users can soon get a product they can plug into a standard electrical outlet in their home or office, Bader said. It’s particularly exciting for academics, who have often left for private industry in order to access bigger and more powerful computers, he said.
“Any student who is able to have one of these systems that cost roughly the same as a high-end laptop or gaming laptop, they’ll be able to do the same research and build the same models,” Bader said.
Reitzes said the computer may be Nvidia’s first move into the $50 billion market for PC and laptop chips.
“It’s not too hard to imagine it would be easy to just do it all themselves and allow the system to run Windows someday,” Reitzes wrote. “But I guess they don’t want to step on too many toes.”
Huang didn’t rule out that possibility when asked about it by Wall Street analysts on Tuesday.
He said that MediaTek may be able to sell the GB10 chip to other computer makers in the market. He made sure to leave some mystery in the air.
Alice Weidel, co-leader of the far-right Alternative for Germany (AfD) political party, arrives to speak to the media with AfD co-leader Tino Chrupalla shortly after the AfD leadership confirmed Weidel as the party’s candidate for chancellor on December 07, 2024 in Berlin, Germany.
Maryam Majd | Getty Images
Elon Musk used his social network X to promote Germany’s far-right Alternative for Germany party, known as AfD, hosting a live discussion Thursday with party leader Alice Weidel, a candidate for chancellor, ahead of a general election on Feb. 23.
“I’m really strongly recommending that people vote for AfD,” Musk, who is CEO of Tesla and SpaceX in addition to his role at X, said about a half hour into the conversation. “That’s my strong recommendation.”
The AfD has been classified as a “suspected extremist organization” by German domestic intelligence services. The party’s platform calls for rigid asylum laws, mass deportations, cuts to social and welfare support in Germany, and the reversal of restrictions on combustion engine vehicles.
Thierry Breton, former European Union commissioner for the internal market, said in a Jan. 4 post on X directed at Weidel: “As a European citizen concerned with the proper use of systemic platforms authorized to operate in the EU … especially to protect our democratic rules against illegal or misbehavior during election times, I believe it’s crucial to remind you” that a live discussion on X would give AfD and Weidel “a significant and valuable advantage over your competitors.”
While AfD has amassed about 20% of public support, according to reporting from broadcaster DW, the party is unlikely to form part of a coalition government, as most other parties have vowed not to work with it.
AfD previously protested the build-out of Tesla’s electric vehicle factory outside Berlin, in part because the factory would provide jobs to people who were not German citizens.
Musk’s earlier endorsements of AfD, including tweets complimenting the party and an editorial in a German newspaper, have enraged European government officials. Musk, the wealthiest person in the world, has also endorsed far-right and anti-establishment candidates and causes in the U.K.
Political leaders in France, Germany, Norway and the U.K. denounced his influence, NBC News previously reported, warning that Musk should not involve himself in their countries’ elections.
Musk, who was one of President-elect Donald Trump’s top backers in November’s election, previously promoted Trump in a live-streamed discussion on X. Before that, he hosted a conversation with Florida Gov. Ron DeSantis, who lost to Trump in the Republican primary.
Weidel during Thursday’s talk asked Musk about what Trump might do to bring Russia’s war in Ukraine to a conclusion, as the president-elect has suggested he could quickly do.
Musk demurred.
“To be clear, this is up to President Trump. He is commander in chief, so it’s really up to him,” Musk said. “I don’t want to speak for him, but you know, I do think that there is a path to a resolution, but it does require strong leadership in the United States to get this done.”
Musk also weighed in on what he thought should be done in Gaza, which has been under attack from Israel since Hamas’ deadly incursion into Israel on Oct. 7, 2023.
“There’s no choice but to eliminate those who wish to eliminate the state of Israel, you know Hamas essentially,” Musk said. “Then, the second step is to fix the education so that Palestinians are not trained from when they are children to hate and want the death of Israel.”
“Then, the third thing, which is also very important, is to make the Palestinian areas prosperous.”