OpenAI CEO Sam Altman speaks during a keynote address announcing ChatGPT integration for Bing at Microsoft in Redmond, Washington, on February 7, 2023.

Jason Redmond | AFP | Getty Images

Before OpenAI’s ChatGPT emerged and captured the world’s attention for its ability to create compelling sentences, a small startup called Latitude was wowing consumers with its AI Dungeon game, which let players use artificial intelligence to create fantastical tales based on their prompts.

But as AI Dungeon became more popular, Latitude CEO Nick Walton recalled that the cost to maintain the text-based role-playing game began to skyrocket. Powering AI Dungeon’s text-generation software was the GPT language technology offered by the Microsoft-backed artificial intelligence research lab OpenAI. The more people played AI Dungeon, the bigger the bill Latitude had to pay OpenAI.

Compounding the predicament, Walton discovered that content marketers were using AI Dungeon to generate promotional copy, a use his team never foresaw but one that added to the company’s AI bill.

At its peak in 2021, Walton estimates Latitude was spending nearly $200,000 a month on OpenAI’s so-called generative AI software and Amazon Web Services in order to keep up with the millions of user queries it needed to process each day.

“We joked that we had human employees and we had AI employees, and we spent about as much on each of them,” Walton said. “We spent hundreds of thousands of dollars a month on AI and we are not a big startup, so it was a very massive cost.”

By the end of 2021, Latitude switched from OpenAI’s GPT software to a cheaper but still capable language model offered by startup AI21 Labs, Walton said, adding that the startup also incorporated open-source and free language models into its service to lower the cost. Latitude’s generative AI bills have dropped to under $100,000 a month, Walton said, and the startup charges players a monthly subscription for more advanced AI features to help offset the cost.

Latitude’s pricey AI bills underscore an unpleasant truth behind the recent boom in generative AI technologies: The cost to develop and maintain the software can be extraordinarily high, both for the firms that develop the underlying technologies, generally referred to as large language models or foundation models, and for those that use the AI to power their own software.

The high cost of machine learning is an uncomfortable reality in the industry as venture capitalists eye companies that could potentially be worth trillions, and big companies such as Microsoft, Meta, and Google use their considerable capital to build a lead in the technology that smaller challengers can’t close.

But if the margins on AI applications are permanently smaller than earlier software-as-a-service margins because of the high cost of computing, it could put a damper on the current boom.

The high cost of training and “inference” — actually running — large language models is a structural cost that differs from previous computing booms. Even when the software is built, or trained, it still requires a huge amount of computing power to run large language models because they do billions of calculations every time they return a response to a prompt. By comparison, serving web apps or pages requires much less calculation.
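To put “billions of calculations” in perspective, a common rule of thumb (a rough approximation, not a vendor-published figure) is that a transformer performs about two floating-point operations per model parameter for every token it generates. A minimal sketch:

```python
# Rough FLOP estimate for generating one reply with a GPT-3-class model.
# The 2-FLOPs-per-parameter-per-token figure is a common approximation.
params = 175e9              # parameters in a GPT-3-scale model
flops_per_token = 2 * params
tokens_in_reply = 500       # a few paragraphs of output

total_flops = flops_per_token * tokens_in_reply
print(f"~{total_flops:.1e} floating-point operations per reply")  # ~1.8e14
```

Serving a web page, by contrast, involves orders of magnitude less arithmetic per request.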

These calculations also require specialized hardware. While traditional computer processors can run machine learning models, they’re slow. Most training and inference now takes place on graphics processors, or GPUs, which were initially intended for 3D gaming, but have become the standard for AI applications because they can do many simple calculations simultaneously. 

Nvidia makes most of the GPUs for the AI industry, and its primary data center workhorse, the A100, costs about $10,000. Scientists who build these models often joke that they “melt GPUs.”

Training models

Nvidia A100 processor

Nvidia

Analysts and technologists estimate that the critical process of training a large language model such as GPT-3 could cost more than $4 million. More advanced language models could cost over “the high single-digit millions” to train, said Rowan Curran, a Forrester analyst who focuses on AI and machine learning.

Meta’s largest LLaMA model, for example, used 2,048 Nvidia A100 GPUs to train on 1.4 trillion tokens (750 words is about 1,000 tokens) over about 21 days, the company said when it released the model last month.

That works out to about 1 million GPU hours. At dedicated prices from AWS, it would cost over $2.4 million. And at 65 billion parameters, the model is smaller than OpenAI’s current GPT models, such as GPT-3, which has 175 billion parameters.
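The back-of-envelope arithmetic behind that $2.4 million figure is simple. The sketch below reproduces it from the numbers above; the per-GPU-hour rate is an illustrative assumption, since AWS pricing varies by instance type and commitment:

```python
# Reproducing the LLaMA training-cost estimate from the figures cited above.
num_gpus = 2048                 # Nvidia A100s used in training
training_days = 21
gpu_hours = num_gpus * training_days * 24
print(f"{gpu_hours:,} GPU hours")                     # 1,032,192 -- about 1 million

assumed_rate = 2.40                                   # assumed dedicated A100 price, USD per GPU-hour
print(f"~${gpu_hours * assumed_rate:,.0f} to train")  # ~$2.5 million
```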

Clement Delangue, the CEO of AI startup Hugging Face, said the process of training the company’s Bloom large language model took more than two-and-a-half months and required access to a supercomputer that was “something like the equivalent of 500 GPUs.”

Because retraining costs so much, organizations that build large language models must be cautious about how often they retrain the software to improve its abilities, he said.

“It’s important to realize that these models are not trained all the time, like every day,” Delangue said, noting that’s why some models, like ChatGPT, don’t have knowledge of recent events. ChatGPT’s knowledge stops in 2021, he said.

“We are actually doing a training right now for the version two of Bloom and it’s gonna cost no more than $10 million to retrain,” Delangue said. “So that’s the kind of thing that we don’t want to do every week.”

Inference and who pays for it

Bing with Chat

Jordan Novet | CNBC

To use a trained machine learning model to make predictions or generate text, engineers use the model in a process called “inference,” which can be much more expensive than training because it might need to run millions of times for a popular product.

For a product as popular as ChatGPT — which investment firm UBS estimates to have reached 100 million monthly active users in January — Curran believes that it could have cost OpenAI $40 million to process the millions of prompts people fed into the software that month.

Costs skyrocket when these tools are used billions of times a day. Financial analysts estimate Microsoft’s Bing AI chatbot, which is powered by an OpenAI ChatGPT model, needs at least $4 billion of infrastructure to serve responses to all Bing users.

In the case of Latitude, for instance, while the startup didn’t have to pay to train the underlying OpenAI language model it was accessing, it had to account for the inferencing costs that were something akin to “half-a-cent per call” on “a couple million requests per day,” a Latitude spokesperson said.

“And I was being relatively conservative,” Curran said of his calculations.
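Taken together, the round numbers quoted above are enough for an order-of-magnitude estimate of a startup’s inference bill. The sketch below treats them as illustrative inputs, not exact billing data:

```python
# Order-of-magnitude inference economics using the figures quoted above.
cost_per_call = 0.005        # "half-a-cent per call", USD
calls_per_day = 2_000_000    # "a couple million requests per day"

daily_bill = cost_per_call * calls_per_day
print(f"~${daily_bill:,.0f}/day, ~${daily_bill * 30:,.0f}/month")
# ~$10,000/day, ~$300,000/month -- the same six-figure monthly
# ballpark Latitude describes at its peak.
```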

In order to sow the seeds of the current AI boom, venture capitalists and tech giants have been investing billions of dollars into startups that specialize in generative AI technologies. Microsoft, for instance, invested as much as $10 billion into ChatGPT creator OpenAI, according to media reports in January. Salesforce‘s venture capital arm, Salesforce Ventures, recently debuted a $250 million fund that caters to generative AI startups.

As investor Semil Shah of the VC firms Haystack and Lightspeed Venture Partners described on Twitter, “VC dollars shifted from subsidizing your taxi ride and burrito delivery to LLMs and generative AI compute.”

Many entrepreneurs see risks in relying on potentially subsidized AI models that they don’t control and merely pay for on a per-use basis.

“When I talk to my AI friends at the startup conferences, this is what I tell them: Do not solely depend on OpenAI, ChatGPT or any other large language models,” said Suman Kanuganti, founder of personal.ai, a chatbot currently in beta mode. “Because businesses shift, they are all owned by big tech companies, right? If they cut access, you’re gone.”

Companies such as enterprise tech firm Conversica are exploring how they can use the tech through Microsoft’s Azure cloud service at its currently discounted price.

While Conversica CEO Jim Kaskade declined to comment about how much the startup is paying, he conceded that the subsidized cost is welcome as it explores how language models can be used effectively.

“If they were truly trying to break even, they’d be charging a hell of a lot more,” Kaskade said.

How it could change


It’s unclear if AI computation will stay expensive as the industry develops. Companies making the foundation models, semiconductor makers and startups all see business opportunities in reducing the price of running AI software.

Nvidia, which has about 95% of the market for AI chips, continues to develop more powerful versions designed specifically for machine learning, but improvements in total chip power across the industry have slowed in recent years.

Still, Nvidia CEO Jensen Huang believes that in 10 years, AI will be “a million times” more efficient because of improvements not only in chips, but also in software and other computer parts.

“Moore’s Law, in its best days, would have delivered 100x in a decade,” Huang said last month on an earnings call. “By coming up with new processors, new systems, new interconnects, new frameworks and algorithms, and working with data scientists, AI researchers on new models, across that entire span, we’ve made large language model processing a million times faster.”
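Compounded annually, the two growth rates Huang is comparing look like this (simple arithmetic, not Nvidia’s own breakdown):

```python
# Annualized growth implied by "100x in a decade" vs. "a million times" in a decade.
moores_law_annual = 100 ** (1 / 10)          # ~1.58x per year
claimed_ai_annual = 1_000_000 ** (1 / 10)    # ~3.98x per year
print(f"Moore's Law pace: ~{moores_law_annual:.2f}x/year")
print(f"Claimed AI pace:  ~{claimed_ai_annual:.2f}x/year")
```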

Some startups have focused on the high cost of AI as a business opportunity.

“Nobody was saying ‘You should build something that was purpose-built for inference.’ What would that look like?” said Sid Sheth, founder of D-Matrix, a startup building a system to save money on inference by doing more processing in the computer’s memory, as opposed to on a GPU.

“People are using GPUs today, NVIDIA GPUs, to do most of their inference. They buy the DGX systems that NVIDIA sells that cost a ton of money. The problem with inference is if the workload spikes very rapidly, which is what happened to ChatGPT, it went to like a million users in five days. There is no way your GPU capacity can keep up with that because it was not built for that. It was built for training, for graphics acceleration,” he said.

Delangue, the Hugging Face CEO, believes more companies would be better served focusing on smaller, specific models that are cheaper to train and run, instead of the large language models that are garnering most of the attention.

Meanwhile, OpenAI announced last month that it’s lowering the cost for companies to access its GPT models. It now charges one-fifth of one cent for about 750 words of output.
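At that rate, estimating a bill is a one-liner. The sketch below uses the 750-words-to-1,000-tokens conversion cited earlier in the article; the word count is a made-up example:

```python
# Cost under OpenAI's reduced pricing: $0.002 (one-fifth of a cent)
# per ~1,000 tokens, or roughly 750 words of output.
price_per_1000_tokens = 0.002   # USD
words_generated = 75_000        # hypothetical daily output for one app
tokens = words_generated * 1000 / 750
print(f"~${tokens / 1000 * price_per_1000_tokens:.2f}")  # ~$0.20
```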

OpenAI’s lower prices have caught the attention of AI Dungeon-maker Latitude.

“I think it’s fair to say that it’s definitely a huge change we’re excited to see happen in the industry and we’re constantly evaluating how we can deliver the best experience to users,” a Latitude spokesperson said. “Latitude is going to continue to evaluate all AI models to be sure we have the best game out there.”

Watch: AI’s “iPhone Moment” – Separating ChatGPT Hype and Reality

Alibaba launches new Qwen LLMs in China’s latest open-source AI breakthrough


Alibaba released the next generation of its open-sourced large language models, Qwen3, on Tuesday — and experts are calling it yet another breakthrough in China’s booming open-source artificial intelligence space.

In a blog post, the Chinese tech giant said Qwen3 promises improvements in reasoning, instruction following, tool usage and multilingual tasks, rivaling other top-tier models such as DeepSeek’s R1 in several industry benchmarks. 

The LLM series includes eight variations that span a range of architectures and sizes, offering developers flexibility when using Qwen to build AI applications for edge devices like mobile phones.

Qwen3 is also Alibaba’s debut into so-called “hybrid reasoning models,” which it says combines traditional LLM capabilities with “advanced, dynamic reasoning.”

According to Alibaba, such models can seamlessly transition between a “thinking mode” for complex tasks such as coding and a “non-thinking mode” for faster, general-purpose responses. 
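For developers pulling the open-weight models from Hugging Face, that mode switch is exposed at the prompt-template level. The following is a hedged sketch using the transformers library; the model identifier and the enable_thinking flag are taken from Qwen’s published usage notes and should be treated as assumptions that may change:

```python
# Sketch: toggling Qwen3's "thinking" vs. "non-thinking" modes with
# Hugging Face transformers. The model ID and the enable_thinking flag
# are assumptions based on Qwen's usage notes, not guaranteed-stable APIs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"   # smallest variant; assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain binary search briefly."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,        # False selects the faster non-thinking mode
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```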

“Notably, the Qwen3-235B-A22B MoE model significantly lowers deployment costs compared to other state-of-the-art models, reinforcing Alibaba’s commitment to accessible, high-performance AI,” Alibaba said. 

The new models are already freely available for individual users on platforms like Hugging Face and GitHub, as well as Alibaba Cloud’s web interface. Qwen3 is also being used to power Alibaba’s AI assistant, Quark.

China’s AI advancement

AI analysts told CNBC that Qwen3 represents a serious challenge to Alibaba’s counterparts in China, as well as industry leaders in the U.S.

In a statement to CNBC, Wei Sun, principal analyst of artificial intelligence at Counterpoint Research, said the Qwen3 series is a “significant breakthrough—not just for its best-in-class performance” but also for several features that point to the “application potential of the models.” 

Those features include Qwen3’s hybrid thinking mode, its multilingual support covering 119 languages and dialects and its open-source availability, Sun added.

Open-source software generally refers to software in which the source code is made freely available on the web for possible modification and redistribution. At the start of this year, DeepSeek’s open-sourced R1 model rocked the AI world and quickly became a catalyst for China’s AI space and open-source model adoption.  

“Alibaba’s release of the Qwen 3 series further underscores the strong capabilities of Chinese labs to develop highly competitive, innovative, and open-source models, despite mounting pressure from tightened U.S. export controls,” said Ray Wang, a Washington-based analyst focusing on U.S.-China economic and technology competition.

According to Alibaba, Qwen has already become one of the world’s most widely adopted open-source AI model series, attracting over 300 million downloads worldwide and more than 100,000 derivative models on Hugging Face. 

Wang said that this adoption could continue with Qwen3, adding that its performance claims may make it the best open-source model globally — though still behind the world’s most cutting-edge models like OpenAI’s o3 and o4-mini.  

Chinese competitors like Baidu have also rushed to release new AI models after the emergence of DeepSeek, including making plans to shift toward a more open-source business model. 

Meanwhile, Reuters reported in February that DeepSeek is accelerating the launch of the successor to its R1 model, citing anonymous sources.

“In the broader context of the U.S.-China AI race, the gap between American and Chinese labs has narrowed—likely to a few months, and some might argue, even to just weeks,” Wang said. 

“With the latest release of Qwen 3 and the upcoming launch of DeepSeek’s R2, this gap is unlikely to widen—and may even continue to shrink.”

Uber raises in-office requirement to 3 days, claws back remote workers

Uber on Monday informed employees, including some who had been previously approved for remote work, that it will require them to come to the office three days a week, CNBC has learned. 

“Even as the external environment remains dynamic, we’re on solid footing, with a clear strategy and big plans,” CEO Dara Khosrowshahi told employees in the memo, which was viewed by CNBC. “As we head into this next chapter, I want to emphasize that ‘good’ is not going to be good enough — we need to be great.”

Khosrowshahi went on to say that employees need to push themselves so the company “can move faster and take smarter risks,” and he outlined several changes to Uber’s work policy.

Uber in 2022 established Tuesdays and Thursdays as “anchor days” where most employees must spend at least half of their work time in the company’s office. Starting in June, employees will be required in the office Tuesday through Thursday, according to the memo.

That includes some employees who were previously approved to work remotely. The company said it had already informed impacted remote employees.

“After a thorough review of our existing remote approvals, we’re asking many remote employees to come into an office,” Khosrowshahi wrote. “In addition, we’ll hire new remote roles only very sparingly.”

The company also changed its one-month paid sabbatical program, according to the memo. Previously, employees were eligible for the sabbatical after five years at the company. That has now been raised to eight years.

“This program was created when Uber was a much younger company, and when reaching 5 years of tenure was a rare feat,” Khosrowshahi wrote. “Back then, we were in the office five (sometimes more!) days of a week and hadn’t instituted our Work from Anywhere benefit.”

Khosrowshahi said the changes will help Uber move faster. 

“Our collective view as a leadership team is that while remote work has some benefits, being in the office fuels collaboration, sparks creativity, and increases velocity,” Khosrowshahi wrote.

The changes come as more companies in the tech industry cut costs to appease investors after over-hiring during the Covid-19 pandemic. Google recently began demanding that employees who were previously approved for remote work also return to the office if they want to keep their jobs, CNBC reported last week.

Last year, Khosrowshahi blamed remote work for the loss of Uber’s most loyal customers, who had used ride-sharing to commute to work.

“Going forward, we’re further raising this bar,” Khosrowshahi wrote in the Monday memo.

Uber’s leadership team will monitor attendance “at both team and individual levels to ensure expectations are being met,” Khosrowshahi wrote. 

Following the memo, Uber employees immediately swarmed the company’s internal question-and-answer forum, according to correspondence viewed by CNBC. Khosrowshahi said he and Nikki Krishnamurthy, the company’s chief people officer, will hold an all-hands meeting on Tuesday to discuss the changes.

Many employees asked leadership to reconsider the sabbatical change, arguing that the company should honor the original eligibility policy.

“This isn’t ‘doing the right thing’ for your employees,” one employee commented.

Uber did not immediately respond to a request for comment.

WATCH: Lightning Round: Uber goes higher from here, says Jim Cramer

Amazon launches first Kuiper internet satellites in bid to take on Elon Musk’s Starlink

A United Launch Alliance Atlas V rocket is on the launch pad carrying Amazon’s Project Kuiper internet network satellites, which are expected to eventually rival Elon Musk’s Starlink system, at the Cape Canaveral Space Force Station in Cape Canaveral, Florida, U.S., April 9, 2025. 

Steve Nesius | Reuters

Amazon on Monday launched the first batch of its Kuiper internet satellites into space after an earlier attempt was scrubbed due to inclement weather.

A United Launch Alliance rocket carrying 27 Kuiper satellites lifted off from a launchpad at the Cape Canaveral Space Force Station in Florida shortly after 7 p.m. Eastern, according to a livestream.

“We had a nice smooth countdown, beautiful weather, beautiful liftoff, and Atlas V is on its way to orbit to take those 27 Kuiper satellites, put them on their way and really start this new era in internet connectivity,” Caleb Weiss, a systems engineer at ULA, said on the livestream following the launch.

The satellites are expected to separate from the rocket roughly 280 miles above Earth’s surface, at which point Amazon will look to confirm the satellites can independently maneuver and communicate with its employees on the ground.

Six years ago, Amazon unveiled its plans to build a constellation of internet-beaming satellites in low Earth orbit, called Project Kuiper. The service will compete directly with Elon Musk’s Starlink, which currently dominates the market and has 8,000 satellites in orbit.

The first Kuiper mission kicks off what will need to become a steady cadence of launches in order for Amazon to meet a deadline set by the Federal Communications Commission. The agency expects the company to have half of its total constellation, or 1,618 satellites, up in the air by July 2026.

Amazon has booked more than 80 launches to deploy dozens of satellites at a time. In addition to ULA, its launch partners include Musk’s SpaceX (parent company of Starlink), European company Arianespace and Jeff Bezos’ space exploration startup Blue Origin.

Amazon is spending as much as $10 billion to build the Kuiper network. It hopes to begin commercial service for consumers, enterprises and government later this year.

In his shareholder letter earlier this month, Amazon CEO Andy Jassy said Kuiper will require upfront investment at first, but eventually the company expects it to be “a meaningful operating income and ROIC business for us.” ROIC stands for return on invested capital.

Investors will be listening for any commentary around further capex spend on Kuiper when Amazon reports first-quarter earnings after the bell on Thursday.

WATCH: Amazon launches Project Kuiper prototypes to low orbit as tech giant enters satellite internet race
