In an unmarked office building in Austin, Texas, two small rooms contain a handful of Amazon employees designing two types of microchips for training and accelerating generative AI. These custom chips, Inferentia and Trainium, offer AWS customers an alternative to training their large language models on Nvidia GPUs, which have been getting difficult and expensive to procure. 

“The entire world would like more chips for doing generative AI, whether that’s GPUs or whether that’s Amazon’s own chips that we’re designing,” Amazon Web Services CEO Adam Selipsky told CNBC in an interview in June. “I think that we’re in a better position than anybody else on Earth to supply the capacity that our customers collectively are going to want.”

Yet others have acted faster, and invested more, to capture business from the generative AI boom. When OpenAI launched ChatGPT in November, Microsoft gained widespread attention for hosting the viral chatbot, and investing a reported $13 billion in OpenAI. It was quick to add the generative AI models to its own products, incorporating them into Bing in February. 

That same month, Google launched its own large language model, Bard, followed by a $300 million investment in OpenAI rival Anthropic. 

It wasn’t until April that Amazon announced its own family of large language models, called Titan, along with a service called Bedrock to help developers enhance software using generative AI.

“Amazon is not used to chasing markets. Amazon is used to creating markets. And I think for the first time in a long time, they are finding themselves on the back foot and they are working to play catch up,” said Chirag Dekate, VP analyst at Gartner.

Meta also recently released its own LLM, Llama 2. The open-source ChatGPT rival is now available for people to test on Microsoft's Azure public cloud.

Chips as ‘true differentiation’

In the long run, Dekate said, Amazon’s custom silicon could give it an edge in generative AI. 

“I think the true differentiation is the technical capabilities that they’re bringing to bear,” he said. “Because guess what? Microsoft does not have Trainium or Inferentia.”

AWS quietly started production of custom silicon back in 2013 with a piece of specialized hardware called Nitro. It’s now the highest-volume AWS chip. Amazon told CNBC there is at least one in every AWS server, with a total of more than 20 million in use. 


In 2015, Amazon bought Israeli chip startup Annapurna Labs. Then in 2018, Amazon launched its Arm-based server chip, Graviton, a rival to x86 CPUs from giants like AMD and Intel.

“Probably high single-digit to maybe 10% of total server sales are Arm, and a good chunk of those are going to be Amazon. So on the CPU side, they’ve done quite well,” said Stacy Rasgon, senior analyst at Bernstein Research.

Also in 2018, Amazon launched its AI-focused chips. That came two years after Google announced its first Tensor Processing Unit, or TPU. Microsoft has yet to announce the Athena AI chip it’s been working on, reportedly in partnership with AMD.

CNBC got a behind-the-scenes tour of Amazon’s chip lab in Austin, Texas, where Trainium and Inferentia are developed and tested. VP of product Matt Wood explained what both chips are for.

“Machine learning breaks down into these two different stages. So you train the machine learning models and then you run inference against those trained models,” Wood said. “Trainium provides about 50% improvement in terms of price performance relative to any other way of training machine learning models on AWS.”

Trainium first came on the market in 2021, following the 2019 release of Inferentia, which is now on its second generation.

Inferentia allows customers “to deliver very, very low-cost, high-throughput, low-latency, machine learning inference, which is all the predictions of when you type in a prompt into your generative AI model, that’s where all that gets processed to give you the response,” Wood said.
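Wood’s two-stage split maps onto everyday machine learning code. As a minimal, framework-agnostic illustration (not AWS-specific code), the Python sketch below first trains a tiny linear model, the stage Trainium targets, and then runs inference with the frozen weights, the stage Inferentia targets.

```python
import numpy as np

# --- Stage 1: training (the work Trainium is built to accelerate) ---
# Fit a tiny linear model y = w*x + b by gradient descent on synthetic data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    pred = w * x + b
    grad_w = 2 * np.mean((pred - y) * x)  # dLoss/dw for mean squared error
    grad_b = 2 * np.mean(pred - y)        # dLoss/db
    w -= lr * grad_w
    b -= lr * grad_b

# --- Stage 2: inference (the work Inferentia is built to accelerate) ---
# The weights are now frozen; each new input is one cheap forward pass.
new_input = 2.5
print(f"prediction: {w * new_input + b:.2f}")  # roughly 3.0 * 2.5 + 1.0 = 8.5
```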

For now, however, Nvidia’s GPUs are still king when it comes to training models. In July, AWS launched new AI acceleration hardware powered by Nvidia H100s. 

“Nvidia chips have a massive software ecosystem that’s been built up around them over the last like 15 years that nobody else has,” Rasgon said. “The big winner from AI right now is Nvidia.”

Amazon’s custom chips, from left to right, Inferentia, Trainium and Graviton, are shown at Amazon’s Seattle headquarters on July 13, 2023.

Joseph Huerta

Leveraging cloud dominance

AWS’ cloud dominance, however, is a big differentiator for Amazon.

“Amazon does not need to win headlines. Amazon already has a really strong cloud install base. All they need to do is to figure out how to enable their existing customers to expand into value creation motions using generative AI,” Dekate said.

For the millions of AWS customers choosing among Amazon, Google and Microsoft for generative AI, familiarity may be the draw: they already run other applications on AWS and store their data there.

“It’s a question of velocity. How quickly can these companies move to develop these generative AI applications is driven by starting first on the data they have in AWS and using compute and machine learning tools that we provide,” explained Mai-Lan Tomsen Bukovec, VP of technology at AWS.

AWS is the world’s biggest cloud computing provider, with 40% of the market share in 2022, according to technology industry researcher Gartner. Although operating income has been down year-over-year for three quarters in a row, AWS still accounted for 70% of Amazon’s overall $7.7 billion operating profit in the second quarter. AWS’ operating margins have historically been far wider than those at Google Cloud.

AWS also has a growing portfolio of developer tools focused on generative AI.

“Let’s rewind the clock even before ChatGPT. It’s not like after that happened, suddenly we hurried and came up with a plan because you can’t engineer a chip in that quick a time, let alone you can’t build a Bedrock service in a matter of 2 to 3 months,” said Swami Sivasubramanian, AWS’ VP of database, analytics and machine learning.

Bedrock gives AWS customers access to large language models made by Anthropic, Stability AI, AI21 Labs and Amazon’s own Titan.

“We don’t believe that one model is going to rule the world, and we want our customers to have the state-of-the-art models from multiple providers because they are going to pick the right tool for the right job,” Sivasubramanian said.
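To make that model access concrete, here is a minimal sketch of calling a Bedrock-hosted model from Python with boto3, AWS’s official SDK. The model ID and request body schema below are assumptions for illustration; each provider on Bedrock defines its own request format, so consult the service documentation for what your account exposes.

```python
import json

import boto3

# Bedrock puts hosted models from multiple providers behind one runtime API.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The model ID and body schema are illustrative assumptions; Titan, Anthropic,
# AI21 Labs and Stability AI models each define their own request format.
response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Draft a one-paragraph product summary."}),
)

payload = json.loads(response["body"].read())
print(payload)  # provider-specific JSON containing the generated text
```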

An Amazon employee works on custom AI chips, in a jacket branded with AWS’ chip Inferentia, at the AWS chip lab in Austin, Texas, on July 25, 2023.

Katie Tarasov

One of Amazon’s newest AI offerings is AWS HealthScribe, a service unveiled in July to help doctors draft patient visit summaries using generative AI. Amazon also has SageMaker, a machine learning hub that offers algorithms, models and more. 

Another big tool is coding companion CodeWhisperer, which Amazon said has enabled developers to complete tasks 57% faster on average. Last year, Microsoft also reported productivity boosts from its coding companion, GitHub Copilot. 

In June, AWS announced a $100 million generative AI innovation “center.” 

“We have so many customers who are saying, ‘I want to do generative AI,’ but they don’t necessarily know what that means for them in the context of their own businesses. And so we’re going to bring in solutions architects and engineers and strategists and data scientists to work with them one on one,” AWS CEO Selipsky said.

Although so far AWS has focused largely on tools instead of building a competitor to ChatGPT, a recently leaked internal email shows Amazon CEO Andy Jassy is directly overseeing a new central team building out expansive large language models, too.

In the second-quarter earnings call, Jassy said a “very significant amount” of AWS business is now driven by AI and more than 20 machine learning services it offers. Some examples of customers include Philips, 3M, Old Mutual and HSBC. 

The explosive growth in AI has come with a flurry of security concerns from companies worried that employees are putting proprietary information into the training data used by public large language models.

“I can’t tell you how many Fortune 500 companies I’ve talked to who have banned ChatGPT. So with our approach to generative AI and our Bedrock service, anything you do, any model you use through Bedrock will be in your own isolated virtual private cloud environment. It’ll be encrypted, it’ll have the same AWS access controls,” Selipsky said.

For now, Amazon is only accelerating its push into generative AI, telling CNBC that “over 100,000” customers are using machine learning on AWS today. Although that’s a small percentage of AWS’ millions of customers, analysts say that could change.

“What we are not seeing is enterprises saying, ‘Oh, wait a minute, Microsoft is so ahead in generative AI, let’s just go out and let’s switch our infrastructure strategies, migrate everything to Microsoft,’” Dekate said. “If you’re already an Amazon customer, chances are you’re likely going to explore Amazon ecosystems quite extensively.”

— CNBC’s Jordan Novet contributed to this report.

CORRECTION: This article has been updated to reflect Inferentia as the chip used for machine learning inference.

Musk says he does not support a merger between Tesla and xAI but backs investment


Elon Musk and the xAI logo.

Vincent Feuray | Afp | Getty Images

Elon Musk on Monday said he does not support a merger between xAI and Tesla, as questions swirl over the future relationship of the electric automaker and artificial intelligence company.

X account @BullStreetBets_ posted an open question to Tesla investors on the social media site asking if they support a merger between Tesla and xAI. Musk responded with “No.”

The statement comes as the tech billionaire contemplates the future relationship between his multiple businesses.

Overnight, Musk suggested that Tesla will hold a shareholder vote at an unspecified time on whether the automaker should invest in xAI, the billionaire’s company that develops the controversial Grok AI chatbot.

Last year, Musk asked his followers in a poll on social media platform X whether Tesla should invest $5 billion into xAI. The majority voted “yes” at the time.

Musk has looked to bring his various businesses closer together. In March, Musk merged xAI and X together in a deal that valued the artificial intelligence company at $80 billion and the social media company at $33 billion.

Musk also said last week that xAI’s chatbot Grok will be available in Tesla vehicles. The chatbot has come under criticism recently, after praising Adolf Hitler and posting a barrage of antisemitic comments.

— CNBC’s Samantha Subin contributed to this report.

Alibaba-backed Moonshot releases new Kimi AI model that beats ChatGPT, Claude in coding — and it costs less


An AI sign at the MWC Shanghai tech show on June 19, 2025.

Bloomberg | Getty Images

BEIJING — The latest Chinese generative artificial intelligence model to take on OpenAI’s ChatGPT is offering coding capabilities — at a lower price.

Alibaba-backed startup Moonshot released its Kimi K2 model late Friday night: a low-cost, open-source large language model — the two factors that underpinned China-based DeepSeek’s industry disruption in January. Open-source technology provides source code access for free, an approach that few U.S. tech giants have taken, other than Meta and Google to some extent.

Coincidentally, OpenAI CEO Sam Altman announced early Saturday that the company’s first open-source model would once again be delayed indefinitely due to safety concerns. OpenAI did not immediately respond to a CNBC request for comment on Kimi K2.

Rethinking the AI coding payoff

One of Kimi K2’s strengths is in writing computer code for applications, an area in which businesses see potential to reduce or replace staff with generative AI. OpenAI’s U.S. rival Anthropic focused on coding with its Claude Opus 4 model released in late May.

In its release announcement on social media platforms X and GitHub, Moonshot claimed Kimi K2 surpassed Claude Opus 4 on two benchmarks, and had better overall performance than OpenAI’s coding-focused GPT-4.1 model, based on several industry metrics.

“No doubt [Kimi K2 is] a globally competitive model, and it’s open sourced,” Wei Sun, principal analyst in artificial intelligence at Counterpoint, said in an email Monday.

Cheaper option

“On top of that, it has lower token costs, making it attractive for large-scale or budget-sensitive deployments,” she said.

The new K2 model is available via Kimi’s app and browser interface for free, unlike ChatGPT or Claude, which charge monthly subscriptions for their latest AI models.

Kimi also charges just 15 cents per 1 million input tokens and $2.50 per 1 million output tokens, according to its website. Tokens are a way of measuring data for AI model processing.

In contrast, Claude Opus 4 charges 100 times more for input — $15 per million tokens — and 30 times more for output — $75 per million tokens. Meanwhile, for every one million tokens, GPT-4.1 charges $2 for input and $8 for output.
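The gap is easier to see with a worked example. The short Python sketch below uses only the per-token prices quoted above to cost out a hypothetical request of 100,000 input tokens and 10,000 output tokens against each model.

```python
# Per-1-million-token prices quoted above, in USD: (input, output).
PRICES = {
    "Kimi K2": (0.15, 2.50),
    "Claude Opus 4": (15.00, 75.00),
    "GPT-4.1": (2.00, 8.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the quoted prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 10_000):.4f}")
# Kimi K2: $0.0400
# Claude Opus 4: $2.2500
# GPT-4.1: $0.2800
```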

Moonshot AI said on GitHub that developers can use K2 however they wish, with the only requirement that they display “Kimi K2” on the user interface if the commercial product or service has more than 100 million monthly active users, or makes the equivalent of $20 million in monthly revenue.
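That condition reduces to a simple threshold check. The helper below is a hypothetical illustration, its names are not from Moonshot’s license text, but it captures the rule as described above.

```python
def must_display_kimi_k2(monthly_active_users: int,
                         monthly_revenue_usd: float) -> bool:
    """Attribution is required once a commercial product built on K2 has more
    than 100 million monthly active users or the equivalent of $20 million in
    monthly revenue, per the rule described above (names are illustrative)."""
    return monthly_active_users > 100_000_000 or monthly_revenue_usd >= 20_000_000

print(must_display_kimi_k2(5_000_000, 1_000_000))   # False: under both thresholds
print(must_display_kimi_k2(150_000_000, 0))         # True: MAU threshold crossed
```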

Hot AI market

Initial reviews of K2 on both English and Chinese social media have largely been positive, although there are some reports of hallucinations, a prevalent issue in generative AI, in which the models make up information.

Still, K2 is “the first model I feel comfortable using in production since Claude 3.5 Sonnet,” Pietro Schirano, founder of MagicPath, a startup that offers AI tools for design, said in a post on X.

Moonshot has open sourced some of its prior AI models. The company’s chatbot surged in popularity early last year as China’s alternative to ChatGPT, which isn’t officially available in the country. But similar chatbots from ByteDance and Tencent have since crowded the market, while tech giant Baidu has revamped its core search engine with AI tools.

Kimi’s latest AI release comes as investors eye Chinese alternatives to U.S. tech in the global AI competition.

Still, despite the excitement about DeepSeek, the privately held company has yet to announce a major upgrade to its R1 and V3 models. Meanwhile, Manus AI, a Chinese startup that emerged earlier this year as another DeepSeek-type upstart, has relocated its headquarters to Singapore.

Over in the U.S., OpenAI also has yet to reveal GPT-5.

Work on GPT-5 may be taking up engineering resources, preventing OpenAI from progressing on its open-source model, Counterpoint’s Sun said, adding that it’s challenging to release a powerful open-source model without undermining the competitive advantage of a proprietary model.

Grok 4 competitor

Kimi K2 is not the company’s only recent release. Moonshot launched a Kimi research model last month and claimed it matched Google’s Gemini Deep Research’s 26.9 score and beat OpenAI’s version on a benchmark called “Humanity’s Last Exam.”

The Kimi research model even got a mention last week during Elon Musk’s xAI release of Grok 4 — which scored 25.4 on its own on the “Humanity’s Last Exam” benchmark, but attained a 44.4 score when allowed to use a variety of AI tools and web search.

“Kimi-Researcher represents a paradigm shift in agentic AI,” said Winston Ma, adjunct professor at NYU School of Law. He was referring to AI’s capability of simultaneously making several decisions on its own to complete a complex task.

“Instead of merely generating fluent responses, it demonstrates autonomous reasoning at an expert level — the kind of complex cognitive work previously missing from LLMs,” Ma said. He is also author of “The Digital War: How China’s Tech Power Shapes the Future of AI, Blockchain and Cyberspace.”

— CNBC’s Victoria Yeo contributed to this report.

Nvidia CEO downplays U.S. fears that China’s military will use his firm’s chips


Jensen Huang, co-founder and chief executive officer of Nvidia Corp., attends the 9th edition of the VivaTech trade show in Paris on June 11, 2025.

Chesnot | Getty Images Entertainment | Getty Images

Nvidia CEO Jensen Huang has downplayed U.S. fears that his firm’s chips will aid the Chinese military, days ahead of another trip to the country as he attempts to walk a tightrope between Washington and Beijing. 

In an interview with CNN aired Sunday, Huang said “we don’t have to worry about” China’s military using U.S.-made technology because “they simply can’t rely on it.”

“It could be limited at any time; not to mention, there’s plenty of computing capacity in China already,” Huang said. “They don’t need Nvidia’s chips, certainly, or American tech stacks in order to build their military,” he added.

The comments were made in reference to years of bipartisan U.S. policy that placed restrictions on semiconductor companies, prohibiting them from selling their most advanced artificial intelligence chips to clients in China. 

Huang also repeated past criticisms of the policies, arguing that the tactic of export controls has been counterproductive to the ultimate goal of U.S. tech leadership. 

“We want the American tech stack to be the global standard … in order for us to do that, we have to be in search of all the AI developers in the world,” Huang said, adding that half of the world’s AI developers are in China. 

That means for America to be an AI leader, U.S. technology has to be available to all markets, including China, he added.

Washington’s latest restrictions on Nvidia’s sales to China were implemented in April and are expected to result in billions in losses for the company. In May, Huang said chip restrictions had already cut Nvidia’s China market share nearly in half.

Huang’s CNN interview came just days before he travels to China for his second trip to the country this year, and as Nvidia is reportedly working on another chip that is compliant with the latest export controls.

Last week, the Nvidia CEO met with U.S. President Donald Trump, and was warned by U.S. lawmakers not to meet with companies connected to China’s military or intelligence bodies, or entities named on America’s restricted export list.

According to Daniel Newman, CEO of tech advisory firm The Futurum Group, the CNN interview exemplifies how Huang has been threading a needle between Washington and Beijing as he tries to maintain maximum market access.

“He needs to walk a proverbial tightrope to make sure that he doesn’t rattle the Trump administration,” Newman said, adding that he also wants to be in a position for China to invest in Nvidia technology if and when the policy provides a better climate to do so.

But that’s not to say that his downplaying of Washington’s concerns is valid, according to Newman. “I think it’s hard to completely accept the idea that China couldn’t use Nvidia’s most advanced technologies for military use.”

He added that he would expect Nvidia’s technology to be at the core of any country’s AI training, including for use in the development of advanced weaponry. 

A U.S. official told Reuters last month that China’s large language model startup DeepSeek — which says it used Nvidia chips to train its models — was supporting China’s military and intelligence operations. 

On Sunday, Huang acknowledged there were concerns about DeepSeek’s open-source R1 reasoning model being trained in China but said that there was no evidence that it presents dangers for that reason alone.

Huang complimented the R1 reasoning model, calling it “revolutionary,” and said its open-source nature has empowered startup companies, new industries, and countries to be able to engage in AI. 

“The fact of the matter is, [China and the U.S.] are competitors, but we are highly interdependent, and to the extent that we can compete and both aspire to win, it is fine to respect our competitors,” he concluded. 
