Inside a sprawling lab at Google headquarters in Mountain View, California, hundreds of server racks hum across several aisles, performing tasks far less ubiquitous than running the world’s dominant search engine or executing workloads for Google Cloud’s millions of customers.
Instead, they’re running tests on Google’s own microchips, called Tensor Processing Units, or TPUs.
Originally trained for internal workloads, Google’s TPUs have been available to cloud customers since 2018. In July, Apple revealed it uses TPUs to train AI models underpinning Apple Intelligence. Google also relies on TPUs to train and run its Gemini chatbot.
“The world sort of has this fundamental belief that all AI, large language models, are being trained on Nvidia, and of course Nvidia has the lion’s share of training volume. But Google took its own path here,” said Futurum Group CEO Daniel Newman. He’s been covering Google’s custom cloud chips since they launched in 2015.
Google was the first cloud provider to make custom AI chips. Three years later, Amazon Web Services announced its first cloud AI chip, Inferentia. Microsoft‘s first custom AI chip, Maia, wasn’t announced until the end of 2023.
But being first in AI chips hasn’t translated to a top spot in the overall rat race of generative AI. Google’s faced criticism for botched product releases, and Gemini came out more than a year after OpenAI’s ChatGPT.
Google Cloud, however, has gained momentum due in part to AI offerings. Google parent company Alphabet reported cloud revenue rose 29% in the most recent quarter, surpassing $10 billion in quarterly revenues for the first time.
“The AI cloud era has completely reordered the way companies are seen, and this silicon differentiation, the TPU itself, may be one of the biggest reasons that Google went from the third cloud to being seen truly on parity, and in some eyes, maybe even ahead of the other two clouds for its AI prowess,” Newman said.
‘A simple but powerful thought experiment’
In July, CNBC got the first on-camera tour of Google’s chip lab and sat down with the head of custom cloud chips, Amin Vahdat. He was already at Google when it first toyed with the idea of making chips in 2014.
Amin Vahdat, VP of Machine Learning, Systems and Cloud AI at Google, holds up TPU Version 4 at Google headquarters in Mountain View, California, on July 23, 2024.
Marc Ganley
“It all started with a simple but powerful thought experiment,” Vahdat said. “A number of leads at the company asked the question: What would happen if Google users wanted to interact with Google via voice for just 30 seconds a day? And how much compute power would we need to support our users?”
“We realized that we could build custom hardware, not general purpose hardware, but custom hardware — Tensor Processing Units in this case — to support that much, much more efficiently. In fact, a factor of 100 more efficiently than it would have been otherwise,” Vahdat said.
Google data centers still rely on general-purpose central processing units, or CPUs, and Nvidia’s graphics processing units, or GPUs. Google’s TPUs are a different type of chip called an application-specific integrated circuit, or ASIC, which are custom-built for specific purposes. The TPU is focused on AI. Google makes another ASIC focused on video called a Video Coding Unit.
The TPU, however, is what set Google apart. It was the first of its kind when it launched in 2015. Google TPUs still dominate among custom cloud AI accelerators, with 58% of the market share, according to The Futurum Group.
Google coined the term based on the algebraic term “tensor,” referring to the large-scale matrix multiplications that happen rapidly for advanced AI applications.
With the second TPU release in 2018, Google expanded the focus from inference to training and made them available for its cloud customers to run workloads, alongside market-leading chips such as Nvidia’s GPUs.
“If you’re using GPUs, they’re more programmable, they’re more flexible. But they’ve been in tight supply,” said Stacy Rasgon, senior analyst covering semiconductors at Bernstein Research.
The AI boom has sent Nvidia’s stock through the roof, catapulting the chipmaker to a $3 trillion market cap in June, surpassing Alphabet and jockeying with Apple and Microsoft for position as the world’s most valuable public company.
“Being candid, these specialty AI accelerators aren’t nearly as flexible or as powerful as Nvidia’s platform, and that is what the market is also waiting to see: Can anyone play in that space?” Newman said.
Now that we know Apple’s using Google’s TPUs to train its AI models, the real test will come as those full AI features roll out on iPhones and Macs next year.
Broadcom and TSMC
It’s no small feat to develop alternatives to Nvidia’s AI engines. Google’s sixth generation TPU, called Trillium, is set to come out later this year.
Google showed CNBC the sixth version of its TPU, Trillium, in Mountain View, California, on July 23, 2024. Trillium is set to come out later in 2024.
Marc Ganley
“It’s expensive. You need a lot of scale,” Rasgon said. “And so it’s not something that everybody can do. But these hyperscalers, they’ve got the scale and the money and the resources to go down that path.”
The process is so complex and costly that even the hyperscalers can’t do it alone. Since the first TPU, Google’s partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it’s spent more than $3 billion to make these partnerships happen.
“AI chips — they’re very complex. There’s lots of things on there. So Google brings the compute,” Rasgon said. “Broadcom does all the peripheral stuff. They do the I/O and the SerDes, all of the different pieces that go around that compute. They also do the packaging.”
Then the final design is sent off for manufacturing at a fabrication plant, or fab — primarily those owned by the world’s largest chipmaker, Taiwan Semiconductor Manufacturing Company, which makes 92% of the world’s most advanced semiconductors.
When asked if Google has any safeguards in place should the worst happen in the geopolitical sphere between China and Taiwan, Vahdat said, “It’s certainly something that we prepare for and we think about as well, but we’re hopeful that actually it’s not something that we’re going to have to trigger.”
Protecting against those risks is the primary reason the White House is handing out $52 billion in CHIPS Act funding to companies building fabs in the U.S. — with the biggest portions going to Intel, TSMC, and Samsung to date.
Processors and power
Google showed CNBC its new Axion CPU,
Marc Ganley
“Now we’re able to bring in that last piece of the puzzle, the CPU,” Vahdat said. “And so a lot of our internal services, whether it’s BigQuery, whether it’s Spanner, YouTube advertising and more are running on Axion.”
Google is late to the CPU game. Amazon launched its Graviton processor in 2018. Alibaba launched its server chip in 2021. Microsoft announced its CPU in November.
When asked why Google didn’t make a CPU sooner, Vahdat said, “Our focus has been on where we can deliver the most value for our customers, and there it has been starting with the TPU, our video coding units, our networking. We really thought that the time was now.”
All these processors from non-chipmakers, including Google’s, are made possible by Arm chip architecture — a more customizable, power-efficient alternative that’s gaining traction over the traditional x86 model from Intel and AMD. Power efficiency is crucial because, by 2027, AI servers are projected to use up as much power every year as a country like Argentina. Google’s latest environmental report showed emissions rose nearly 50% from 2019 to 2023 partly due to data center growth for powering AI.
“Without having the efficiency of these chips, the numbers could have wound up in a very different place,” Vahdat said. “We remain committed to actually driving these numbers in terms of carbon emissions from our infrastructure, 24/7, driving it toward zero.”
It takes a massive amount of water to cool the servers that train and run AI. That’s why Google’s third-generation TPU started using direct-to-chip cooling, which uses far less water. That’s also how Nvidia’s cooling its latest Blackwell GPUs.
Despite challenges, from geopolitics to power and water, Google is committed to its generative AI tools and making its own chips.
“I’ve never seen anything like this and no sign of it slowing down quite yet,” Vahdat said. “And hardware is going to play a really important part there.”
OpenAI has been awarded a $200 million contract to provide the U.S. Defense Department with artificial intelligence tools.
The department announced the one-year contract on Monday, months after OpenAI said it would collaborate with defense technology startup Anduril to deploy advanced AI systems for “national security missions.”
“Under this award, the performer will develop prototype frontier AI capabilities to address critical national security challenges in both warfighting and enterprise domains,” the Defense Department said. It’s the first contract with OpenAI listed on the Department of Defense’s website.
Anduril received a $100 million defense contract in December. Weeks earlier, OpenAI rival Anthropic said it would work with Palantir and Amazon to supply its AI models to U.S. defense and intelligence agencies.
Sam Altman, OpenAI’s co-founder and CEO, said in a discussion with OpenAI board member and former National Security Agency leader Paul Nakasone at a Vanderbilt University event in April that “we have to and are proud to and really want to engage in national security areas.”
OpenAI did not immediately respond to a request for comment.
The Defense Department specified that the contract is with OpenAI Public Sector LLC, and that the work will mostly occur in the National Capital Region, which encompasses Washington, D.C., and several nearby counties in Maryland and Virginia.
Meanwhile, OpenAI is working to build additional computing power in the U.S. In January, Altman appeared alongside President Donald Trump at the White House to announce the $500 billion Stargate project to build AI infrastructure in the U.S.
The new contract will represent a small portion of revenue at OpenAI, which is generating over $10 billion in annualized sales. In March, the company announced a $40 billion financing round at a $300 billion valuation.
In April, Microsoft, which supplies cloud infrastructure to OpenAI, said the U.S. Defense Information Systems Agency has authorized the use of the Azure OpenAI service with secret classified information.
A United Launch Alliance Atlas V rocket is shown on its launch pad carrying Amazon’s Project Kuiper internet network satellites as the vehicle is prepared for launch at the Cape Canaveral Space Force Station in Cape Canaveral, Florida, U.S., April 28, 2025.
Steve Nesius | Reuters
United Launch Alliance on Monday was forced to delay the second flight carrying a batch of Amazon‘s Project Kuiper internet satellites because of a problem with the rocket booster.
With roughly 30 minutes left in the countdown, ULA announced it was scrubbing the launch due to an issue with “an elevated purge temperature” within its Atlas V rocket’s booster engine. The company said it will provide a new launch date at a later point.
“Possible issue with a GN2 purge line that cannot be resolved inside the count,” ULA CEO Tory Bruno said in a post on Bluesky. “We will need to stand down for today. We’ll sort it and be back.”
The launch from Florida’s Space Coast had been set for last Friday, but was rescheduled to Monday at 1:25 p.m. ET due to inclement weather.
Read more CNBC tech news
Amazon in April successfully sent up 27 Kuiper internet satellites into low Earth orbit, a region of space that’s within 1,200 miles of the Earth’s surface. The second voyage will send “another 27 satellites into orbit, bringing our total constellation size to 54 satellites,” Amazon said in a blog post.
Kuiper is the latest entrant in the burgeoning satellite internet industry, which aims to beam high-speed internet to the ground from orbit. The industry is currently dominated by Elon Musk’s Space X, which operates Starlink. Other competitors include SoftBank-backed OneWeb and Viasat.
Amazon is targeting a constellation of more than 3,000 satellites. The company has to meet a Federal Communications Commission deadline to launch half of its total constellation, or 1,618 satellites, by July 2026.
Thomas Kurian, CEO of Google Cloud, speaks at a cloud computing conference held by the company in 2019.
Michael Short | Bloomberg | Getty Images
Google apologized for a major outage that the company said was caused by multiple layers of flawed recent updates.
The company released an incident report late on Friday that explained hours of downtime on Thursday. More than 70 Google cloud services stopped working properly across the globe, knocking down or disrupting dozens of third-party services, including Cloudflare, OpenAI and Shopify. Gmail, Google Calendar, Google Drive, Google Meet and other first-party products also malfunctioned.
“We deeply apologize for the impact this outage has had,” Google wrote in the incident report. “Google Cloud customers and their users trust their businesses to Google, and we will do better. We apologize for the impact this has had not only on our customers’ businesses and their users but also on the trust of our systems. We are committed to making improvements to help avoid outages like this moving forward.”
Thomas Kurian, CEO of Google’s cloud unit, also posted about the outage in an X post on Thursday, saying “we regret the disruption this caused our customers.”
Google in May added a new feature to its “quota policy checks” for evaluating automated incoming requests, but the new feature wasn’t immediately tested in real-world situations, the company wrote in the incident report. As a result, the company’s systems didn’t know how to properly handle data from the new feature, which included blank entries. Those blank entries were then sent out to all Google Cloud data center regions, which prompted the crashes, the company wrote.
Engineers figured out the issue in 10 minutes, according to the company. However, the entire incident went on for seven hours after that, with the crash leading to an overload in some larger regions.
As it released the feature, Google did not use feature flags, an increasingly common industry practice that allows for slow implementation to minimize impact if problems occur. Feature flags would have caught the issue before the feature became widely available, Google said.
Going forward, Google will change its architecture so if one system fails, it can still operate without crashing, the company said. Google said it will also audit all systems and improve its communications “both automated and human, so our customers get the information they need asap to react to issues.”