Silhouettes of laptop and mobile device users are seen next to a screen projection of the YouTube logo.
Dado Ruvic | Reuters
Google is using its expansive library of YouTube videos to train its artificial intelligence models, including Gemini and the Veo 3 video and audio generator, CNBC has learned.
The tech company is turning to its catalog of 20 billion YouTube videos to train these new-age AI tools, according to a person who was not authorized to speak publicly about the matter. Google confirmed to CNBC that it relies on its vault of YouTube videos to train its AI models, but the company said it only uses a subset of its videos for the training and that it honors specific agreements with creators and media companies.
“We’ve always used YouTube content to make our products better, and this hasn’t changed with the advent of AI,” said a YouTube spokesperson in a statement. “We also recognize the need for guardrails, which is why we’ve invested in robust protections that allow creators to protect their image and likeness in the AI era — something we’re committed to continuing.”
Such use of YouTube videos has the potential to lead to an intellectual property crisis for creators and media companies, experts said.
While YouTube says it has shared this information previously, experts who spoke with CNBC said it’s not widely understood by creators and media organizations that Google is training its AI models using its video library.
YouTube didn’t say how many of the 20 billion videos on its platform or which ones are used for AI training. But given the platform’s scale, training on just 1% of the catalog would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models.
The company shared in a blog post published in September that YouTube content could be used to “improve the product experience … including through machine learning and AI applications.” Users who have uploaded content to the service have no way of opting out of letting Google train on their videos.
“It’s plausible that they’re taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos,” said Luke Arrigoni, CEO of Loti, a company that works to protect digital identity for creators. “It’s helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators. That’s not necessarily fair to them.”
CNBC spoke with multiple leading creators and IP professionals, none were aware or had been informed by YouTube that their content could be used to train Google’s AI models.
Google DeepMind Veo 3.
Courtesy: Google DeepMind
The revelation that YouTube is training on its users’ videos is noteworthy after Google in May announced Veo 3, one of the most advanced AI video generators on the market. In its unveiling, Google showcased cinematic-level video sequences, including a scene of an old man on a boat and another showing Pixar-like animals talking with one another. The entirety of the scenes, both the visual and the audio, were entirely AI generated.
According to YouTube, an average of 20 million videos are uploaded to the platform each day by independent creators by nearly every major media company. Many creators say they are now concerned they may be unknowingly helping to train a system that could eventually compete with or replace them.
“It doesn’t hurt their competitive advantage at all to tell people what kind of videos they train on and how many they trained on,” Arrigoni said. “The only thing that it would really impact would be their relationship to creators.”
Even if Veo 3’s final output does not directly replicate existing work, the generated content fuels commercial tools that could compete with the creators who made the training data possible, all without credit, consent or compensation, experts said.
When uploading a video to the platform, the user is agreeing that YouTube has a broad license to the content.
“By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content,” the terms of service read.
“We’ve seen a growing number of creators discover fake versions of themselves circulating across platforms — new tools like Veo 3 are only going to accelerate the trend,” said Dan Neely, CEO of Vermillio, which helps individuals protect their likeness from being misused and also facilitates secure licensing of authorized content.
Neely’s company has challenged AI platforms for generating content that allegedly infringes on its clients’ intellectual property, both individual and corporate. Neely says that although YouTube has the right to use this content, many of the content creators who post on the platform are unaware that their videos are being used to train video-generating AI software.
Vermillio uses a proprietary tool called Trace ID to asses whether an AI-generated video has significant overlap with a human-created video. Trace ID assigns scores on a scale of zero to 100. Any score over 10 for a video with audio is considered meaningful, Neely said.
A video from YouTube creator Brodie Moss closely matched content generated by Veo 3. Using Vermillio’s Trace ID tool, the system attributed a score of 71 to the original video with the audio alone scoring over 90.
Some creators told CNBC they welcome the opportunity to use Veo 3, even if it may have been trained on their content.
“I try to treat it as friendly competition more so than these are adversaries,” said Sam Beres, a creator with 10 million subscribers on YouTube. “I’m trying to do things positively because it is the inevitable —but it’s kind of an exciting inevitable.”
Google includes an indemnification clause for its generative AI products, including Veo, which means that if a user faces a copyright challenge over AI-generated content, Google will take on legal responsibility and cover the associated costs.
YouTube announced a partnership with Creative Artists Agency in December to develop access for top talent to identify and manage AI-generated content that features their likeness. YouTube also has a tool for creators to request a video to be taken down if they believe it abuses their likeness.
However, Arrigoni said that the tool hasn’t been reliable for his clients.
YouTube also allows creators to opt out of third party training from select AI companies including Amazon, Apple and Nvidia, but users are not able to stop Google from training for its own models.
The Walt Disney Company and Universal filed a joint lawsuit last Wednesday against the AI image generator Midjourney, alleging copyright infringement, the first lawsuit of its kind out of Hollywood.
“The people who are losing are the artists and the creators and the teenagers whose lives are upended,” said Sen. Josh Hawley, R-Mo., in May at a Senate hearing about the use of AI to replicate the likeness of humans. “We’ve got to give individuals powerful enforceable rights and their images in their property in their lives back again or this is just never going to stop.”
Disclosure: Universal is part of NBCUniversal, the parent company of CNBC.
Spotify, Reddit and X have all implemented age assurance systems to prevent children from being exposed to inappropriate content.
STR | Nurphoto via Getty Images
The global online safety movement has paved the way for a number of artificial intelligence-powered products designed to keep kids away from potentially harmful things on the internet.
In the U.K., a new piece of legislation called the Online Safety Act imposes a duty of care on tech companies to protect children from age-inappropriate material, hate speech, bullying, fraud, and child sexual abuse material (CSAM). Companies can face fines as high as 10% of their global annual revenue for breaches.
Further afield, landmark regulations aimed at keeping kids safer online are swiftly making their way through the U.S. Congress. One bill, known as the Kids Online Safety Act, would make social media platforms liable for preventing their products from harming children — similar to the Online Safety Act in the U.K.
This push from regulators is increasingly causing something of a rethink at several major tech players. Pornhub and other online pornography giants are blocking all users from accessing their sites unless they go through an age verification system.
Porn sites haven’t been alone in taking action to verify users ages, though. Spotify, Reddit and X have all implemented age assurance systems to prevent children from being exposed to sexually explicit or inappropriate materials.
Such regulatory measures have been met with criticisms from the tech industry — not least due to concerns that they may infringe internet users’ privacy.
Digital ID tech flourishing
At the heart of all these age verification measures is one company: Yoti.
Yoti produces technology that captures selfies and uses artificial intelligence to verify someone’s age based on their facial features. The firm says its AI algorithm, which has been trained on millions of faces, can estimate the age of 13 to 24-year-olds within two years of accuracy.
The firm has previously partnered with the U.K.’s Post Office and is hoping to capitalize on the broader push for government-issued digital ID cards in the U.K. Yoti is not alone in the identity verification software space — other players include Entrust, Persona and iProov. However, the company has been the most prominent provider of age assurance services under the new U.K. regime.
“There is a race on for child safety technology and service providers to earn trust and confidence,” Pete Kenyon, a partner at law firm Cripps, told CNBC. “The new requirements have undoubtedly created a new marketplace and providers are scrambling to make their mark.”
Yet the rise of digital identification methods has also led to concerns over privacy infringements and possible data breaches.
“Substantial privacy issues arise with this technology being used,” said Kenyon. “Trust is key and will only be earned by the use of stringent and effective technical and governance procedures adopted in order to keep personal data safe.”
Read more CNBC tech news
Rani Govender, policy manager for child safety online at British child protection charity NSPCC, said that the technology “already exists” to authenticate users without compromising their privacy.
“Tech companies must make deliberate, ethical choices by choosing solutions that protect children from harm without compromising the privacy of users,” she told CNBC. “The best technology doesn’t just tick boxes; it builds trust.”
Child-safe smartphones
The wave of new tech emerging to prevent children from being exposed to online harms isn’t just limited to software.
Earlier this month, Finnish phone maker HMD Global launched a new smartphone called the Fusion X1, which uses AI to stop kids from filming or sharing nude content or viewing sexually explicit images from the camera, screen and across all apps.
The phone uses technology developed by SafeToNet, a British cybersecurity firm focused on child safety.
Finnish phone maker HMD Global’s new smartphone uses AI to prevent children from being exposed nude or sexually explicit images.
HMD Global
“We believe more needs to be done in this space,” James Robinson, vice president of family vertical at HMD, told CNBC. He stressed that HMD came up with the concept for children’s devices prior to the Online Safety Act entering into force, but noted it was “great to see the government taking greater steps.”
The release of HMD’s child-friendly phone follows heightened momentum in the “smartphone-free” movement, which encourages parents to avoid letting their children own a smartphone.
Going forward, the NSPCC’s Govender says that child safety will become a significant priority for digital behemoths such as Google and Meta.
The tech giants have for years been accused of worsening mental health in children and teens due to the rise of online bullying and social media addiction. They in return argue they’ve taken steps to address these issues through increased parental controls and privacy features.
“For years, tech giants have stood by while harmful and illegal content spread across their platforms, leaving young people exposed and vulnerable,” she told CNBC. “That era of neglect must end.”
A banner for Snowflake Inc. is displayed at the New York Stock Exchange to celebrate the company’s initial public offering on Sept. 16, 2020.
Brendan McDermid | Reuters
MongoDB’s stock just closed out its best week on record, leading a rally in enterprise technology companies that are seeing tailwinds from the artificial intelligence boom.
In addition to MongoDB’s 44% rally, Pure Storage soared 33%, its second-sharpest gain ever, while Snowflake jumped 21%. Autodesk rose 8.4%.
Since generative AI started taking off in late 2022 following the launch of OpenAI’s ChatGPT, the big winners have been Nvidia, for its graphics processing units, as well as the cloud vendors like Microsoft, Google and Oracle, and companies packaging and selling GPUs, such as Dell and Super Micro Computer.
For many cloud software vendors and other enterprise tech companies, Wall Street has been waiting to see if AI will be a boon to their business, or if it might displace it.
Quarterly results this week and commentary from company executives may have eased some of those concerns, showing that the financial benefits of AI are making their way downstream.
MongoDB CEO Dev Ittycheria told CNBC’s “Squawk Box” on Wednesday that enterprise rollouts of AI services are happening, but slowly.
“You start to see deployments of agents to automate back office, maybe automate sales and marketing, but it’s still not yet kind of full force in the enterprise,” Ittycheria said. “People want to see some wins before they deploy more investment.”
Revenue at MongoDB, which sells cloud database services, rose 24% from a year earlier to $591 million, sailing past the $556 million average analyst estimate, according to LSEG. Earnings also exceeded expectations, as did the company’s full-year forecast for profit and revenue.
MongoDB said in its earnings report that it’s added more than 5,000 customers year-to-date, “the highest ever in the first half of the year.”
“We think that’s a good sign of future growth because a lot of these companies are AI native companies who are coming to MongoDB to run their business,” Ittycheria said.
Pure Storage enjoyed a record pop on Thursday, when the stock jumped 32% to an all-time high.
The data storage management vendor reported quarterly results that topped estimates and lifted its guidance for the year. But what’s exciting investors the most is early returns from Pure’s recent contract with Meta. Pure will help the social media company manage its massive storage needs efficiently with the demands of AI.
Pure said it started recognizing revenue from its Meta deployments in the second quarter, and finance chief Tarek Robbiati said on the earnings call that the company is seeing “increased interest from other hyperscalers” looking to replace their traditional storage with Pure’s technology.
‘Banger of a report’
Reports from MongoDB and Pure landed the same week that Nvidia announced quarterly earnings, and said revenue soared 56% from a year earlier, marking a ninth-straight quarter of growth in excess of 50%.
Nvidia has emerged as the world’s most-valuable company by selling advanced AI processors to all of the infrastructure providers and model developers.
While growth at Nvidia has slowed from its triple-digit rate in 2023 and 2024, it’s still expanding at a much faster pace than its megacap peers, indicating that there’s no end in sight when it comes to the expansive AI buildouts.
“It was a banger of a report,” said Brad Gerstner CEO of Altimeter Capital, in an interview with CNBC’s “Halftime Report” on Thursday. “This company is accelerating at scale.”
Read more CNBC tech news
Data analytics vendor Snowflake talked up its Snowflake AI data cloud in its quarterly earnings report on Wednesday.
Snowflake shares popped 20% following better-than-expected earnings and revenue. The company also boosted its guidance for the year for product revenue, and said it has more than 6,100 customers using Snowflake AI, up from 5,200 during the prior quarter.
“Our progress with AI has been remarkable,” Snowflake CEO Sridhar Ramaswamy said on the earnings call. “Today, AI is a core reason why customers are choosing Snowflake, influencing nearly 50% of new logos won in Q2.”
Autodesk, founded in 1982, has been around much longer than MongoDB, Pure Storage or Snowflake. The company is known for its AutoCAD software used in architecture and construction.
The company has underperformed the broader tech sector of late, and last year activist investor Starboard Value jumped into the stock to push for improvements in operations and financial performance, including cost cuts. In February, Autodesk slashed 9% of its workforce, and two months later the company settled with Starboard, adding two newcomers to its board.
The stock is still trailing the Nasdaq for the year, but climbed 9.1% on Friday after Autodesk reported results that exceeded Wall Street estimates and increased its full-year revenue guidance.
Last year, Autodesk introduced Project Bernini to develop new AI models and create what it calls “AI‑driven CAD engines.”
On Thursday’s earnings call, CEO Andrew Anagnost was asked what he’s most excited about across his company’s product portfolio when it comes to AI.
Anagnost touted the ability of Autodesk to help customers simplify workflow across products and promoted the Autodesk Assistant as a way to enhance productivity through simple prompts.
He also addressed the elephant in the room: The existential threat that AI presents.
“AI may eat software,” he said, “but it’s not gonna eat Autodesk.”
Meta Platforms CEO Mark Zuckerberg departs after attending a Federal Trade Commission trial that could force the company to unwind its acquisitions of messaging platform WhatsApp and image-sharing app Instagram, at U.S. District Court in Washington, D.C., U.S., April 15, 2025.
Nathan Howard | Reuters
Meta on Friday said it is making temporary changes to its artificial intelligence chatbot policies related to teenagers as lawmakers voice concerns about safety and inappropriate conversations.
The social media giant is now training its AI chatbots so that they do not generate responses to teenagers about subjects like self-harm, suicide, disordered eating and avoid potentially inappropriate romantic conversations, a Meta spokesperson confirmed.
The company said AI chatbots will instead point teenagers to expert resources when appropriate.
“As our community grows and technology evolves, we’re continually learning about how young people may interact with these tools and strengthening our protections accordingly,” the company said in a statement.
Additionally, teenage users of Meta apps like Facebook and Instagram will only be able to access certain AI chatbots intended for educational and skill-development purposes.
The company said it’s unclear how long these temporary modifications will last, but they will begin rolling out over the next few weeks across the company’s apps in English-speaking countries. The “interim changes” are part of the company’s longer-term measures over teen safety.
Last week, Sen. Josh Hawley, R-Mo., said that he was launching an investigation into Meta following a Reuters report about the company permitting its AI chatbots to engage in “romantic” and “sensual” conversations with teens and children.
Read more CNBC tech news
The Reuters report described an internal Meta document that detailed permissible AI chatbot behaviors that staff and contract workers should take into account when developing and training the software.
In one example, the document cited by Reuters said that a chatbot would be allowed to have a romantic conversation with an eight-year-old and could tell the minor that “every inch of you is a masterpiece – a treasure I cherish deeply.”
A Meta spokesperson told Reuters at the time that “The examples and notes in question were and are erroneous and inconsistent with our policies, and have been removed.”
Most recently, the nonprofit advocacy group Common Sense Media released a risk assessment of Meta AI on Thursday and said that it should not be used by anyone under the age of 18, because the “system actively participates in planning dangerous activities, while dismissing legitimate requests for support,” the nonprofit said in a statement.
“This is not a system that needs improvement. It’s a system that needs to be completely rebuilt with safety as the number-one priority, not an afterthought,” said Common Sense Media CEO James Steyer in a statement. “No teen should use Meta AI until its fundamental safety failures are addressed.”
A separate Reuters report published on Friday found “dozens” of flirty AI chatbots based on celebrities like Taylor Swift, Scarlett Johansson, Anne Hathaway and Selena Gomez on Facebook, Instagram and WhatsApp.
The report said that when prompted, the AI chatbots would generate “photorealistic images of their namesakes posing in bathtubs or dressed in lingerie with their legs spread.”
A Meta spokesperson told CNBC in a statement that “the AI-generated imagery of public figures in compromising poses violates our rules.”
“Like others, we permit the generation of images containing public figures, but our policies are intended to prohibit nude, intimate or sexually suggestive imagery,” the Meta spokesperson said. “Meta’s AI Studio rules prohibit the direct impersonation of public figures.”