A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage as part of a video generated by OpenAI’s Sora AI model.
OpenAI
OpenAI, which burst into the mainstream last year thanks to the popularity of ChatGPT, is bringing its artificial intelligence technology to video.
The company on Thursday introduced Sora, its new generative AI model. Sora works similarly to OpenAI’s image-generation AI tool, DALL-E. A user types out a desired scene and Sora will return a high-definition video clip. Sora can also generate video clips inspired by still images, and extend existing videos or fill in missing frames.
Video could be the next frontier for generative AI now that chatbots and image generators have made their way into the consumer and business worlds. While the creative opportunities will excite AI enthusiasts, the new technologies present serious misinformation concerns as major political elections approach across the globe. The number of AI-generated deepfakes has increased 900% year over year, according to data from Clarity, a machine learning firm.
With Sora, OpenAI is looking to compete with video-generation AI tools from companies such as Meta and Google, the latter of which announced Lumiere last month. Similar AI tools are available from startups such as Stability AI, which has a product called Stable Video Diffusion. Amazon has also released Create with Alexa, a model specialized in generating prompt-based short-form animated children’s content.
Sora is currently limited to generating videos that are a minute long or less. OpenAI, backed by Microsoft, has made multimodality — the combining of text, image and video generation — a goal in its effort to offer a broader suite of AI models.
“The world is multimodal,” OpenAI COO Brad Lightcap told CNBC in November. “If you think about the way we as humans process the world and engage with the world, we see things, we hear things, we say things – the world is much bigger than text. So to us, it always felt incomplete for text and code to be the single modalities, the single interfaces that we could have to how powerful these models are and what they can do.”
Sora has thus far only been available to a small group of safety testers, or “red teamers,” who test the model for vulnerabilities in areas like misinformation and bias. The company hasn’t released any public demonstrations beyond 10 sample clips available on its website, and said its accompanying technical paper will be released later on Thursday.
OpenAI also said it’s building a “detection classifier” that can identify Sora-generated video clips, and that it plans to include certain metadata in its output that should help with identifying AI-generated content. It’s the same type of metadata that Meta is looking to use to identify AI-generated images this election year.
Sora is a diffusion AI model that, like ChatGPT, uses the Transformer architecture, introduced by Google researchers in a 2017 paper.
“Sora serves as a foundation for models that can understand and simulate the real world,” OpenAI wrote in its announcement.