OpenAI, the artificial intelligence (AI) company, has introduced its first text-to-video model to a largely positive reception, though the company acknowledges the model still has room for improvement.
On February 15, OpenAI revealed Sora, its new generative AI model. Sora has the ability to generate detailed videos based on simple text prompts, continue existing videos, and even create scenes from still images.
According to a blog post on February 15, OpenAI claimed that Sora can generate high-resolution, movie-like scenes up to 1080p. These scenes can include multiple characters, specific types of motion, and accurate details of the subject and background.
Sora operates on a diffusion model, similar to OpenAI’s image-based predecessor, Dall-E 3. In this model, the AI generates a video or an image by starting with something that resembles “static noise” and gradually transforms it by removing the noise over several steps.
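The denoising process described above can be illustrated with a toy sketch. This is not OpenAI's implementation: a real diffusion model uses a learned neural network to predict the noise at each step, whereas the stand-in "predictor" below simply nudges the sample toward a known target so the loop is self-contained and runnable.

```python
import numpy as np

def denoise_step(x, step, total_steps, target):
    # Stand-in for the learned noise predictor: in a real diffusion model,
    # a neural network estimates the noise present in x. Here we assume
    # the "noise" is just x's deviation from the target.
    predicted_noise = x - target
    # Remove a fraction of the estimated noise at each step.
    return x - predicted_noise / (total_steps - step)

def generate(target, total_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from pure "static noise", as the article describes.
    x = rng.standard_normal(target.shape)
    for step in range(total_steps):
        x = denoise_step(x, step, total_steps, target)
    return x
```

Run over fifty steps, the sample converges from random noise to the target, mirroring how a diffusion model gradually transforms static into a coherent image or video frame.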
OpenAI stated that Sora builds on previous research from ChatGPT and Dall-E 3 models, which helps the model better represent user inputs.
The company admitted that Sora still has some weaknesses. It may struggle to accurately simulate the physics of a complex scene, often confusing cause and effect. Additionally, the model may mix up left and right or fail to follow precise descriptions of directions, resulting in confusion over spatial details.
Sora is currently available only to “red teamers” — experts tasked with adversarially probing the model for potential risks and harms. OpenAI is also seeking feedback from select designers, visual artists, and filmmakers to further improve the model.
In December 2023, a report from Stanford University raised concerns about AI-powered image-generation tools being trained on illegal child abuse material, highlighting the ethical and legal challenges associated with text-to-image or video models.
Numerous video demonstrations showcasing Sora’s capabilities have been circulating online, and the model is currently trending with over 173,000 posts.
OpenAI CEO Sam Altman invited users on X (formerly Twitter) to submit custom video-generation requests, sharing seven Sora-generated videos in response. These ranged from a duck riding a dragon to golden retrievers recording a podcast on a mountaintop.
Many AI commentators, including Mckay Wrigley, were amazed by the videos generated by Sora.
Nvidia senior researcher Jim Fan stated on X that Sora is not just another “creative toy” like Dall-E 3. He described Sora as a “data-driven physics engine” that not only generates abstract videos but also determines the physics of objects in the scene itself.