Just last week, OpenAI CEO Sam Altman declared that video is key to achieving AGI with the launch of Sora. Now, Google strikes back with Veo 2 and Imagen 3, its latest generative AI models. Though the models are not publicly available yet, product lead Logan Kilpatrick revealed that they will be accessible via API by early next year.
The model handles complex elements like reflections and shadows, producing clearer, sharper footage. It also includes SynthID watermarking for safety.
Google’s internal testing indicates that Veo outperforms competitors (such as China’s Kling, Meta’s Movie Gen, and OpenAI’s Sora) in terms of both quality and prompt adherence.
Justine Moore, a partner at a16z, was among the early testers and noted that the model excelled at creating nature and animal-related clips, as well as capturing detailed movement.
The model builds on its predecessor, which was first showcased at Google I/O in May and has since been integrated into YouTube and Google Cloud.
Google’s Veo 2 is presented as being more advanced in terms of cinematic understanding. “Veo 2 delivers lifelike visuals with enhanced realism, reducing artifacts and improving detail. Its motion simulation accurately replicates simple and complex movements using physics,” said Google DeepMind’s Tom Hume on X.
“It’s not perfect, but it’s a significant improvement over current state-of-the-art models, as our benchmarks demonstrate,” said Shlomi Fruchter, Veo co-lead at Google DeepMind, hinting that the model still struggles with complex physics.
Comparing Veo 2 with other models, Wharton’s Ethan Mollick said, “Sora offers a lot more control options and longer clips, so it is hard to compare, but I will say that I think the dominance of the Chinese models is over.” Interestingly, according to the blog, Google claims Kling is its biggest competitor.
Does Veo Pass the Physics Test?
Google’s access to YouTube gives it a clear advantage over OpenAI for training these models to maintain the laws of physics.
Veo 2’s true test lies in generating a gymnast’s routine, showcasing its improved grasp of human movement while accurately modelling complex motions. In a viral tweet shared by VC Deedy Das, Sora failed on this prompt.
Veo 2 supports 4K resolution and can produce videos longer than two minutes, although it’s currently restricted to 720p and eight seconds on its experimental platform. Notably, it outperforms Sora with four times the resolution and six times the video duration.
This release follows another significant development by DeepMind in the GenAI space: the launch of Genie 2, a foundation world model capable of generating interactive 3D environments from simple text prompts.
World models like Genie 2 provide a vast and diverse set of environments that are critical for the training of embodied AI agents. These environments act as test beds for agents, enabling them to generalise across various domains and prepare them for more complex, real-world tasks. This research accelerates the pace of DeepMind’s AGI vision.
This solidifies 2025 as the year of advanced world models, with Google at the helm.
Road to AGI
Google’s 2014 acquisition of DeepMind for $400-650 million is touted as one of the smartest business decisions in history. To this, Tesla chief Elon Musk quips: “You have that backwards. DeepMind acquired Google,” highlighting how AI is essential to Google’s relevance today, especially in the race to AGI.
In a previous interview with AIM, AI sceptic Gary Marcus said that DeepMind is likely on a better path towards AGI compared to its competitors.
Google’s announcements this month have played a key role in challenging OpenAI during its 12-day ‘shipmas.’ However, as more companies introduce capabilities, often at lower price points, OpenAI’s $200 pricing might face increasing scrutiny.
There is no stopping Google: the pace at which the tech giant is shipping is reminiscent of its early days as a startup. Releases include Gemini 2, Willow, and GenCast, along with updates to NotebookLM, to name a few.