Last year, it seemed like every week a new frontier video model would one-up its peers and raise our expectations. We came to expect relentless improvement on measurable benchmarks: longer video output, sustained stories, better physics, character consistency, movement cohesion, and more.
But this year, if you care a lot about the benchmarks, you might be getting a little restless. We now pretty much expect every video model to generate 10-15 seconds of footage with synced audio and speech and an all-around sense of realism, which is a great achievement, but not the runaway model progress to which we’d been conditioned. Instead, we’re entering a more exciting time: the product era of video models, where “abundance” isn’t about benchmark performance – it’s about variety.
For the past few years, frontier diffusion model labs have consistently released models that outperform prior generations on major benchmarks. This progress instilled an assumption among researchers, creators, and (yes) VCs like myself: a “god-tier” model that is great at everything would eventually emerge and become the default provider for a wide range of video use cases. But this year, that assumption was challenged: Sora 2 was released last month and landed below Veo 3 on benchmarks like LMArena. There’s a sense that progress might be slowing at the diffusion model level, and that the concept of “SOTA”, at least when it comes to video models, may not actually exist.
Of course, this assumption (and subsequent confrontation with reality) is not unique to diffusion models. Frontier lab LLMs also followed a pattern of step-function improvements from 2023-2025, and then saw performance begin to level off across a variety of benchmarks. When this happened with LLMs, we saw major foundation labs begin to specialize in specific verticals, and the overall AI product layer take off; in fact, we wrote about this trend earlier this year.
Upon reflection, it does make sense that progress on video models is beginning to plateau, at least on public benchmarks. Models have come an extremely long way in realism and aesthetic quality, and their outputs are now so lifelike that they might be hitting an asymptote. Once you convincingly look like real life, how much more “real” can you get? A good analogy here might be painting in the 17th and 18th centuries: at a certain point, the old masters got so good at rendering realistic portraits, still lifes, and pastoral scenes that preferring one artist over another came down to a simple matter of aesthetic taste, rather than any objective measure of what looked more “real”.
So what happens when realism stops being the differentiator for video models? We get more variety and specialization. Welcome to the abundance era.
Before we explore the increased specialization we’re seeing at the model layer, it’s worth taking a quick stroll (or, more appropriately, scroll) down memory lane. Back in early 2024, I wrote a piece taking stock of the state of video model outputs. At the time, it was a struggle to generate videos longer than 3-4 seconds. Physics was a joke: people might randomly melt into the ground; basketballs might rebound off the backboard, ricochet toward the ground, then miraculously end up in the hoop in the final frame. Back then, I remarked that it would be a while before we saw AI generate a Pixar-level short film. Google hadn’t yet released Veo, or any public model for that matter; they would just tease the timeline with research papers.
How quickly things change. Google now dominates LMArena and other leaderboards with its Veo models. OpenAI is producing a $30 million animated feature with a team of 30 people on a nine-month timeline (admittedly, they’re not one-shotting the feature from a single prompt, but this is still staggering progress!). As stated in the introduction, we can now expect longer video outputs and more realistic physics: the basketball now rebounds off the backboard and hits the ground.
But even as models collectively are getting better, we’re beginning to see them specialize. Why? Because no one model is good at everything. Some teams are optimizing for the price and speed of outputs. Others are going all-in on post-training, with an apparent focus on specific verticals. Here are a couple of examples of what I’m seeing:
Comparing outputs from Sora 2 and Veo 3 is a great illustration of how models are beginning to specialize. Both are exceptional video models, but I’d argue they’re good at very different things. Sora excels at inserting yourself (or friends) into a scene, and crafting a story (or meme, or joke) from a short prompt, like the Stephen Hawking sports clips or Pikachu in famous movies. It’s great for consumers and meme makers.
However, it’s not as good as Veo 3 at syncing video with audio: you’ll often get multiple fast clips where the audio lags behind or the wrong person is talking. And in my opinion, it’s weaker at handling physics or any kind of complex motion. Veo 3, meanwhile, has less of a sense of humor and needs more guidance, but I’d argue it’s a more powerful and controllable model for prosumer and professional creatives.
This trend of model specialization is also positive for players up and down the stack. AI video cloud providers like Fal and Replicate now host dozens of models for users looking to tap these various vertical use cases. Editing suites like Krea give users a central hub where they can interact with multiple models and build entire workflows around them.
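To make the “one hub, many models” idea concrete, here’s a minimal sketch of what swapping between hosted video models can look like through a provider like Replicate’s Python client. The model slugs and their descriptions below are placeholders I’ve made up for illustration, not real endpoints; check a provider’s catalog for actual model names.

```python
# A minimal sketch of the "one hub, many specialized models" workflow, using
# Replicate's Python client (pip install replicate). The model slugs below are
# hypothetical placeholders, not real endpoints; swap in whatever hosted video
# models fit your use case and budget.
import replicate

PROMPT = "a slow dolly shot across a sunlit living room staged with mid-century furniture"

# Hypothetical slugs standing in for the dozens of vertical-specific models a provider hosts.
candidate_models = [
    "example-lab/fast-cheap-video",     # assumed: optimized for price and speed
    "example-lab/cinematic-video-pro",  # assumed: optimized for control and realism
]

for model in candidate_models:
    # Send the same prompt to each hosted model and compare the returned outputs.
    output = replicate.run(model, input={"prompt": PROMPT})
    print(model, "->", output)
```

The point isn’t the specific client; it’s that routing the same prompt to different specialized backends is now a few lines of code, which is exactly the kind of plumbing the editing suites are building interfaces on top of.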
Still, I expect the extremely well-capitalized labs to keep chasing “god mode” when it comes to video, and we hope (and expect) to see continued performance improvements! But there’s also a ton of opportunity in being the best for a certain use case or vertical.
If you follow me on X, you’ve probably picked up on the fact that I love experimenting with multiple video and image models to create extremely bespoke outputs. This involves a ton of cross-pollination between different model providers and editing tools. For example, creating a custom furniture staging video involves leveraging Ideogram, nano-banana, and Veo 3, and adding product swag to an existing video requires nano-banana, Hedra, and editing suites like Krea and Kapwing. Not everyone has access to all of these tools (or the requisite level of masochism) to generate their desired outputs. We need better end-to-end products.
There’s still a ton of “room to run” for products to catch up to model progress. Creators are hacking together complex workflows across multiple products to do things that could feasibly be done by the models – e.g. keeping characters consistent across generations, extending a scene by starting from the end frame of the previous clip, controlling camera movement via start and end frames using an image edit model, or collaging storyboards.
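To give a sense of how hacky this still is, here’s a rough sketch of the end-frame trick: grab the last frame of a generated clip and feed it back in as the start frame of the next generation. The frame extraction is plain OpenCV; the image-to-video endpoint name and its input fields are hypothetical placeholders, since the exact parameters vary by model and provider.

```python
# A rough sketch of the end-frame chaining hack, assuming a hosted image-to-video
# model behind Replicate's Python client. The model slug and its input fields are
# hypothetical; only the OpenCV frame extraction is concrete.
import cv2
import replicate

def last_frame(video_path: str, out_path: str = "last_frame.png") -> str:
    """Save the final frame of a video clip to disk and return the image path."""
    cap = cv2.VideoCapture(video_path)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, max(frame_count - 1, 0))  # seek to the final frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"Could not read the last frame of {video_path}")
    cv2.imwrite(out_path, frame)
    return out_path

# Chain clip N into clip N+1 by conditioning the next generation on the previous end frame.
start_image = last_frame("clip_01.mp4")
next_clip = replicate.run(
    "example-lab/image-to-video",          # hypothetical slug for a first-frame-conditioned model
    input={
        "image": open(start_image, "rb"),  # previous end frame becomes the new start frame
        "prompt": "the camera keeps panning right as the character walks out of frame",
    },
)
print(next_clip)
```

None of this is hard, but it’s exactly the kind of glue work an end-to-end product should absorb so creators don’t have to.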
The good news is that some labs are starting to address this product gap. Runway released a suite of apps that enable creators to edit camera angles, do next-shot video generation, perform style transfers from one clip to the next, change the weather, and add or remove items from clips. Sora Storyboard enables users to specify precise moment-to-moment action in a video sequence. And Veo 3.1, which was released earlier this month, consists almost entirely of product updates around audio and visual control, as opposed to model-level improvements.
Some people like to say that if LLM progress stalled tomorrow, entrepreneurs would still have years of catch-up to play building useful products around the models’ capabilities. The same is true of video models: we’re only just beginning to see end-to-end products built around them, with a lot of opportunities that haven’t been explored yet. In the future, I’m excited to see smaller, more specialized models. There will likely be products that guide these models toward the best generations for specific use cases or industries – like home staging, marketing, or animation. And finally, we still need creative suites that bring all the modalities together and make it easier to generate and edit in one place: video, dubbed audio, music, and more.
If you’re working on new verticals in video models, or end-to-end products for AI video orchestration, reach out! I’d love to hear from you. You can find me on X at @venturetwins, or email me at jmoore@a16z.com.