AI Avatars Escape the Uncanny Valley

Justine Moore | Posted April 8, 2025


What happens when AI doesn’t just generate content, but embodies it? AI has already mastered the ability to produce realistic photos, videos, and voices, passing the visual and auditory Turing Test. The next big leap is in AI avatars: combining a face with a voice to create a talking character.

Can’t you just generate an image of a face, animate it, and add a voiceover? Not quite. The challenge isn’t just nailing the lip sync — it’s making facial expressions and body language move in tandem. It would be weird if your mouth opened in surprise, but your cheeks and chin didn’t budge! And if a voice sounds excited but the corresponding face doesn’t react, the human-like illusion falls apart.

We’re starting to see real progress here. AI avatars are already being used in content creation, advertising, and corporate communication. Today’s versions are still mostly talking heads — functional, but limited — but we’ve seen some exciting developments in the last few months, and there’s clearly meaningful progress on the horizon.

In this post, we’ll break down what’s working now, what’s next, and the most impressive AI avatar products today, drawn from my hands-on testing of more than 20 of them. As someone who has been trialing these tools for years, I’m blown away by what we can create now.

How has the research evolved?

AI avatars are a uniquely challenging research problem. To make a talking face, a model needs to learn realistic phoneme-to-viseme mapping: the relationship between speech sounds (phonemes) and their corresponding mouth movements (visemes). If this is “off,” the mouth and voice will look out of sync or even completely disconnected.
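
For intuition, here’s a minimal Python sketch of that mapping as a lookup table. The phoneme symbols and viseme classes are illustrative only; modern models learn this mapping from data rather than hard-coding it, and the mapping is many-to-one (several sounds share a mouth shape):

```python
# Minimal sketch of phoneme-to-viseme mapping (illustrative classes only).
# Early rule-based lip-sync pipelines worked roughly like this lookup table.
PHONEME_TO_VISEME = {
    "b": "closed_lips", "p": "closed_lips", "m": "closed_lips",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_jaw", "ae": "open_jaw",
    "uw": "rounded_lips", "ow": "rounded_lips",
    "s": "teeth_together", "z": "teeth_together",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to mouth shapes, defaulting to neutral."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "bat" is roughly /b ae t/: closed lips, then an open jaw, then neutral.
print(phonemes_to_visemes(["b", "ae", "t"]))
# ['closed_lips', 'open_jaw', 'neutral']
```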

To make the issue even more complex, your mouth isn’t the only thing that moves when you talk. The rest of your face moves in conjunction, along with your upper body and sometimes your hands. And everyone has their own distinct style of speaking. Think about how you speak, compared to your favorite celebrity: even if you’re saying the same sentence, your mouths will move differently. If you tried to apply your lip sync to their face, it would look weird.

Over the last few years, this space has evolved significantly from a research perspective. I reviewed over 70 papers on AI talking heads published since 2017 and saw a clear progression in model architecture — from CNNs and GANs, to 3D-based approaches like NeRFs and 3D Morphable Models, then to transformers and diffusion models, and most recently to DiTs (diffusion models built on the transformer architecture). The timeline below highlights the most cited papers from each year.

Both the quality of generations and the capabilities of models have improved dramatically. Early approaches were limited. Imagine starting with a single photo of a person, masking the bottom half of their face, and generating new mouth movements to match facial landmarks derived from the audio input. These models were trained on a limited corpus of quality lip sync data, most of which was closely cropped at the face. More realistic results, like “lip-syncing Obama,” required many hours of video of the target person and could only produce a narrow range of outputs.
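
Here’s a rough sketch of that early recipe, with stubs standing in for the trained models (the names and shapes are illustrative, not any specific paper’s code):

```python
import numpy as np

def predict_landmarks(audio_features):
    """Stub: early models often predicted target mouth landmarks
    from the audio before rendering any pixels."""
    return np.zeros((20, 2))  # e.g. 20 (x, y) mouth keypoints

def generator(masked_face, landmarks):
    """Stub for a trained CNN/GAN that inpaints the masked mouth
    region to match the target landmarks. Returns its input here
    so the sketch runs end to end."""
    return masked_face

def lipsync_frame(face, audio_features):
    """Early-style lip sync: mask the lower half of a face crop,
    then regenerate it to match audio-derived landmarks."""
    masked = face.copy()
    masked[face.shape[0] // 2:] = 0.0        # hide the mouth region
    landmarks = predict_landmarks(audio_features)
    return generator(masked, landmarks)

face = np.random.rand(96, 96, 3)             # tightly cropped face frame
mel = np.random.rand(80, 16)                 # short mel-spectrogram slice
print(lipsync_frame(face, mel).shape)        # (96, 96, 3)
```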

Today’s models are much more flexible and powerful. They can generate half-body or even full-body movement, realistic talking faces, and dynamic background motion — all in the same video! These newer models are trained more like traditional text-to-video models on much larger datasets, using a variety of techniques to maintain lip sync accuracy amid all the motion.

The first preview of this came with ByteDance’s OmniHuman-1 model, which was introduced in February (and was recently made available in Dreamina). The space is moving quickly — Hedra released Character-3 in March, which in our head-to-head testing is now best-in-class for most use cases. Hedra also works for non-human characters, like this talking Waymo, and enables users to prompt emotions and movement via text.

New use cases are also emerging around AI animation, spurred by trends like the Studio Ghibli movement. The video below was generated from a starting image frame and an audio track. Hedra generated the character’s lip sync along with its face and upper-body movement. And check out the moving characters in the background!

Real-world jobs for AI avatars

There are countless use cases for AI avatars — just imagine all the different places where you interact with a character or watch a video where someone is speaking. We’ve already seen usage across consumers, SMBs, and even enterprises.

This is an early market map. The space is evolving quickly, and the product distinctions are relatively rough. Many products could theoretically make avatars for most or all of these use cases, but we’ve found, in practice, that it’s hard to build the workflow and tune the model to excel at everything. Below, we’ve outlined examples of how each segment of the market is leveraging AI avatars.

Consumers: Character creation

Anyone can now create animated characters from a single image, which is a massive unlock for creativity. It’s hard to overstate how meaningful this is for everyday people who want to use AI to tell a story. One of the reasons early AI videos were criticized as “slides of images” is that there were no talking characters (or speech only came in the form of voiceovers).

When you can make something talk, your content becomes much more interesting. And beyond traditional narrative video, you can create things like AI streamers, podcasters, and music videos. The videos linked here were all made on Hedra, which enables users to create dynamic, speaking characters from a single starting image and either an audio clip or a script.

If you’re starting with a video instead of an image, Sync can apply lip sync to make the character’s face fit your audio. And if you want to use real human performance to drive the movement of your character, tools like Runway Act-One and Viggle make it possible.

One of my favorite creators using AI to animate characters is Neural Viz, whose series, “The Monoverse,” imagines a post-human universe populated by Glurons. It’s only a matter of time before we see an explosion of AI-generated shows — or even just standalone influencers — now that the barrier to entry is so much lower.

Unanswered Oddities – Episode 1: Humans (youtube.com/@NeuralViz)

As avatars become easier to stream in real-time, we also expect to see consumer-facing companies implement them as a core part of their UI. Imagine learning a language with a live AI “coach” that is not just a disembodied voice, but a full character with a face and personality. Companies like Praktika are already doing this, and it will only get more natural over time. 

SMBs: Lead generation

Ads have become one of the first killer use cases of AI avatars. Instead of hiring actors and a production crew, businesses can now have hyper-realistic AI characters promote their products. Companies like Creatify and Arcads make this seamless — just provide a product link and they generate an ad: writing the script, pulling B-roll and images, and “casting” an AI actor.
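
Under the hood, that flow chains several generation steps. Here’s a hypothetical sketch of the stages in Python; every function is a placeholder for a step these products automate, not any vendor’s actual API:

```python
# Hypothetical stages of an AI ad pipeline; each stub marks one step.
def scrape_product_page(url):
    return {"name": "Demo Widget", "images": ["hero.jpg"], "url": url}

def write_script(product):
    return f"Stop scrolling: you need the {product['name']}."

def cast_actor(script):
    return {"avatar": "actor_07", "voice": "warm_casual"}  # pick an AI actor

def render_takes(actor, script):
    return [f"{actor['avatar']} says: {script}"]           # talking-head clips

def edit_timeline(takes, broll):
    return takes + broll        # splice the takes with product shots / B-roll

def generate_ad(product_url):
    product = scrape_product_page(product_url)  # pull details and images
    script = write_script(product)              # LLM-drafted ad copy
    actor = cast_actor(script)                  # "cast" an avatar and voice
    takes = render_takes(actor, script)
    return edit_timeline(takes, product["images"])

print(generate_ad("https://example.com/widget"))
```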

This has unlocked advertising for businesses that could never afford traditional ad production. It’s particularly popular among ecommerce companies, games, and consumer apps. Chances are, you’ve already seen AI-generated ads on YouTube or TikTok. Now B2B companies are exploring the tech as well, using AI avatars for content marketing or personalized outreach with tools like Yuzu Labs and Vidyard.

Many of these products combine an AI actor — whether a clone of a real person or a unique character — with other assets like product photos, video clips, and music. Users can control where these assets appear, or switch on “autopilot” and let the product pull a video together for them. They can either write the script themselves or use an AI-generated one.

Enterprises: Scaling content

Beyond marketing, enterprises are finding a range of applications for AI avatars. A few examples:

Learning and development. Most large companies produce training and educational videos for employees, covering everything from onboarding to compliance, product tutorials, and skill development. AI tools like Synthesia can automate this process, making content creation faster and more scalable. Some roles also require ongoing, video-based training — imagine a salesperson practicing their negotiation skills with an AI avatar from a product like Anam.

Localization. If a company has customers or employees in different countries, it may want to localize content into different languages or switch out cultural references. AI actors make it fast and easy to personalize your videos for different geographies. Thanks to AI voice translation from companies like ElevenLabs, businesses can generate the same video in dozens of languages, with natural-sounding voices.

Executive presence. AI avatars let executives scale their presence by cloning their persona to create personalized content for employees or customers. Instead of filming every product announcement or a “thank you” message, companies can generate a realistic AI twin of their CEO or product lead. We’re also seeing companies like Delphi and Cicero make it easy for thought leaders to interact with and answer questions from people they’d never normally be able to meet 1:1.

What are the ingredients of an AI avatar? 

Creating a believable AI avatar is a challenge, with each element of realism presenting its own technical hurdles. It’s not just about avoiding the uncanny valley; it’s about solving fundamental problems in animation, speech synthesis, and real-time rendering. Here’s a breakdown of what’s required, why it’s so hard to get right, and where we’re seeing progress (a sketch of how these pieces compose follows the list):

  • Face – Whether you’re cloning a person or creating a new character, you need a face that stays consistent between frames and moves realistically while talking. Context-aware expressiveness remains a challenge (e.g. an avatar yawning while saying “I’m tired”).
  • Voice – The voice needs to sound real and match the character; a teenage girl’s face shouldn’t have an older woman’s voice. Most of the AI avatar companies we’ve met use ElevenLabs, which has an extensive voice library and allows you to clone your own.
  • Lip sync – Getting quality lip sync is tricky. Entire companies, like Sync, are dedicated to solving this problem. Other models like MoCha (from Meta) and OmniHuman are trained on larger datasets and use various techniques to strongly condition face generation on the accompanying audio.
  • Body – Your avatar can’t just be a floating head! Newer models enable avatars with full bodies that can move, but we’re still in early days in terms of both scaling them and delivering them to users.
  • Background – Avatars don’t exist in a vacuum. The lighting, depth, and interactions in their surrounding environment need to match the scene. Ideally, avatars will even be able to touch and engage with things in their environment, like picking up a product.
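
Putting the list above together, you can think of the offline (non-real-time) case as one render call over these ingredients. Here’s a minimal sketch with hypothetical names and stubbed-out models; no real product exposes exactly this API:

```python
from dataclasses import dataclass

@dataclass
class AvatarSpec:
    face_image: str    # Face: a consistent identity to animate
    voice_id: str      # Voice: a library voice or a clone
    script: str        # what the character says
    body_motion: str   # Body: "talking_head", "half_body", or "full_body"
    scene: str         # Background: environment description or source image

def tts(voice_id, script):
    """Stub for a text-to-speech call (e.g. a voice-cloning service)."""
    return f"<audio {voice_id}: {script}>"

def video_model(face, audio, motion, scene):
    """Stub for an audio-conditioned video model; lip sync comes from
    conditioning the face generation on the audio track."""
    return f"<video of {face} in {scene}, {motion}, synced to {audio}>"

def render_avatar(spec):
    audio = tts(spec.voice_id, spec.script)
    return video_model(spec.face_image, audio, spec.body_motion, spec.scene)

print(render_avatar(AvatarSpec("host.png", "voice_42",
                               "Welcome back!", "half_body", "news desk")))
```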

If you want your avatar to engage in real-time conversations — like joining a Zoom meeting — there are a few other things you need to add:

  • Brain – Your avatar needs to be able to “think.” Products that enable conversation today typically let you upload or connect a knowledge base. In the future, more complex versions of this will hopefully include more memory and personality. Avatars should be able to remember past conversations with you and have their own “flair.”
  • Streaming – It’s not easy to stream all of this with minimal latency. Products like LiveKit and Agora provide the real-time infrastructure, but orchestrating every model in the chain without lag is still hard. We’ve seen a few products do this well — like Tolan, an AI alien companion with a voice and face — but there’s still work to be done. (The loop is sketched below.)
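
Here’s that conversational loop sketched in Python. Every function is a hypothetical stub for a real service (speech-to-text, an LLM “brain,” streaming TTS, an avatar renderer); the point is that each stage adds latency, so everything has to stream incrementally:

```python
# One conversational turn through the real-time stack (all stubs).
def transcribe(audio):
    return "How do I reset my password?"         # speech-to-text

def llm_respond(text, memory):
    return f"Sure, here's how. (I remember {len(memory)} past turns.)"

def stream_tts(reply):
    for word in reply.split():                   # emit audio incrementally
        yield word

def render_avatar_frames(audio_chunk):
    return [f"frame({audio_chunk})"]             # lip-synced frames per chunk

def send_to_client(frame):
    print(frame)                                 # stand-in for the video stream

def conversation_turn(user_audio, memory):
    text = transcribe(user_audio)                # hear
    reply = llm_respond(text, memory)            # "Brain": think with context
    memory.append((text, reply))                 # remember the exchange
    for chunk in stream_tts(reply):              # speak and animate as chunks
        for frame in render_avatar_frames(chunk):
            send_to_client(frame)

memory = []
conversation_turn(b"<mic audio>", memory)
```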

What would we like to see?

There’s still so much to build and improve in this space. A few areas that are top-of-mind:

Character consistency and transformation

Historically, each AI avatar had one fixed “look.” Their outfit, pose, and environment were static. Some products are starting to offer more options. For example, this character from HeyGen, Raul, has 20 looks! But it would be great to more easily transform a character however you want.

Better facial movement and expressiveness

Faces have long been the weak link of AI avatars, often looking robotic. That’s starting to change with products like Captions’ new Mirage, which delivers a more natural look and broader range of expressions. We’d love to see AI avatars that understand the emotional context of a script and react appropriately, like looking scared if the character is fleeing from a monster.

Body movement

Today, the vast majority of avatars have little movement below the face — even basic things like hand gestures. Gesture control has been fairly programmatic: for example, Argil allows you to select different types of body language for each segment of your video. We’re excited to see more natural, inferred motion in the future. 

Interacting with the “real world”

Right now, AI avatars can’t interact with their surroundings. An attainable near-term goal may be enabling them to hold products in ads. Topview has already made progress (see the video below for their process and results), and we’re excited to see what’s to come as models improve.

More real-time applications

To name a few potential use cases: doing a video call with an AI doctor, browsing curated products with an AI sales assistant, or FaceTiming with a character from your favorite TV show. The latency and reliability aren’t quite human-level, but they’re getting close. Check out a demo of me chatting with Tavus’ latest model.

Where are we headed?

One of our main learnings from investing in both foundation model companies and AI applications over the past few years? It’s nearly impossible to predict with any degree of certainty where a given space is headed. However, it feels safe to say that the application layer is poised for rapid growth now that the underlying model quality finally feels good enough to generate AI talking heads that aren’t painful to watch.

We expect this space will give rise to multiple billion-dollar companies, with products segmented by use case and target customer. For example, an executive looking for an AI clone to film videos for customers will need (and be willing to pay for) a higher level of quality and realism than a fan making a quick clip of their favorite anime character to send to friends.

Workflow is also important. If you’re generating ads with AI influencers, you’ll want to use a platform that can automatically pull in product details, write scripts, add B-roll and product photos, push the videos to your social channels, and measure results. On the other hand, if you’re trying to tell a story using AI characters, you’ll prioritize tools that enable you to save and re-use characters and scenes, and easily splice together different types of clips.

We can’t wait to see what emerges here. If you’re building in this space, I’d love to chat. Reach out to jmoore@a16z.com or venturetwins on X.

