AI Avatars Escape the Uncanny Valley

Justine Moore | Posted April 8, 2025


What happens when AI doesn’t just generate content, but embodies it? AI has already mastered the ability to produce realistic photos, videos, and voices, passing the visual and auditory Turing Test. The next big leap is in AI avatars: combining a face with a voice to create a talking character.

Can’t you just generate an image of a face, animate it, and add a voiceover? Not quite. The challenge isn’t just nailing the lip sync — it’s making facial expressions and body language move in tandem. It would be weird if your mouth opened in surprise, but your cheeks and chin didn’t budge! And if a voice sounds excited but the corresponding face doesn’t react, the human-like illusion falls apart.

We’re starting to see real progress here. AI avatars are already being used in content creation, advertising, and corporate communication. Today’s versions are still mostly talking heads — functional, but limited — but we’ve seen some exciting developments in the last few months, and there’s clearly meaningful progress on the horizon.

In this post, we’ll break down what’s working now, what’s next, and the most impressive AI avatar products today, drawn from my hands-on testing of more than 20 of them. As someone who has been trialing these tools for years, I’m blown away by what we can create now.

How has the research evolved?

AI avatars are a uniquely challenging research problem. To make a talking face, a model needs to learn realistic phoneme-to-viseme mapping: the relationship between speech sounds (phonemes) and their corresponding mouth movements (visemes). If this is “off,” the mouth and voice will look out of sync or even completely disconnected.
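
For intuition, here’s a minimal Python sketch of that mapping as a lookup table. The phoneme symbols and viseme classes are illustrative only; modern models learn this mapping from data rather than hard-coding it, and the mapping is many-to-one (several sounds share a mouth shape):

```python
# Minimal sketch of phoneme-to-viseme mapping (illustrative classes only).
# Early rule-based lip-sync pipelines worked roughly like this lookup table.
PHONEME_TO_VISEME = {
    "b": "closed_lips", "p": "closed_lips", "m": "closed_lips",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_jaw", "ae": "open_jaw",
    "uw": "rounded_lips", "ow": "rounded_lips",
    "s": "teeth_together", "z": "teeth_together",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to mouth shapes, defaulting to neutral."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "bat" is roughly /b ae t/: closed lips, then an open jaw, then neutral.
print(phonemes_to_visemes(["b", "ae", "t"]))
# ['closed_lips', 'open_jaw', 'neutral']
```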

To make the issue even more complex, your mouth isn’t the only thing that moves when you talk. The rest of your face moves in conjunction, along with your upper body and sometimes your hands. And everyone has their own distinct style of speaking. Think about how you speak, compared to your favorite celebrity: even if you’re saying the same sentence, your mouths will move differently. If you tried to apply your lip sync to their face, it would look weird.

Over the last few years, this space has evolved significantly from a research perspective. I reviewed over 70 papers on AI talking heads published since 2017 and saw a clear progression in model architecture — from CNNs and GANs, to 3D-based approaches like NeRFs and 3D Morphable Models, then to transformers and diffusion models, and most recently to DiTs (diffusion models built on the transformer architecture). The timeline below highlights the most cited papers from each year.

Both the quality of generations and the capabilities of models have improved dramatically. Early approaches were limited. Imagine starting with a single photo of a person, masking the bottom half of their face, and generating new mouth movements to match facial landmarks derived from the audio input. These models were trained on a limited corpus of quality lip sync data, most of which was closely cropped at the face. More realistic results, like “lip-syncing Obama,” required many hours of video of the target person and could only produce a narrow range of outputs.
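
Here’s a rough sketch of that early recipe, with stubs standing in for the trained models (the names and shapes are illustrative, not any specific paper’s code):

```python
import numpy as np

def predict_landmarks(audio_features):
    """Stub: early models often predicted target mouth landmarks
    from the audio before rendering any pixels."""
    return np.zeros((20, 2))  # e.g. 20 (x, y) mouth keypoints

def generator(masked_face, landmarks):
    """Stub for a trained CNN/GAN that inpaints the masked mouth
    region to match the target landmarks. Returns its input here
    so the sketch runs end to end."""
    return masked_face

def lipsync_frame(face, audio_features):
    """Early-style lip sync: mask the lower half of a face crop,
    then regenerate it to match audio-derived landmarks."""
    masked = face.copy()
    masked[face.shape[0] // 2:] = 0.0        # hide the mouth region
    landmarks = predict_landmarks(audio_features)
    return generator(masked, landmarks)

face = np.random.rand(96, 96, 3)             # tightly cropped face frame
mel = np.random.rand(80, 16)                 # short mel-spectrogram slice
print(lipsync_frame(face, mel).shape)        # (96, 96, 3)
```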

Today’s models are much more flexible and powerful. They can generate half-body or even full-body movement, realistic talking faces, and dynamic background motion — all in the same video! These newer models are trained more like traditional text-to-video models on much larger datasets, using a variety of techniques to maintain lip sync accuracy amid all the motion.

The first preview of this came with ByteDance’s OmniHuman-1 model, which was introduced in February (and was recently made available in Dreamina). The space is moving quickly — Hedra released Character-3 in March, which in our head-to-head testing is now best-in-class for most use cases. Hedra also works for non-human characters, like this talking Waymo, and enables users to prompt emotions and movement via text.

New use cases are also emerging around AI animation, spurred by trends like the Studio Ghibli movement. The video below was generated from a starting image frame and an audio track. Hedra generated the character’s lip sync along with its face and upper-body movement. And check out the moving characters in the background!

Real-world jobs for AI avatars

There are countless use cases for AI avatars — just imagine all the different places where you interact with a character or watch a video where someone is speaking. We’ve already seen usage across consumers, SMBs, and even enterprises.

This is an early market map. The space is evolving quickly, and the product distinctions are relatively rough. Many products could theoretically make avatars for most or all of these use cases, but we’ve found, in practice, that it’s hard to build the workflow and tune the model to excel at everything. Below, we’ve outlined examples of how each segment of the market is leveraging AI avatars.

Consumers: Character creation

Anyone can now create animated characters from a single image, which is a massive unlock for creativity. It’s hard to overstate how meaningful this is for everyday people who want to use AI to tell a story. One of the reasons early AI videos were criticized as “slides of images” is that there were no talking characters (or speech only came in the form of voiceovers).

When you can make something talk, your content becomes much more interesting. And beyond traditional narrative video, you can create things like AI streamers, podcasters, and music videos. The videos linked here were all made on Hedra, which enables users to create dynamic, speaking characters from a single starting image and either an audio clip or a script.

If you’re starting with a video instead of an image, Sync can apply lip sync to make the character’s face fit your audio. And if you want to use real human performance to drive the movement of your character, tools like Runway Act-One and Viggle make it possible.

One of my favorite creators using AI to animate characters is Neural Viz, whose series, “The Monoverse,” imagines a post-human universe populated by Glurons. It’s only a matter of time before we see an explosion of AI-generated shows — or even just standalone influencers — now that the barrier to entry is so much lower.

Unanswered Oddities – Episode 1: Humans (youtube.com/@NeuralViz)

As avatars become easier to stream in real-time, we also expect to see consumer-facing companies implement them as a core part of their UI. Imagine learning a language with a live AI “coach” that is not just a disembodied voice, but a full character with a face and personality. Companies like Praktika are already doing this, and it will only get more natural over time. 

SMBs: Lead generation

Ads have become one of the first killer use cases of AI avatars. Instead of hiring actors and a production crew, businesses can now have hyper-realistic AI characters promote their products. Companies like Creatify and Arcads make this seamless — just provide a product link and they generate an ad: writing the script, pulling B-roll and images, and “casting” an AI actor.
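
Under the hood, that flow chains several generation steps. Here’s a hypothetical sketch of the stages in Python; every function is a placeholder for a step these products automate, not any vendor’s actual API:

```python
# Hypothetical stages of an AI ad pipeline; each stub marks one step.
def scrape_product_page(url):
    return {"name": "Demo Widget", "images": ["hero.jpg"], "url": url}

def write_script(product):
    return f"Stop scrolling: you need the {product['name']}."

def cast_actor(script):
    return {"avatar": "actor_07", "voice": "warm_casual"}  # pick an AI actor

def render_takes(actor, script):
    return [f"{actor['avatar']} says: {script}"]           # talking-head clips

def edit_timeline(takes, broll):
    return takes + broll        # splice the takes with product shots / B-roll

def generate_ad(product_url):
    product = scrape_product_page(product_url)  # pull details and images
    script = write_script(product)              # LLM-drafted ad copy
    actor = cast_actor(script)                  # "cast" an avatar and voice
    takes = render_takes(actor, script)
    return edit_timeline(takes, product["images"])

print(generate_ad("https://example.com/widget"))
```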

This has unlocked advertising for businesses that could never afford traditional ad production. It’s particularly popular among ecommerce companies, games, and consumer apps. Chances are, you’ve already seen AI-generated ads on YouTube or TikTok. Now B2B companies are exploring the tech as well, using AI avatars for content marketing or personalized outreach with tools like Yuzu Labs and Vidyard.

Many of these products combine an AI actor — whether a clone of a real person or a unique character — with other assets like product photos, video clips, and music. Users can control where these assets appear, or switch on “autopilot” and let the product pull a video together for them. They can either write the script themselves or use an AI-generated one.

Enterprises: Scaling content

Beyond marketing, enterprises are finding a range of applications for AI avatars. A few examples:

Learning and development. Most large companies produce training and educational videos for employees, covering everything from onboarding to compliance, product tutorials, and skill development. AI tools like Synthesia can automate this process, making content creation faster and more scalable. Some roles also require ongoing, video-based training — imagine a salesperson practicing their negotiation skills with an AI avatar from a product like Anam.

Localization. If a company has customers or employees in different countries, it may want to localize content into different languages or switch out cultural references. AI actors make it fast and easy to personalize your videos for different geographies. Thanks to AI voice translation from companies like ElevenLabs, businesses can generate the same video in dozens of languages, with natural-sounding voices.

Executive presence. AI avatars let executives scale their presence by cloning their persona to create personalized content for employees or customers. Instead of filming every product announcement or a “thank you” message, companies can generate a realistic AI twin of their CEO or product lead. We’re also seeing companies like Delphi and Cicero make it easy for thought leaders to interact with and answer questions from people they’d never normally be able to meet 1:1.

What are the ingredients of an AI avatar? 

Creating a believable AI avatar is a challenge, with each element of realism presenting its own technical hurdles. It’s not just about avoiding the uncanny valley; it’s about solving fundamental problems in animation, speech synthesis, and real-time rendering. Here’s a breakdown of what’s required, why it’s so hard to get right, and where we’re seeing progress (a sketch of how these pieces compose follows the list):

  • Face – Whether you’re cloning a person or creating a new character, you need a face that stays consistent between frames and moves realistically while talking. Context-aware expressiveness remains a challenge (e.g. an avatar yawning while saying “I’m tired”).
  • Voice – The voice needs to sound real and match the character; a teenage girl’s face shouldn’t have an older woman’s voice. Most of the AI avatar companies we’ve met use ElevenLabs, which has an extensive voice library and allows you to clone your own.
  • Lip sync – Getting quality lip sync is tricky. Entire companies, like Sync, are dedicated to solving this problem. Other models like MoCha (from Meta) and OmniHuman are trained on larger datasets and use various techniques to strongly condition face generation on the accompanying audio.
  • Body – Your avatar can’t just be a floating head! Newer models enable avatars with full bodies that can move, but we’re still in early days in terms of both scaling them and delivering them to users.
  • Background – Avatars don’t exist in a vacuum. The lighting, depth, and interactions in their surrounding environment need to match the scene. Ideally, avatars will even be able to touch and engage with things in their environment, like picking up a product.
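
Putting the list above together, you can think of the offline (non-real-time) case as one render call over these ingredients. Here’s a minimal sketch with hypothetical names and stubbed-out models; no real product exposes exactly this API:

```python
from dataclasses import dataclass

@dataclass
class AvatarSpec:
    face_image: str    # Face: a consistent identity to animate
    voice_id: str      # Voice: a library voice or a clone
    script: str        # what the character says
    body_motion: str   # Body: "talking_head", "half_body", or "full_body"
    scene: str         # Background: environment description or source image

def tts(voice_id, script):
    """Stub for a text-to-speech call (e.g. a voice-cloning service)."""
    return f"<audio {voice_id}: {script}>"

def video_model(face, audio, motion, scene):
    """Stub for an audio-conditioned video model; lip sync comes from
    conditioning the face generation on the audio track."""
    return f"<video of {face} in {scene}, {motion}, synced to {audio}>"

def render_avatar(spec):
    audio = tts(spec.voice_id, spec.script)
    return video_model(spec.face_image, audio, spec.body_motion, spec.scene)

print(render_avatar(AvatarSpec("host.png", "voice_42",
                               "Welcome back!", "half_body", "news desk")))
```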

If you want your avatar to engage in real-time conversations — like joining a Zoom meeting — there are a few other things you need to add:

  • Brain – Your avatar needs to be able to “think.” Products that enable conversation today typically let you upload or connect a knowledge base. In the future, more complex versions of this will hopefully include more memory and personality. Avatars should be able to remember past conversations with you and have their own “flair.”
  • Streaming – It’s not easy to stream all of this with minimal latency. Products like LiveKit and Agora provide the real-time infrastructure, but orchestrating every model in the chain without lag is still hard. We’ve seen a few products do this well — like Tolan, an AI alien companion with a voice and face — but there’s still work to be done. (The loop is sketched below.)
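
Here’s that conversational loop sketched in Python. Every function is a hypothetical stub for a real service (speech-to-text, an LLM “brain,” streaming TTS, an avatar renderer); the point is that each stage adds latency, so everything has to stream incrementally:

```python
# One conversational turn through the real-time stack (all stubs).
def transcribe(audio):
    return "How do I reset my password?"         # speech-to-text

def llm_respond(text, memory):
    return f"Sure, here's how. (I remember {len(memory)} past turns.)"

def stream_tts(reply):
    for word in reply.split():                   # emit audio incrementally
        yield word

def render_avatar_frames(audio_chunk):
    return [f"frame({audio_chunk})"]             # lip-synced frames per chunk

def send_to_client(frame):
    print(frame)                                 # stand-in for the video stream

def conversation_turn(user_audio, memory):
    text = transcribe(user_audio)                # hear
    reply = llm_respond(text, memory)            # "Brain": think with context
    memory.append((text, reply))                 # remember the exchange
    for chunk in stream_tts(reply):              # speak and animate as chunks
        for frame in render_avatar_frames(chunk):
            send_to_client(frame)

memory = []
conversation_turn(b"<mic audio>", memory)
```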

What would we like to see?

There’s still so much to build and improve in this space. A few areas that are top-of-mind:

Character consistency and transformation

Historically, each AI avatar had one fixed “look.” Their outfit, pose, and environment were static. Some products are starting to offer more options. For example, this character from HeyGen, Raul, has 20 looks! But it would be great to more easily transform a character however you want.

Better facial movement and expressiveness

Faces have long been the weak link of AI avatars, often looking robotic. That’s starting to change with products like Captions’ new Mirage, which delivers a more natural look and broader range of expressions. We’d love to see AI avatars that understand the emotional context of a script and react appropriately, like looking scared if the character is fleeing from a monster.

Body movement

Today, the vast majority of avatars have little movement below the face — even basic things like hand gestures. Gesture control has been fairly programmatic: for example, Argil allows you to select different types of body language for each segment of your video. We’re excited to see more natural, inferred motion in the future. 

Interacting with the “real world”

Right now, AI avatars can’t interact with their surroundings. An attainable near-term goal may be enabling them to hold products in ads. Topview has already made progress (see the video below for their process and results), and we’re excited to see what’s to come as models improve.

More real-time applications

To name a few potential use cases: doing a video call with an AI doctor, browsing curated products with an AI sales assistant, or FaceTiming with a character from your favorite TV show. The latency and reliability aren’t quite human-level, but they’re getting close. Check out a demo of me chatting with Tavus’ latest model.

Where are we headed?

One of our main learnings from investing in both foundation model companies and AI applications over the past few years? It’s nearly impossible to predict with any degree of certainty where a given space is headed. However, it feels safe to say that the application layer is poised for rapid growth now that the underlying model quality finally feels good enough to generate AI talking heads that aren’t painful to watch.

We expect this space will give rise to multiple billion-dollar companies, with products segmented by use case and target customer. For example, an executive looking for an AI clone to film videos for customers will need (and be willing to pay for) a higher level of quality and realism than a fan making a quick clip of their favorite anime character to send to friends.

Workflow is also important. If you’re generating ads with AI influencers, you’ll want to use a platform that can automatically pull in product details, write scripts, add B-roll and product photos, push the videos to your social channels, and measure results. On the other hand, if you’re trying to tell a story using AI characters, you’ll prioritize tools that enable you to save and re-use characters and scenes, and easily splice together different types of clips.

We can’t wait to see what emerges here. If you’re building in this space, I’d love to chat. Reach out to jmoore@a16z.com or venturetwins on X.

