AI + a16z

Building the Next Generation of Conversational AI

Ankit Kumar and Anjney Midha

Posted March 14, 2025

In this episode of AI + a16z, Sesame Cofounder and CTO Ankit Kumar joins a16z general partner Anjney Midha for a deep dive into the research and engineering behind Sesame's voice technology. They discuss the technical challenges of real-time speech generation, the trade-offs in balancing personality with efficiency, and why the team is open-sourcing key components of its model. Ankit breaks down the complexities of multimodal AI, full-duplex conversation modeling, and the computational optimizations that enable low-latency interactions.

They also explore the evolution of natural language as a user interface and its potential to redefine human-computer interaction. Plus, they take audience questions on everything from scaling laws in speech synthesis to the role of in-context learning in making AI voices more expressive.

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.
