For decades, computing interfaces have evolved to become more natural and intuitive. In the 1980s, we used keyboards to interact with terminals. In the 1990s, we moved to mice to interact with PC GUIs. And then, in the 2010s, smartphones introduced touchscreens to interact with mobile GUIs.
Each generation has brought us closer to seamless interaction between humans and machines. But more than a decade after the iPhone's launch, we still rely on screens for nearly everything. What comes next?
The answer may lie in speech, which is the natural modality for human-to-human interaction but has largely remained too primitive for human-to-computer interaction beyond basic experiences like “Alexa, turn off the lights.”
However, AI has made giant strides over the past few years. Combined with the right hardware, advances in speech and language (both understanding and generation) could unlock a truly natural computing interface. This is where Sesame enters the picture.
Sesame is built around the simple, but non-obvious, idea that the answer isn’t in the screens of AR glasses — it’s in the audio. To date, the emotional flatness of AI audio has been exhausting and unnatural. But if you remove the visual display from AR glasses and instead focus on an amazing audio-first AI system, you can create a computing experience that feels seamless and intuitive.
As a starting point, the team has trained a Conversational Speech Model (CSM) that takes a novel approach to speech modeling, which you can read more about here. It’s not past the uncanny valley yet, but it’s close. You can also try the research preview and talk to Maya or Miles, its first AI companions, here.
As it happens, I’ve known the founding team for a while. There are few people in the world with the practical depth to truly understand the opportunity, and Sesame co-founders Brendan Iribe and Ankit Kumar are two of them. As the co-founder and CEO of Oculus (and an angel investor in my first startup, Ubiquity6), Brendan built and sold a hardware platform for billions of dollars years before VR was mainstream. Ankit, my Ubiquity6 co-founder and Discord’s Clyde AI engineering lead, has firsthand experience productionizing language and speech models at scale. After months of dining-table working sessions and cross-country flights, Sesame was born.
We’re thrilled to have led the company’s Series A and to help it grow into the next great consumer computing platform. And if you’re interested in redefining how we interact with computers, Sesame is hiring across AI, hardware, product, and design.