Where you build is who you are: the ElevenLabs story

Jennifer Li

San Francisco is the hub of AI. However, not all model companies should be built in the same ten square miles atop the peninsula. ElevenLabs is one of my favorite examples of why having a unique origin story (born out of frustration with Polish single-narrator voiceovers), a unique set of geographical constraints and opportunities, and a unique company culture gives you a differentiated advantage from day one. It’s also why ElevenLabs – because they’re a little different from every other AI company – can help illuminate many of the classic challenges these companies face, like how to earn trust from users and creatives, or how to scale a team without losing what makes you unique. I hope you enjoy these lessons from an unusual and special company.

The outside view inside the challenge

When I first met Mati and Piotr, ElevenLabs was two people in London pursuing a deceptively simple goal: give machines a voice as natural as our own. Three years later, they’ve become one of the fastest-scaling AI companies, building a full creative ecosystem across voice, music, and now conversational agents. Realistic speech and voice have been goals in the AI industry for decades, so what makes ElevenLabs different? Is it just about mastering voice, or are other things at play here too?

Over the course of my conversation with ElevenLabs co-founder Mati Staniszewski at Runtime, something became clear: in addition to their voice models, the team has mastered two other domains, space and time. They’ve built out a truly global team (which you need in a field like voice) and have an intensely strategic approach to everything from research and product timelines to balancing feedback loops from consumer and enterprise customers alike:

  • Being in Europe is an edge for building a voice company and expanding globally. There are unique pains and opportunities that only non-native English speakers can appreciate.
  • At an AI lab, research comes first. But you can’t always expect research progress to align with customer demands. So be prepared to deliver: at ElevenLabs, if research can’t solve a problem in three months, the team finds a way to build a product that bridges the gap in a shorter timeframe.
  • Work with the creative industry instead of around it. Creative people are curious about AI, and want to understand where it can aid and accelerate them. Ask people questions: what parts of the production process benefit from AI? Where is it actually helpful?
  • Having a product that reaches both consumer and enterprise users means learning how to live on faster and slower timelines. The product feedback loop with consumers is rapid; deployment times with enterprises can be longer. You need to build an organization that’s comfortable with both working styles.

Europe as an edge, not a handicap

ElevenLabs is international and distributed, and particularly attuned to all of the ways in which voice conveys meaning, and the ways that meaning can get (literally and figuratively) lost in translation. Tone, inflection, and other vocal nuances not only carry much of the real “meaning” in a spoken conversation; that meaning can also vary tremendously across cultures. You won’t really master this as a product company unless your team is international by default and starts from different assumptions than a US team based entirely in San Francisco.

If you live in the United States or another English-speaking country, you might take it for granted that movies, podcasts, audiobooks, and pretty much all voice-based content you encounter has a variety of emotions, intonations, voices, styles, and even accents contained within the English language. It’s the lattice of those things, combined with words, that creates meaning. If you encounter that same content dubbed in a different language, you may not enjoy that same variety.

ElevenLabs wouldn’t exist if they weren’t based in Europe. In Poland, where much of the original team is from, all foreign films are dubbed with one actor, who performs lines in monotone for both genders (I personally felt this pain too growing up in China, but at least the production teams responsible for dubbing splurged for two actors!). This frustration partially led to the recognition that the world needed a strong text-to-speech offering.

An image of an ElevenLabs interface, allowing users to specify emotion, tone, volume, and language.

As the team scaled beyond 30 people (ElevenLabs now has ~350 employees), they decided to build out office hubs in London, Warsaw, and San Francisco. There are benefits to this beyond the wisdom of building team culture in person: in the case of ElevenLabs, international teams understand that voice is an extremely flexible abstraction layer.

“We realized that if we wanted the best people[…] we needed to hire wherever they were. We couldn’t lock ourselves to just San Francisco or the West Coast.”

Research vs Product: when to ship and when to wait

In an earlier session at Runtime, Jeetu Patel (President & Chief Product Officer at Cisco) shared his candid opinion about what kind of companies will do well in coming years: the integrated product-model companies. As he put it: “I think the combination of a model working very closely with the product, and the model getting better as there’s feedback in the product, is gonna be super important.”

Building a model alongside a product that makes use of it presents obvious advantages: the more people use the product, the better you can fine-tune the underlying model. In addition, when there are capabilities that research alone cannot yet deliver, the product can handle the task and give research time to mature.

Tiny differences in the product – which in this case could mean the vocal output itself (in all its international nuance), or the interface presented to teams putting ElevenLabs to work in their own services – can have huge compounding impacts on what the model and the product become. So the product interface you design needs to sit just ahead of what your model “could do on its own”.

ElevenLabs felt this in the early days of their text-to-speech product, when one of their most commonly heard requests from customers was the desire for a slider to adjust the speed of voices. Initially, Mati and other members of the team were reluctant: they didn’t want to have sliders, toggles, or any kind of product that would make them seem too similar to previous generations of tooling.

The team then spent about nine months trying to solve this problem on the research side, not the product side. In the meantime, customers still wanted sliders. Eventually, ElevenLabs capitulated and built them directly into the product. Now the team has a heuristic: if something will take more than three months to solve with research, they try to solve it on the product layer instead.

“We didn’t want to become another generation of editing tools with endless sliders and toggles. So we tried to solve it through research — letting the model decide how fast a voice should speak. After nine months, we couldn’t crack it. A simple product fix did. Now our rule is clear: if research takes more than three months, product moves ahead.”

Design incentives to scale

My colleague Martin Casado likes to say that companies go through three phases: a product phase, a sales phase, and a scaling phase. ElevenLabs now has 350 people and has gotten a taste of the growing pains that come with navigating through each of these phases. I asked Mati about the biggest challenges that came with this evolution. His answer was:

“In the early days, everyone ran on passion and instinct. But as we scaled, it became obvious – building a real machine means designing the right incentives.”

Mati relayed a recent shift that was motivated by the observation that “quota and commissions are a lagging indicator, strategy is a leading indicator.” Explaining further, Mati spoke about a recent negotiation that would have seen a major foundation lab licensing and distributing ElevenLabs’ voice models in demos. For the sales team involved in that deal, it would have resulted in a big commission. But it also may not have been the best move strategically for the company as a whole: as we all know, this is an extremely competitive space.

So Mati arrived at a solution that some may consider unorthodox: sales teams can still earn commission on a deal that gets killed. Sometimes it’s smarter to forgo shorter-term revenue wins in favor of keeping research and models proprietary.

Working with the creative industry instead of against it

When ElevenLabs was starting out in the creative space, the environment wasn’t necessarily friendly to the generative models pitching themselves to be used in production. ElevenLabs chose collaboration over disruption. Mati described spending time with artists, producers, and labels to deeply understand their priorities and incentives. He wanted to appreciate how AI could enhance rather than replace creative work. That meant learning from figures like Jarre about where AI adds value in the production process, and where human expression should remain untouched.

That philosophy led to the Voice Marketplace, where creators can clone, license, and monetize their voices. Mati shared that they now have almost 10,000 voices and have paid $10 million back to the community. One of the earliest voices, a deep Spanish tone that initially underperformed in Spain, became one of the top three voices globally once made available in English. The marketplace turned talent discovery into a global, multilingual phenomenon.

It’s a cool example of how AI can distill a voice as a medium: in a prior era, that person’s voice would have been confined to his own language. Today, voice is more like software that can “run” in any language.

ElevenLabs applied the same approach to music, partnering with all major labels in the Big Four and other labels like Kobalt to build a licensed music model. “It took us 18 months to find an agreement that worked,” Mati said. At every step, transparency and engagement were key. ElevenLabs spent time with artists and label members to demystify the technology and avoid the “knee-jerk reaction that AI is bad.” The result is rare in generative AI: a company working with the creative industry to build the future, not bulldoze its past.

Transitioning from Consumer to Enterprise

When a company begins to scale from consumer to enterprise, something funny happens to everyone’s perception of time and urgency: the number of high-stakes deals and customers begins to accelerate, but the actual sales cycles and deployment times can stretch across months. While hearing feedback from a creator on a voice model can happen in a matter of minutes, iterating with a large customer takes much longer. You need to start getting comfortable with fast mode and slow mode.

When ElevenLabs launched, they had a ton of inbound from enterprises. At the time the organization was composed mostly of engineers and researchers, with no salespeople to speak of. This resulted in a temporary misstep, when Mati asked engineers (who as we know are not always the most sales-oriented folks in an organization) to handle sales. Per Mati, “At first we thought we could do it all with engineers — no salespeople, just product.”

That idealistic approach quickly met reality. Mati decided to invest in roles that were divided into 80% sales and 20% engineering. This became an important way to better understand customers, what they care about, and then proactively build product offerings that address those needs.

To serve hospitals, telcos, and global media platforms, the company had to build not just models, but pipelines and integrations. Today, ElevenLabs’ voice and agent platforms power applications in healthcare, customer experience, and entertainment, turning what started as a creator tool into a robust enterprise stack.

“It’s easy to do a demo, but how do you actually build it to production? How do you test it, do version control, evaluate, monitor, fine tune, based on the results?”

At this point, ElevenLabs has around 20 product teams of 5-10 people each, which support both high-velocity shipping and enterprise-level discipline. Some teams work on verticals that are post-product-market fit. For those teams, the stakes are well understood: give enterprise customers and smaller creators a quality experience with no downtime. But there are also product teams working on newer initiatives at ElevenLabs that operate more like fast-moving micro-startups. These teams are working on pre-product-market fit initiatives, and the stakes are more existential: they have six months to prove a product can resonate and get traction with customers, otherwise the product gets axed.

ElevenLabs’ early team thrived on product velocity and instant feedback from creators. Enterprise sales, by contrast, meant long cycles and patience. “Some of our team had never worked in enterprise: they were skeptical about waiting six or twelve months to see results,” Mati admitted. “In the early days, we had to shield them from that and just say, trust us, it’ll work.”

And you know what? It did.