This conversation is part of our AI Revolution series, which features some of the most impactful builders in the field of AI discussing and debating where we are, where we’re going, and the big open questions in AI. Find more content from our AI Revolution series on www.a16z.com/AIRevolution.
Waymo Co-CEO Dmitri Dolgov joined a16z General Partner David George to discuss how Waymo is using generative AI to help build safer, more sustainable, and more accessible transportation. In the conversation, Dmitri talks about the potential of embodied AI, the value of simulation and of building training data, and his approach to leading a company focused on solving the challenges of building AI that can navigate the real world.
David: Maybe to start, take us back to Stanford, if you will. That was when you first started working on the DARPA project, and maybe give us a little bit of your history of how you ended up from there to here.
Dmitri: My introduction to autonomous vehicles was when I was doing a postdoc at Stanford, as you just mentioned, David. I got pretty lucky with the timing of it. This was when the DARPA Grand Challenges were happening. DARPA is the Defense Advanced Research Projects Agency, which started these competitions with the goal of boosting the field of autonomous vehicles. The one I got involved in was in 2007; it was called the DARPA Urban Challenge.
So the setup there would look like a toy version of what we’ve been working on since then. It was supposed to mimic driving in urban environments, so they created a fake city on an abandoned air base, populated it with a bunch of autonomous vehicles and a bunch of human drivers, and had them do various tasks. That was my introduction to this whole field. These DARPA challenges are often considered by people in the industry to be a foundational, pivotal moment for the whole field, and it was definitely that for me. It was a light bulb moment that really got me hooked.
David: What was the hardware and software that you guys had at that point? This is 2007.
Dmitri: Yeah, at a very high level it was not unlike what we talk about today. A car that has some instrumentation, so you can tell it what to do and get some feedback back. Then you have what’s called a pose system — a bunch of inertial measurement units, accelerometers, gyroscopes — that, together with GPS, tells you how you’re moving through space. It has sensors: radars, LiDARs, and cameras, the same stuff we still use today. There’s a computer that takes the sensor data in and then tells the car what to do, and a bunch of software. The software had perception components, decision-making and planning components, and some AI.
But of course, every one of those things over that time — how long has it been? Almost 18 years, more than that — has changed drastically, right? When we talk about AI today versus the AI we had back in 2007, 2009, there’s nothing in common. And similarly, everything else has changed. The sensors are not the same, the computers are not the same.
David: So at that point, that was the light bulb moment, and then at that point you said, “Okay, I’m at Stanford, I want to make this my career,” right? And then it was Toyota, and then where did it go from there?
Dmitri: I don’t know if I thought about it in those terms. I was like, “This is the future. I want to make it happen, I want to be building this thing. I can’t wait.” The next big step was when a number of us from the DARPA Challenge competitions started the Google Self-Driving Car Project. It was about a dozen of us who came together at Google in 2009, with support and excitement from Larry and Sergey, to see if we could take it to the next step. We worked on it for a few years, that project became Waymo in 2016, and we’ve been on this path since then.
David: So we have this new big breakthrough in generative AI. Some would say it’s new, some would say it’s 70 years in the making. How do you think about layering advances that have come from generative AI to what many would describe as more traditional AI or machine learning techniques that were the building blocks for self-driving technology up to that point?
Dmitri: Great question. Generative AI is a broad term. Maybe we can take a step back and talk about the role AI plays in autonomous vehicles, and how the various breakthroughs in AI mapped onto our task. As you mentioned, AI has been part of autonomous vehicles from the earliest days. Back when we started, it was a very different kind of AI and ML — classical methods: decision trees, classical computer vision with hand-engineered features, kernels, and so forth. Then one of the first really important breakthroughs that happened in AI and computer vision, and that really mattered for our task, was the advancement in convolutional neural networks right around 2012.
Many of you are probably familiar with AlexNet and the ImageNet competition. This is where AlexNet blew all other approaches out of the water. That obviously had very strong applications for our domain — how you do computer vision, and not just on cameras, right? You can use ConvNets to interpret what’s around you and do object detection and classification from camera data, from LiDAR data, from your imaging radars. That was a big boost around that 2012, 2013 timeframe. And then we played with those approaches and tried to extend the use of ConvNets to other domains beyond perception, with some interesting but limited success.
Another very important breakthrough happened around 2017, when transformers came around. They had a huge impact on language: language understanding, language models, machine translation, and so forth. For us, it was a breakthrough that allowed us to take ML and AI to new areas well beyond perception. If you think about transformers and the impact they had on language, the intuition is that they’re good at understanding, predicting, and generating sequences of words.
In our domain, we think about the task of understanding and predicting what people will do — the other actors in the scene — or the task of decision-making and planning your own trajectories. Or, in simulation — our version of generative AI — generating behaviors for how the world will evolve. These sequences are not unlike sentences. You’re operating on the states of objects. There’s local continuity, but then the global context of the scene really matters. So this is where we saw some really exciting breakthroughs in behavior prediction, decision-making, and simulation.
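To make the “trajectories as sentences” intuition concrete, here is a minimal PyTorch sketch — an illustration of the idea, not Waymo’s actual architecture. It treats an agent’s sequence of states the way a language model treats tokens, with a causal mask so each state is predicted only from the ones before it:

```python
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    """Toy next-state predictor: treats an agent's (x, y, heading, speed)
    states as tokens, the way a language model treats words."""
    def __init__(self, state_dim=4, d_model=128, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, state_dim)  # regress the next state

    def forward(self, states):  # states: (batch, time, state_dim)
        causal = nn.Transformer.generate_square_subsequent_mask(states.size(1))
        h = self.encoder(self.embed(states), mask=causal)
        return self.head(h)     # an estimate of each step's successor

model = TrajectoryTransformer()
history = torch.randn(2, 10, 4)   # 2 agents, 10 past states each (made up)
predicted = model(history)        # same shape: next-state estimates
```

The “local continuity, global context” point maps directly onto attention: nearby timesteps constrain each other strongly, but the model can also attend across the whole sequence.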
Since then, we’ve been on this trend of models getting bigger, and people started building foundation models for multiple tasks. Then, over the last couple of years, came all the breakthroughs in large language models — modern-day generative AI — and visual language models, where you can align image understanding and language understanding. One thing I’m pretty excited about right now is the intersection, the combination, of the two. That’s what we’ve been very focused on at Waymo most recently: taking the AI backbone — all of the Waymo AI that we’ve built up over the years, that is really proficient at the task of autonomous driving — and combining it with the general world knowledge and understanding of these VLMs.
David: One of the things you just mentioned is the role of simulation and how it has been used. You guys have had major breakthroughs in the use of simulation. In the recent wave of generative AI, the usefulness of synthetic data is somewhat in question. In your field, I would say, this idea of synthetic data and simulation is extremely useful, and you’ve proven that. So maybe you could talk about the simulation technology you’ve built, how it’s allowed you to scale and build that real-world understanding, and how it’s changed in the last few years.
Dmitri: It is super important in our field. Think about the question of evaluating the driver: is it good enough? How do you answer that? There are a lot of metrics and a lot of data sets that you have to build up. And then how do you evaluate the latest version of your system? You can’t just throw it into the physical world and see what happens. You have to do it in simulation. But of course, the new system behaves differently from what might have happened in the world otherwise, so you have to have a realistic closed-loop simulation to give you confidence in that.
That is one of the most important needs for simulation. You also mentioned synthetic data. That’s another area where simulation gives you very high leverage: it lets you explore the long tail of events. Maybe there’s something interesting that you have seen in the physical world, but you want to modify the scenario — you want to turn one event into thousands or tens of thousands of variations. How do you do that? This is where simulation comes in.
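As a rough sketch of what turning one logged event into tens of thousands of variations can look like — the scenario fields and jitter ranges here are hypothetical, not Waymo’s format — you can parameterize the event and fuzz the parameters that matter:

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Scenario:
    """Hypothetical parameterization of one logged event: a cut-in."""
    cut_in_speed_mps: float     # how fast the other car merges
    cut_in_gap_m: float         # gap it leaves in front of us
    pedestrian_offset_m: float  # nearby pedestrian's lateral position

def fuzz(seed: Scenario, n: int, rng: random.Random) -> list[Scenario]:
    """Jitter the parameters that matter; replay each variant in simulation."""
    return [
        replace(
            seed,
            cut_in_speed_mps=seed.cut_in_speed_mps * rng.uniform(0.7, 1.3),
            cut_in_gap_m=seed.cut_in_gap_m + rng.uniform(-2.0, 2.0),
            pedestrian_offset_m=seed.pedestrian_offset_m + rng.uniform(-1.0, 1.0),
        )
        for _ in range(n)
    ]

logged = Scenario(cut_in_speed_mps=12.0, cut_in_gap_m=8.0, pedestrian_offset_m=3.0)
variants = fuzz(logged, n=10_000, rng=random.Random(42))  # one event -> 10k tests
```

Each variant would then be replayed in closed-loop simulation and, per Dmitri’s next point, weighed by how realistic and how frequent it is in the physical world.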
And then lastly, you sometimes want to evaluate and train on things that you’ve never seen, even in our very vast experience. This is where purely synthetic simulations come in — ones not based on anything you have seen in the physical world. In terms of the technologies that come into play, it’s a lot; it is a huge generative AI problem in itself. But what’s really important is that the simulator is realistic. It has to be realistic in terms of sensor and perception realism. It has to be realistic in terms of the behaviors you see from other dynamic actors — if pedestrians are not walking the way they do in the real world, the simulation loses its value. And you need to be able to tie the scenarios you create in simulation to their realism and rate of occurrence in the physical world.
It’s very easy to sample something totally crazy in the simulator, but then what do you do with that? That brings me to the third point of realism: it has to be realistic and quantifiable at the macro level, at the statistical level. And as you can imagine, a lot of work goes into building a simulator that is large-scale and has that level of realism across those categories. If you think about it intuitively, to build a good driver, you need a very good simulator; but to have a good simulator, you have to build models of realistic pedestrians and cyclists and drivers, right? So you do that iteratively.
David: Yeah, of course. And then by having this simulation software that is very good at mimicking the real world and very usable in the sense that you can create variables in the scenes, you can actually give the driver multiples of the amount of experience that they have on the road.
Dmitri: That’s exactly right.
David: In real miles driven, is that right?
Dmitri: That’s exactly right. We’ve driven tens of millions of miles in the physical world — at this point, more than 15 million miles in full autonomy, what we call rider-only mode — but we’ve driven tens of billions of miles in simulation, so you get an amplifier of several orders of magnitude.
David: Speaking of multiples of miles driven, one of the hotly debated topics in the AI world today is the concept of scaling laws. How do you think about scaling laws as they relate to autonomous driving? Is it miles driven? Is it a certain kind of experience? Is it compute? What are the ways you think about that?
Dmitri: So model size matters. We’re seeing scaling laws apply: a lot of the typical old-school models are severely undertrained. If you have a bigger model and the data, that actually does help you — you just have more capacity, and they generalize better. So we are seeing scaling laws apply there. Data, of course, matters, but it’s not just counting the miles or hours. It has to be the right kind of data — data that trains the models to be good at the rare cases you care about. And then there is a bit of a wrinkle, because you can build those very large models, but in our space they have to run onboard the car, right? You are somewhat compute-constrained, so you have to distill them into your onboard system. But we do see a common trend, and we see it play out in our space: you’re much better off training a huge model and then distilling it into a small model than just training small models.
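Waymo hasn’t published its distillation recipe; as a generic sketch of the teacher-student pattern Dmitri describes — a big offboard model compressed into a small onboard one — the model sizes and data below are stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: a large "teacher" trained offboard, a small "student"
# that fits the car's compute budget. Only the pattern matters here.
teacher = nn.Sequential(nn.Linear(64, 1024), nn.ReLU(), nn.Linear(1024, 16))
student = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 16))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's outputs into richer targets

for _ in range(100):                       # stand-in for the training loop
    x = torch.randn(32, 64)                # stand-in for driving features
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / temperature, dim=-1)
    log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point is the one Dmitri makes: the student trained against the big teacher’s outputs typically beats a same-sized model trained from scratch.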
David: Yeah. I’m going to shift gears a little bit and make a sort of simplifying statement, which is probably going to drive you crazy. The DARPA school of thought is sort of a rules-based approach, right? A more traditional AI-based approach with a massive volume of driving, where you document edge cases and the model learns how to react to them. The more recent approaches from some other large players and startups would say, “Hey, we just have AI make all the decisions end to end from the start; you don’t need all that pattern recognition and learning” — end-to-end driving is the tagline out there. What is your interpretation of that approach, and what elements of it have you taken and applied inside Waymo?
Dmitri: Yeah. Sometimes the way people talk about it is kind of this weird dichotomy, it’s this or that.
David: Yeah, of course.
Dmitri: But it’s not. It’s that and then some, right? So it is big models, it is end-to-end models. It is generative AI and combining these models with VLMs, right? But the problem is that it’s not enough. We all know the limitations of those models, and that’s what we’ve seen through the years with a lot of these breakthroughs in AI — [hyperscalers], transformers, big end-to-end foundation models. They’re huge boosts to us. What we’ve been doing at Waymo throughout the history of our project is constantly applying and pushing forward these state-of-the-art techniques, in some cases developing them ourselves, and then applying them to our domain. And what we’ve been learning is that they really give you a huge boost, but they’re just not enough.
The theme has always been that you can take the latest and greatest technology of the day, and it’s fairly easy to get started — the curves always look like that. But the really hard problems are in that remaining 0.0001%, and there it’s not enough. You have to do stuff on top of that. So yes, nowadays you can take an end-to-end model and go from sensors to trajectories or actuation. Typically you don’t build it in one stage, you build it in stages, but you can do backprop through the whole thing, so the concept is very valid. You can combine it with a VLM, add a closed-loop simulation, and you’re off to the races. You can have a great demo almost out of the box. You can have an ADAS or driver-assist system. But that’s not enough to go all the way to full autonomy. That’s where a lot of the really hard work happens.
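A minimal sketch of “built in stages, but you can do backprop through the whole thing” — the module sizes and the imitation-style loss are illustrative assumptions, not Waymo’s stack:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical staged stack: distinct perception, prediction, and planning
# modules, chained so gradients flow from the trajectory loss back to sensors.
perception = nn.Sequential(nn.Linear(512, 128), nn.ReLU())  # sensors -> scene
prediction = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # scene -> futures
planner = nn.Linear(64, 20)                                 # futures -> 10 (x, y) waypoints

params = [*perception.parameters(), *prediction.parameters(), *planner.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

sensors = torch.randn(8, 512)   # stand-in for fused sensor features
expert = torch.randn(8, 20)     # stand-in for demonstration trajectories

trajectory = planner(prediction(perception(sensors)))  # staged, differentiable
loss = F.mse_loss(trajectory, expert)
opt.zero_grad()
loss.backward()                 # backprop through all three stages at once
opt.step()
```

The stages keep the system inspectable, while end-to-end gradients let a planning error improve perception — which is the appeal Dmitri concedes before describing what it still lacks.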
So the question is not “is it this or that” — it’s this, and then what else do you need to take it all the way, to have the confidence that you can actually remove the driver and go for full autonomy? That’s a ton of work, through the entire life cycle of these models and the entire system. It starts with training: how do you train and architect these models? How do you evaluate them? Then you put them in a bigger system, because the models themselves are not enough — you have to do things around them. Modern generative AI is great, but there are some issues with hallucinations.
David: Hallucinations, explainability.
Dmitri: Exactly, exactly. They have some weaknesses in goal-oriented planning, in policy-making, and in understanding and operating in this 3D spatial world. So you have to add something on top of that. We talked a little bit about the simulator — that’s a really hard problem in and of itself. And once you deploy something and learn from it, how do you feed that back? This is where all of the really, really hard work happens. So it’s not end-to-end versus something else. It is end-to-end and big foundation models, and then the hard work.
David: And then all the hard work. Yeah, it totally makes sense. That’s a great segue into all of the progress you guys have made. Riding in a Waymo is an extraordinary experience. That’s not to say you’ve solved all of these complex tasks, but you’ve solved a lot of them. What are some of the biggest AI or data problems you still feel like you’re facing today?
Dmitri: The short answer is taking it to the next order of magnitude of scale — multiple orders of magnitude — and, with that, making the additional improvements needed for a great service. But just to level-set on where we are today: we are driving in all kinds of conditions. We’re driving 24/7 in San Francisco and in Phoenix — those are the most mature markets — but also in LA and in Austin. All of the complexity that you see when you drive around a city, all kinds of weather conditions — fog, dust storms, rain storms down here — those are all conditions we operate in, right?
So then I think about what makes it a great customer experience, and what it takes to grow by the next orders of magnitude. There are a lot of improvements we want to make so that it becomes a better service for getting from point A to point B. We ask for feedback from our riders, and a lot of the feedback we get has to do with the quality of pickup and drop-off locations. We’re learning from users; we want to make it a magical, seamless, delightful experience from the time you start the app on your phone to when you arrive at your destination. That’s a lot of the work we’re doing right now.
David: Yeah. Pickup and drop-off, for what it’s worth, is an extraordinarily hard problem, right? Do you block a little bit of a driveway if you’re in an urban location, and then have a sensor that says, “Oh, actually, I just saw somebody opening a garage door, I need to get out of the way”? How far down the street is it acceptable to pull over? If you’re in a parking lot, where in the parking lot do you go? It’s an extraordinarily hard problem, but to your point, it’s huge for user experience.
Dmitri: That’s exactly right. That’s a good example of just one of the many things we have to build for this to be an awesome product, not just a technology demonstrator. And you just hit on exactly what makes something that on the face of it might seem fairly straightforward — I know there’s a place on the map and I need to pull over, so how hard can it be? — actually difficult. In a complicated, dense urban environment, there are a lot of factors. Is there another vehicle you’re going to be blocking? Is there a garage door that’s opening? What is the most convenient place for the user to be picked up? It really gets into the depth and subtlety of understanding the semantics and the dynamic nature of the driving task, and doing things that are safe, comfortable, and predictable — which leads to a nice, seamless, pleasant, delightful customer experience.
David: Of course. Okay. So you’ve mentioned this stat — 15 million miles. I know the number’s probably a little bigger than that; you released it Tuesday, and it’s growing by the day. Fifteen million autonomous miles driven. That’s incredible. Even more impressive — and you didn’t share this stat yet — it results in 3.5 times fewer accidents than human drivers. Is that right?
Dmitri: I think 3.5x is the reduction in injuries, and then it’s about a 2x reduction in police-reported, lower-severity incidents.
David: This comes to a question of both regulatory and business or ethical judgment. What is the right level that you want to get to? Obviously, you want to constantly get better, but is there a level at which you say, “Okay, we’re good enough,” and that’s acceptable to regulators?
Dmitri: There’s no super simple, short answer, but it starts with those statistics you just mentioned. At the end of the day, what we care about is that roads are safer. When you look at those numbers, where we operate today, we have strong empirical evidence that our cars are safer than human drivers in those areas. On balance, that means a reduction in collisions and harm. Those are the latest numbers we’ve shared — and we’ve been consistently sharing numbers as our service scales up and grows. On top of that, you can also bring in an additional lens: how much did you contribute to a collision?
We actually published a joint study — I think it was based on about 3.8 million miles — with Swiss Re, which is, I think, the largest reinsurer in the world. The way they look at it is who contributed to an event. We saw the same theme, but the numbers were very strong: a 76% reduction in property damage collisions, and a 100% reduction in bodily injury claims. If you bring in that lens, the story becomes even more compelling.
David: That is extremely compelling.
Dmitri: Right. But there are some collisions where — and that’s the bulk of the events we see — we’d be stopped at a red light, and somebody just plows into you, right?
David: Sure.
Dmitri: But then, you know, we do know it’s a new technology, a new product, so it is held to a higher standard. So when we think about our safety and readiness methodology, we don’t stop at just the rates. We’ve built it up over the years — that was one of the huge areas of investment and experience: what else do you need? We’ve done a number of different things, and we’ve published some of our methodologies and shared our readiness framework. For example, beyond the statistics, on specific events we build models of an attentive, very good human driver — not distracted. It’s a good question whether such a driver exists, right? But that’s what we compare our driver to. It’s a model, and in a particular scenario, we evaluate ourselves against that model of a human driver and hold ourselves to the bar of doing well compared to that very high standard. And then you pursue other validation methodologies. It’s the aggregate of all of those that we look at to decide that, yes, the system is ready to be deployed at scale.
David: I’d love for you to talk about how you think, today and in the future, about market structure, competition, and what role you envision Waymo playing.
Dmitri: The way we think about Waymo is that we are building a generalizable driver. That’s the core of the mission of making transportation safe and accessible, right? We’re talking about ride-hailing today — that’s our primary, most mature application — but we envision a future where the Waymo driver will be deployed in other commercial applications: deliveries, trucking, personally owned vehicles.
In all of those, our guiding principle is to think about go-to-market strategy in a way that accelerates access to this technology and gets it deployed as quickly and broadly as possible — while of course doing it gradually, deliberately, and safely. With that as our guiding principle, we’re going to explore different commercial and partnership structures. For example, in Phoenix today, we have a partnership with Uber, both in ride-hailing and in Uber Eats. In Phoenix, we have our own app: you can download the Waymo app and take a ride, and our vehicle will show up and take you where you want to go. That’s one way to experience our product.
Another is through the Uber app. We have a partnership where, through the Uber app, you can get matched with our product — the Waymo driver, the Waymo vehicle — and it’s the same experience. This is another way for us to accelerate and give more people the opportunity to experience full autonomy, and it gives us a chance to explore different go-to-market strategies: one is having our own app, the other is more of a driver-as-a-service for somebody else’s network. We’re still early, but we’ll iterate and put it all in service of that main principle.
David: That’s amazing. That’s going to be exciting. Maybe back to the vehicle — what about the hardware stack that you use? Going all the way back to DARPA, it’s kind of the same stuff, right? The sensors have advanced quite considerably, but you still use radars and LiDAR. Do you think that remains the future path for autonomous driving — LiDAR specifically?
Dmitri: I mean, the sensors are physically different, right? Cameras, LiDARs, radars — each one brings its own benefits. Cameras obviously give you color and very high resolution. LiDARs give you a direct 3D measurement of your environment, and they’re an active sensor — they bring their own energy, so in pitch dark, when there’s no external light source, you still see the scene just as well as during the day, better in some cases.
And then radar is very good at punching through — it’s just physics, a different wavelength, right? So if you build an imaging radar, which we do ourselves, it gives you an additional redundancy layer. It’s also an active sensor, it can directly measure the Doppler velocity of other objects, and it degrades differently and more gracefully in some conditions, like very dense fog or very dense rain. They all have their benefits. Our approach has been to use all of them. That’s how you get redundancy, and that’s how you get an extra boost in the capability of the system.
Today we’re deployed on the fifth generation of our sensors and working to deploy the sixth. Over those generations, we’ve improved reliability, we’ve improved capability and performance, and we’ve brought down the cost very significantly. For us, using all three modalities just makes a lot of sense. You might make different trade-offs if you are building a driver-assist system versus a fully autonomous vehicle, where that last 0.001% really, really matters.
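To illustrate the redundancy argument — this is a toy late-fusion sketch with made-up confidences, not Waymo’s fusion system — combining whichever modalities currently see an object keeps the estimate alive when any single sensor degrades:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    """Hypothetical single-object detection from one sensor."""
    x_m: float          # position in the vehicle frame
    y_m: float
    confidence: float   # 0..1, sensor- and condition-dependent

def fuse(camera: Optional[Detection],
         lidar: Optional[Detection],
         radar: Optional[Detection]) -> Optional[Detection]:
    """Toy late fusion: confidence-weighted average of whichever modalities
    saw the object. Any one sensor degrading (fog, darkness, glare) still
    leaves a usable estimate from the others."""
    seen = [d for d in (camera, lidar, radar) if d is not None]
    if not seen:
        return None
    total = sum(d.confidence for d in seen)
    return Detection(
        x_m=sum(d.x_m * d.confidence for d in seen) / total,
        y_m=sum(d.y_m * d.confidence for d in seen) / total,
        confidence=min(1.0, total),
    )

# Dense fog: camera confidence collapses, radar barely degrades.
fused = fuse(camera=Detection(20.1, 1.9, 0.15),
             lidar=Detection(20.3, 2.0, 0.55),
             radar=Detection(19.8, 2.1, 0.90))
```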
David: One of the observations that we have from the very early days of this wave of LLMs is that there has been sort of already a massive race of cost reduction. Many would argue that it’s sort of a process of commoditization already, even though it’s very early days. I would say the observation from autonomous driving over many, many years now is kind of the opposite thing. There’s been a thinning of the field. You know, it’s proven to be much, much harder than expected. Can you just talk about maybe why that’s the case?
Dmitri: These problems have this property that it’s very easy to get started, but insanely difficult to get all the way to full autonomy so that you can remove the driver. There are maybe a few factors that contribute to that. One is that, compared to LLMs and AI in the digital world, you have to operate in the physical world. The physical world is messy, it is noisy, and it can be quite humbling, right? There’s all kinds of uncertainty and noise that can pull you out of distribution, if you will.
David: Sure.
Dmitri: That’s one thing that makes this very difficult. The second is safety. With these AI systems, in some domains a surprising output is creativity, and it’s great. In our domain, the cost of mistakes — any lack of accuracy — has very serious consequences, right?
David: Yeah, no tolerance.
Dmitri: That sets the bar very, very high. The last thing is that you have to operate in real time. You’re putting these systems on fast-moving vehicles, and milliseconds matter — you have to make decisions very quickly. I think it’s the combination of those factors that together leads to the trend you’ve been seeing. You have to be excellent in this and this and this — all of the above. The bar is very, very high for every component of the system and for how you put them together. There are big advances, and they boost you and propel the system forward, but there are no silver bullets. There are no shortcuts if you’re talking about full autonomy.
David: And because of that lack of tolerance for errors, you have a very high bar for safety. You have a very high burden from regulators. You know, it’s very costly to go through all those processes. I’m very grateful that you guys have seen it through, despite all the humbling experiences that you had along the way.
Dmitri: It’s been a long journey, but for me and many people at Waymo, it is super exciting and very, very rewarding to finally see it become reality. Now we talk about safety and AI in many contexts, right? That’s a big question. But, you know, here we are in this application of AI in the physical world. We have, at this point, a pretty robust and increasing body of evidence that we are seeing tangible safety benefits. That’s very exciting.
David: Yeah. I always say to people, it was a long and very costly journey, but this is probably the most powerful manifestation of AI available to us in the world today. I mean, you can get in a car without a driver, and it’s safer than having a human. That’s just remarkable. What were some of those humbling events along the way?
Dmitri: In those early days, the first couple of years?
David: Early days.
Dmitri: I remember one route we did that started, I think, in Mountain View, went through Palo Alto, then through the mountains to Highway 1, then took Highway 1 to San Francisco. I think we went around the city a little bit and actually finished with Lombard Street. In 2009, that was really complicated.
David: A hundred miles from beginning to end, right? Plenty of human drivers would fail at that task, I think.
Dmitri: Yeah. So we’re doing it one day — we’ve made it through the Mountain View to Palo Alto part, we’re driving through the mountains, it’s foggy, it’s early morning — and we start seeing objects, like random stuff on the road in front of us. There’s a bucket and a shoe. At some point we come across a rusty bicycle, and we ask ourselves, “Okay, what’s going on here?” I think the car handled it okay — maybe not super smoothly, but we didn’t get stuck. Eventually we catch up to a dump truck that has all kinds of stuff on it and is periodically losing things that present obstacles to the car.
David: This is like a cartoon — a continuous stream of anomalies being thrown at you guys. That’s pretty cool. Okay, last question, and I’m going to tee you up to do some recruiting, probably. I’m going to take something you said — I can see your passion and excitement for doing the startup thing, and kind of longing for those days. It’s so cool. What advice would you have for these folks on where to go, whether it’s type of company, type of role, industry, or anything else?
Dmitri: Waymo.
David: That’s what I was saying.
Dmitri: That’s what you said to me.
David: It’s easy to just tee you right up.
Dmitri: Yeah, yeah. I mean, we’re talking about AI today, but pick a problem that matters — a problem that matters to the world, a problem that matters to you. Chances are it’s going to be a hard one. So don’t get discouraged by the unknown, or by what others might tell you. Start building, then keep building, and don’t look back.
David: A huge congratulations on all the progress you guys have made. And as a very happy customer, thank you for building it, and we really appreciate you being here.
David George is a General Partner at Andreessen Horowitz, where he leads the firm’s Growth investing team.
Dmitri Dolgov is co-CEO of Waymo.