A Software-Driven Autonomy Stack Is Taking Shape

Erin Price-Wright

If you’ve spent time in San Francisco recently, you might have been struck by the ubiquitous presence of Waymo cars on the roads. Self-driving cars are a visible example of autonomy finally starting to hit its stride, and big software winners like Applied Intuition and Scale AI emerged to serve their use cases. But while companies like Waymo, Cruise, Aurora, and Tesla had very public odysseys to get their vehicles on the streets, many others have begun to take advantage of the momentous progress in sensor technology, controls, reinforcement learning, and transformer models (in particular, large language models, or LLMs, and vision language models, or VLMs) to bring autonomy to a broad range of less-visible use cases, ranging from defense to water treatment.

This is important because, historically, industrial autonomy efforts have involved a tight coupling between hardware and software. Autonomy happened at the firmware level and involved a lot of bespoke (and expensive) hardware — consider the vertically integrated autonomy efforts of legacy companies like John Deere or Honeywell. With advancements in AI, in particular, we are starting to observe a decoupling of hardware and software for autonomy use cases across industries like manufacturing, energy, mining, construction, industrial controls, and defense. This is similar to the dynamic we’ve begun to see in robotics, and will lead to powerful new capabilities, applications, and economics.  

In the context of American Dynamism, autonomy involves software that somehow interacts with the real world through hardware. Autonomy could mean some kind of robot, like a machine designed to harvest strawberries or move goods around a factory; a vehicle that navigates to a destination on its own; or even a control system that learns from input data and optimizes an industrial process.

Layers of the autonomy stack

We have observed four common layers emerge in the autonomy stack, regardless of form factor, domain, or use case. These layers are perception, location + mapping, planning + coordination, and controls. However, today’s techniques and approaches for solving these common challenges are often still quite bespoke and context dependent. 

Importantly, most software over the last decade was built around some key assumptions, including: 

  • Anything can run remotely and/or in the cloud 
  • Everywhere has Wi-Fi all the time 
  • If something doesn’t work the first time, you can try again 
  • It’s feasible to push software updates as often as you want 
  • Hardware is homogeneous and interchangeable 
  • Compute is essentially unlimited 

But most important autonomy use cases break many or all of these assumptions, and thus require a fundamentally different approach to designing and building software. 

As the ecosystem matures, we see opportunities to develop reusable components to solve these specific technical challenges within each layer of the autonomy stack. We discuss them in more detail below, but at a high level, these include things like: 

  • Model calibration 
  • Sensor integration
  • Edge processing
  • Data compression and streaming
  • Coordinating between local and remote compute 
  • Mesh networking and sharing state across agents in real time
  • Establishing understanding and trust with human operators

There are also several related developer and operator tools that sit alongside this core stack, including testing, simulation, security, and fleet monitoring and management, all of which offer enormous opportunities for standardization. We’ll discuss those in depth at another time, keeping our focus in this post on the core stack, beginning at its foundational layer.

Perception

For a system to operate autonomously, it must be able to understand the world around it, which requires perception. Perception systems can take several different forms, depending on the use case. For instance, they may include some combination of cameras, lidar, or other sensors (radio frequency / electromagnetic, temperature, pressure, chemical, thermal imaging, etc.).

Over the last decade, the quality of these perception systems has improved dramatically, while the hardware costs have plummeted. For example, the cost of a lidar system in 2010 was around $75,000; today, you can buy a cheap one for as little as $100. Similarly, thermal sensors, pressure sensors, electromagnetic sensors, etc., have all seen prices decrease by 70% or more over the past 15 years. 

The first thing an autonomous system has to do with the data it collects, whatever perception sensors it’s using, is calibrate that data. For example:

  • An onboard camera on an autonomous boat gets tossed around in the waves. As a result, the horizon line might not show up in every frame and will appear at different angles from frame to frame.
  • A brand-new sensor installed on a pump must be adjusted to align with a known standard flow rate. 
  • A lidar pack on a vehicle needs to account for the specific vibration frequency of the engine. 

Fortunately, the process of determining a baseline and using it to calibrate the perception system is pretty similar in all these cases, even if the data type is different. One early and promising field of research is using small AI models to do this calibration automatically on device, and we’re excited about the possibility of standardizing calibration independent of domain.
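
To make that concrete, here is a minimal sketch of the baseline-and-calibrate pattern for a hypothetical pump sensor checked against a known standard flow rate; the function names and numbers are purely illustrative, not drawn from any particular library.

```python
import numpy as np

def estimate_calibration(raw_readings, reference_values):
    """Fit a linear gain/offset mapping raw sensor output to a known standard.

    The inputs are paired samples collected while the system is held at known
    operating points (e.g., a bench-tested standard flow rate).
    """
    gain, offset = np.polyfit(raw_readings, reference_values, deg=1)
    return gain, offset

def apply_calibration(raw_reading, gain, offset):
    """Convert a raw reading into calibrated units using the fitted baseline."""
    return gain * raw_reading + offset

# Example: a brand-new pump sensor read against a bench reference before deployment.
raw = np.array([101.2, 203.9, 305.1, 410.4])        # raw sensor counts (hypothetical)
reference = np.array([100.0, 200.0, 300.0, 400.0])  # known standard flow rates
gain, offset = estimate_calibration(raw, reference)
print(apply_calibration(256.0, gain, offset))        # calibrated flow estimate
```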

But the raw data collection itself is not enough; a system must also be able to interpret the results. How does it determine whether the image from a camera is a human or a garbage can? How does it interpret a lidar point cloud? How does it parse pressure data to determine that there’s a leak? 

Thankfully, modern transformer-based AI techniques like VLMs have made interpreting these results much easier, and require less bespoke model development. Using an open-source pre-trained model, combined with fine-tuning on domain-specific data, you can now have a solid perception system in a matter of days, versus months or years with traditional methods. And, importantly, transformer models are good at handling different types of data, allowing them to quickly recognize patterns regardless of whether the input data is electromagnetic signals or image frames. There is still lots of work to be done here, but we’re excited by the prospect of applying general-purpose foundation models across different industries. 
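
As a rough sketch of how little bespoke code this can take, the snippet below runs zero-shot classification on a single camera frame with an open-source vision-language model. It assumes the Hugging Face transformers library and the public CLIP checkpoint named below; a production system would add fine-tuning on domain-specific data, which we omit here.

```python
from transformers import pipeline

# Zero-shot perception check with an open-source vision-language model.
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

candidate_labels = ["a pedestrian", "a garbage can", "an empty street"]
# "frame_0042.jpg" is a hypothetical frame captured by the onboard camera.
result = classifier("frame_0042.jpg", candidate_labels=candidate_labels)
print(result[0]["label"], result[0]["score"])
```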

Another important challenge many autonomy engineers face is deciding whether the data interpretation happens locally (on-device), or whether a raw signal should be processed remotely. The answer usually depends on a few different factors, including the importance of speed in reacting to the inputs, and whether the system is collecting signals from many disparate sources. Whatever the use case, most autonomous systems have a combination of processes that happen on-device and remotely, and must coordinate between them. 
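
Here is a simplified sketch of that routing decision; the latency budget, stub model functions, and compression step are illustrative assumptions rather than any standard approach.

```python
import gzip

LATENCY_BUDGET_MS = 50   # hypothetical threshold; real budgets are use-case specific

def run_onboard_model(signal: bytes) -> dict:
    """Stand-in for a small, quantized model running on the device itself."""
    return {"label": "unknown", "confidence": 0.0}

def send_to_remote(payload: bytes) -> dict:
    """Stand-in for shipping a compressed signal to a nearby GPU or the cloud."""
    return {"label": "pending", "confidence": 0.0}

def route_inference(signal: bytes, latency_budget_ms: int, link_up: bool) -> dict:
    """Decide where a raw signal gets interpreted.

    Latency-critical or disconnected cases stay on-device; everything else is
    compressed and offloaded to more powerful remote compute.
    """
    if latency_budget_ms <= LATENCY_BUDGET_MS or not link_up:
        return run_onboard_model(signal)
    return send_to_remote(gzip.compress(signal))
```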

There are not yet many industry-wide common approaches for handling this, but we have seen teams across our portfolio sharing home-grown data connectors and other libraries that they have built to solve some of these challenges today. And it’s an area where we are excited to see further development as standard, hardware-agnostic protocols emerge. 

Location + mapping

A critical component of the autonomy stack, especially for systems that move, is understanding position/location and having a map of what’s nearby. GPS is often one component of this, but it is usually insufficient on its own. For example, it’s not ideal indoors, in GPS-denied environments like contested war zones, in spaces where high precision matters, or in scenarios where understanding transportation networks is important (e.g., navigating city streets or aisles in a warehouse).

Even when the system doesn’t move, it’s important to have the notion of a map to understand how components of a system relate to each other physically. When a leak is discovered in a water treatment plant, for example, you may want other components to be “aware” of their physical relationship to the leak to know how to respond. 

Autonomy companies have historically used any signal they can get their hands on to enrich their understanding of where they are in space — including pre-built maps, landmark detection, GPS relaying, and even echolocation — meaning that location and mapping modules today are pretty custom and depend on a hodgepodge of different data types. 
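
As a toy illustration of that kind of signal fusion, the sketch below nudges a dead-reckoned position estimate toward an occasional GPS fix with a single complementary-filter step; real systems typically rely on full Kalman or particle filters over many more signal types.

```python
def fuse_position(predicted, measured, gain=0.2):
    """Blend a dead-reckoned prediction with a noisy absolute fix (e.g., GPS).

    Trust the onboard prediction, then nudge it toward the external
    measurement whenever one is available.
    """
    if measured is None:                      # GPS-denied: rely on dead reckoning alone
        return predicted
    x_p, y_p = predicted
    x_m, y_m = measured
    return (x_p + gain * (x_m - x_p), y_p + gain * (y_m - y_p))

# One step: odometry says we are near (10.0, 5.0); a GPS fix disagrees slightly.
estimate = fuse_position((10.0, 5.0), (10.4, 4.8))
print(estimate)   # roughly (10.08, 4.96)
```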

Another interesting challenge is that maps can change over time — sometimes quite quickly! Think about when you’re using Google Maps and it suggests an alternative route after detecting a new accident. It’s critical that a system can handle that type of complexity, and then let other systems or human responders know whether they need to worry about it in real time, or whether updates can happen asynchronously. 

Addressing these capabilities is often a major source of effort for new autonomy programs. This is why it’s promising to see several companies emerging to develop standardized approaches for location and mapping, including auto-extraction of maps from unstructured data, better ways of sharing location information across heterogeneous fleets in real time, and different approaches to “synthetic GPS.”

Planning + coordination

Task planning is another common challenge for companies building autonomous systems. The actions a system needs to take can often be split between reflexes and tasks. For example, driving from one location to another might involve planning out a route, but if you come upon a large boulder blocking the road, you’re going to slam on the brakes without thinking about it. Finding this balance between local reflex-like actions and broader task-based plans is a common concern across most autonomy companies, and there are several considerations and challenges that span most domains:

  • Compute: This involves determining what processing happens locally on the device and what is offloaded to a remote system (e.g., a more powerful GPU located nearby or in the cloud), and how the models communicate.
  • Fleet coordination: It may be easy enough to send a single drone to a target, or to operate a single chiller in an HVAC system at a set temperature, but operating multiple agents that interact with each other synchronously is a much more complex task. Utilizing multiple agents as part of a larger system, to accomplish a greater mission, requires individual agents that can coordinate in real time to decide how to act.
  • Updating the plan: Sometimes, an autonomous agent must decide to change its original plan. This can be because of local sensors (e.g., discovering a boulder blocking the highway) or because another agent provided new information from a remote source (e.g., there’s an even bigger landslide a mile up the road).
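
A stripped-down sketch of that reflex-versus-plan split might look like the following; the obstacle threshold, alert names, and routes are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Observation:
    obstacle_distance_m: float        # from local perception
    remote_alert: Optional[str]       # e.g., another agent reports a hazard ahead

def reflex(obs: Observation) -> Optional[str]:
    """Fast, local reactions that never wait on the task planner."""
    if obs.obstacle_distance_m < 5.0:                 # illustrative threshold
        return "emergency_brake"
    return None

def replan(route: List[str], obs: Observation) -> List[str]:
    """Slower, deliberate updates to the task-level plan."""
    if obs.remote_alert == "landslide_ahead":         # hypothetical remote report
        return ["detour_via_b"]                       # hypothetical alternate route
    return route

def step(route: List[str], obs: Observation) -> str:
    """Reflexes take priority; otherwise follow (and possibly update) the plan."""
    action = reflex(obs)
    if action is not None:
        return action
    route[:] = replan(route, obs)
    return "follow:" + route[0]
```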

Task planning and coordination is an active area of research and development for many of our portfolio companies, and we’re observing a convergence to the same set of core challenges across very different contexts. 

Although the commercial market is still early in putting these principles into practice, we’re seeing promising trends across a number of companies and categories, including energy, industrials, water treatment, chemicals, mining, and defense. In particular, LLMs combined with deep reinforcement learning have emerged as a critical unlock to solving these challenges in rapidly changing, complex environments where there are too many variables to enumerate deterministically. 

The way traditional autonomous systems are designed, using control theory, is that you take the full dynamics model of a system and find an equilibrium based on an objective function. You then apply forces to the system to steer it toward that equilibrium. When AI enters the picture, you don’t assume access to the underlying model; instead, you have historical states and actions over time, and the system learns an optimal policy from them. This works particularly well when there are too many variables to capture in a single model, or when it’s not possible to model how different variables interact with each other — which is the case for almost all complex, real-world autonomous systems operating at scale.
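
As a toy illustration of the model-free approach, the sketch below learns action values directly from logged state-action transitions with tabular Q-learning, a simplified stand-in for the deep reinforcement learning described above; the states, actions, and rewards are hypothetical.

```python
import random
from collections import defaultdict

# No dynamics model is assumed: values are learned from logged transitions.
ALPHA, GAMMA = 0.1, 0.95
q_values = defaultdict(float)   # maps (state, action) -> estimated long-run value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step over a historical (state, action, reward, next_state) record."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (td_target - q_values[(state, action)])

def policy(state, actions, epsilon=0.05):
    """Act mostly greedily on learned values, with a little exploration."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[(state, a)])

# Replay a (hypothetical) log of plant states and control actions.
actions = ["open_valve", "close_valve", "hold"]
history = [("pressure_high", "open_valve", +1.0, "pressure_ok"),
           ("pressure_ok", "close_valve", -0.5, "pressure_high")]
for s, a, r, s2 in history:
    q_update(s, a, r, s2, actions)
print(policy("pressure_high", actions))
```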

Controls

The final — and arguably most critical — part of the autonomy software stack is the control system, which is what actually allows an agent to interact with the real world. Whether it’s manipulation, like a robot picking things out of a bin to pack a box, or a valve that controls the flow of reactant in a chemical process, this is the piece of the stack that is usually the least standardized today. In many use cases, it will remain the most bespoke for the foreseeable future. 

The sophistication and standardization of the control stack are partly driven by hardware replacement cycles in any given industry. In industries where there is a high turnover of hardware — military drones, for instance — we see new hardware systems being developed with autonomy in mind as a first-order use case. We also see a somewhat standard set of hardware components (e.g., motors and flight controllers for drones), so integrating with a new model is a relatively straightforward exercise. A similar pattern is emerging with many robotics use cases.

However, in many industries — think large-scale HVAC systems, chemical plants, heavy machinery, deepwater oil wells, etc. — hardware represents massive capex and replacement cycles are much longer. In these cases, autonomy companies have to get more creative about how they integrate with the control system, and they often experience a significant “controls gap.” In other words, there is a big difference between what the human operator thinks they can control and what the system is actually configured to do. This has its pros and cons: It can take a lot of time and effort to get these systems up and running, but they are often extremely sticky once they are.

The last step of handing over the keys to an autonomous system — establishing trust with human operators — is often the most challenging part, and it’s more of a product problem than a technical one. Whenever you have AI systems making decisions — whether that’s what route to take to a destination, which control settings to adjust, or how to plan a tactical mission — it’s understandably very tricky for humans to give up control. Hence, almost every autonomy company we meet must develop some notion of a human-in-the-loop or read-only version of the system to help operators understand the decisions an autonomous system would make under a given set of real-world conditions.
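
One minimal version of that read-only mode is simply logging what the system would have done next to what the operator actually did, so the two can be compared over time; the sketch below assumes hypothetical field names and a local log file.

```python
import json
import time

def log_shadow_decision(recommended_action: dict, operator_action: dict, path="shadow_log.jsonl"):
    """Record the system's recommendation alongside the operator's actual action.

    Reviewing where the two diverge (and what happened afterward) is one way to
    build operator trust before granting the system any real control authority.
    """
    record = {
        "timestamp": time.time(),
        "recommended": recommended_action,   # what the autonomous system would do
        "operator": operator_action,         # what the human actually did
        "agreed": recommended_action == operator_action,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: the system recommends a chiller setpoint; the operator chose differently.
log_shadow_decision({"setpoint_c": 6.5}, {"setpoint_c": 7.0})
```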

Often, an autonomous system will not do things exactly the way a human might, because the system can hold state across many more variables at once than a human can. On the other hand, autonomous systems occasionally develop emergent behavior that closely mimics how the most highly skilled operators instinctively work, behavior those operators might struggle to explain or teach because it’s based on years, or decades, of learned intuition. Therefore, developing a clear understanding of how to benchmark a system’s performance from the outset, and offering in-depth analytics, is critical to gaining trust. It’s definitely a product problem that autonomy founders need to prioritize early in development.

Let’s talk

Although we are starting to see common patterns emerge in how companies across different industries are approaching autonomy — especially as building blocks like sensor technology and AI models have matured — these are still complex challenges. True autonomy spans software and hardware, and we are in the early innings of designing and building consistent, reusable infrastructure to enable it at scale. If you’re working on infrastructure at any layer of the autonomy stack, or building a full-stack autonomy product to serve industrial use cases, we’d love to talk!
