A Software-Driven Autonomy Stack Is Taking Shape

Erin Price-Wright

If you’ve spent time in San Francisco recently, you might have been struck by the ubiquitous presence of Waymo cars on the roads. Self-driving cars are a visible example of autonomy finally starting to hit its stride, and big software winners like Applied Intuition and Scale AI emerged to serve their use cases. But while companies like Waymo, Cruise, Aurora, and Tesla had very public odysseys to get their vehicles on the streets, many others have begun to take advantage of the momentous progress in sensor technology, controls, reinforcement learning, and transformer models (in particular, large language models, or LLMs, and vision language models, or VLMs) to bring autonomy to a broad range of less-visible use cases, ranging from defense to water treatment.

This is important because, historically, industrial autonomy efforts have involved a tight coupling between hardware and software. Autonomy happened at the firmware level and involved a lot of bespoke (and expensive) hardware — consider the vertically integrated autonomy efforts of legacy companies like John Deere or Honeywell. With advancements in AI, in particular, we are starting to observe a decoupling of hardware and software for autonomy use cases across industries like manufacturing, energy, mining, construction, industrial controls, and defense. This is similar to the dynamic we’ve begun to see in robotics, and will lead to powerful new capabilities, applications, and economics.  

In the context of American Dynamism, autonomy involves software that somehow interacts with the real world through hardware. Autonomy could mean some kind of robot, like a machine designed to harvest strawberries or move goods around a factory; a vehicle that navigates to a destination on its own; or even a control system that learns from input data and optimizes an industrial process.

Layers of the autonomy stack

We have observed four common layers emerge in the autonomy stack, regardless of form factor, domain, or use case. These layers are perception, location + mapping, planning + coordination, and controls. However, today’s techniques and approaches for solving these common challenges are often still quite bespoke and context dependent. 

Importantly, most software over the last decade was built around some key assumptions, including: 

  • Anything can run remotely and/or in the cloud 
  • Everywhere has Wi-Fi all the time 
  • If something doesn’t work the first time, you can try again 
  • It’s feasible to push software updates as often as you want 
  • Hardware is homogeneous and interchangeable 
  • Compute is essentially unlimited 

But most important autonomy use cases break many or all of these assumptions, and thus require a fundamentally different approach to designing and building software. 

As the ecosystem matures, we see opportunities to develop reusable components to solve these specific technical challenges within each layer of the autonomy stack. We discuss them in more detail below, but at a high level, these include things like: 

  • Model calibration 
  • Sensor integration
  • Edge processing
  • Data compression and streaming
  • Coordinating between local and remote compute 
  • Mesh networking and sharing state across agents in real time
  • Establishing understanding and trust with human operators

There are also several related developer and operator tools that sit alongside this core stack, including testing, simulation, security, and fleet monitoring and management, all of which offer enormous opportunities for standardization. We’ll discuss those in depth at another time, keeping our focus in this post on the core stack, beginning at its foundational layer.

Perception

For a system to operate autonomously, it must be able to understand the world around it, which requires perception. Perception systems can take several different forms, depending on the use case. For instance, they may include some combination of cameras, lidar, or other sensors (radio frequency / electromagnetic, temperature, pressure, chemical, thermal imaging, etc.).

Over the last decade, the quality of these perception systems has improved dramatically, while the hardware costs have plummeted. For example, the cost of a lidar system in 2010 was around $75,000; today, you can buy a cheap one for as little as $100. Similarly, thermal sensors, pressure sensors, electromagnetic sensors, etc., have all seen prices decrease by 70% or more over the past 15 years. 

The first thing an autonomous system has to do with the data it collects, whatever perception sensors it’s using, is calibrate that data. For example:

  • An onboard camera on an autonomous boat gets tossed around in the waves. As a result, the horizon line might not show up in every frame and will appear at different angles from frame to frame.
  • A brand-new sensor installed on a pump must be adjusted to align with a known standard flow rate. 
  • A lidar pack on a vehicle needs to account for the specific vibration frequency of the engine. 

Fortunately, the process of determining a baseline and using it to calibrate the perception system is pretty similar in all these cases, even if the data type is different. One early and promising field of research is using small AI models to do this calibration automatically on device, and we’re excited about the possibility of standardizing calibration independent of domain.
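
To make that concrete, here is a minimal sketch of the baseline-and-calibrate pattern for a hypothetical pump sensor checked against a known standard flow rate; the function names and numbers are purely illustrative, not drawn from any particular library.

```python
import numpy as np

def estimate_calibration(raw_readings, reference_values):
    """Fit a linear gain/offset mapping raw sensor output to a known standard.

    The inputs are paired samples collected while the system is held at known
    operating points (e.g., a bench-tested standard flow rate).
    """
    gain, offset = np.polyfit(raw_readings, reference_values, deg=1)
    return gain, offset

def apply_calibration(raw_reading, gain, offset):
    """Convert a raw reading into calibrated units using the fitted baseline."""
    return gain * raw_reading + offset

# Example: a brand-new pump sensor read against a bench reference before deployment.
raw = np.array([101.2, 203.9, 305.1, 410.4])        # raw sensor counts (hypothetical)
reference = np.array([100.0, 200.0, 300.0, 400.0])  # known standard flow rates
gain, offset = estimate_calibration(raw, reference)
print(apply_calibration(256.0, gain, offset))        # calibrated flow estimate
```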

But the raw data collection itself is not enough; a system must also be able to interpret the results. How does it determine whether the image from a camera is a human or a garbage can? How does it interpret a lidar point cloud? How does it parse pressure data to determine that there’s a leak? 

Thankfully, modern transformer-based AI techniques like VLMs have made interpreting these results much easier, and require less bespoke model development. Using an open-source pre-trained model, combined with fine-tuning on domain-specific data, you can now have a solid perception system in a matter of days, versus months or years with traditional methods. And, importantly, transformer models are good at handling different types of data, allowing them to quickly recognize patterns regardless of whether the input data is electromagnetic signals or image frames. There is still lots of work to be done here, but we’re excited by the prospect of applying general-purpose foundation models across different industries. 
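
As a rough sketch of how little bespoke code this can take, the snippet below runs zero-shot classification on a single camera frame with an open-source vision-language model. It assumes the Hugging Face transformers library and the public CLIP checkpoint named below; a production system would add fine-tuning on domain-specific data, which we omit here.

```python
from transformers import pipeline

# Zero-shot perception check with an open-source vision-language model.
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

candidate_labels = ["a pedestrian", "a garbage can", "an empty street"]
# "frame_0042.jpg" is a hypothetical frame captured by the onboard camera.
result = classifier("frame_0042.jpg", candidate_labels=candidate_labels)
print(result[0]["label"], result[0]["score"])
```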

Another important challenge many autonomy engineers face is deciding whether the data interpretation happens locally (on-device), or whether a raw signal should be processed remotely. The answer usually depends on a few different factors, including the importance of speed in reacting to the inputs, and whether the system is collecting signals from many disparate sources. Whatever the use case, most autonomous systems have a combination of processes that happen on-device and remotely, and must coordinate between them. 
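
Here is a simplified sketch of that routing decision; the latency budget, stub model functions, and compression step are illustrative assumptions rather than any standard approach.

```python
import gzip

LATENCY_BUDGET_MS = 50   # hypothetical threshold; real budgets are use-case specific

def run_onboard_model(signal: bytes) -> dict:
    """Stand-in for a small, quantized model running on the device itself."""
    return {"label": "unknown", "confidence": 0.0}

def send_to_remote(payload: bytes) -> dict:
    """Stand-in for shipping a compressed signal to a nearby GPU or the cloud."""
    return {"label": "pending", "confidence": 0.0}

def route_inference(signal: bytes, latency_budget_ms: int, link_up: bool) -> dict:
    """Decide where a raw signal gets interpreted.

    Latency-critical or disconnected cases stay on-device; everything else is
    compressed and offloaded to more powerful remote compute.
    """
    if latency_budget_ms <= LATENCY_BUDGET_MS or not link_up:
        return run_onboard_model(signal)
    return send_to_remote(gzip.compress(signal))
```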

There are not yet many industry-wide common approaches for handling this, but we have seen teams across our portfolio sharing home-grown data connectors and other libraries that they have built to solve some of these challenges today. And it’s an area where we are excited to see further development as standard, hardware-agnostic protocols emerge. 

Location + mapping

A critical component of the autonomy stack, especially for systems that move, is understanding position/location and having a map of what’s nearby. GPS is often one component of this, but it is usually insufficient on its own. For example, it’s not ideal indoors, in GPS-denied environments like contested war zones, in spaces where high precision matters, or in scenarios where understanding transportation networks is important (e.g., navigating city streets or aisles in a warehouse).

Even when the system doesn’t move, it’s important to have the notion of a map to understand how components of a system relate to each other physically. When a leak is discovered in a water treatment plant, for example, you may want other components to be “aware” of their physical relationship to the leak to know how to respond. 

Autonomy companies have historically used any signal they can get their hands on to enrich their understanding of where they are in space — including pre-built maps, landmark detection, GPS relaying, and even echolocation — meaning that location and mapping modules today are pretty custom and depend on a hodgepodge of different data types. 
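
As a toy illustration of that kind of signal fusion, the sketch below nudges a dead-reckoned position estimate toward an occasional GPS fix with a single complementary-filter step; real systems typically rely on full Kalman or particle filters over many more signal types.

```python
def fuse_position(predicted, measured, gain=0.2):
    """Blend a dead-reckoned prediction with a noisy absolute fix (e.g., GPS).

    Trust the onboard prediction, then nudge it toward the external
    measurement whenever one is available.
    """
    if measured is None:                      # GPS-denied: rely on dead reckoning alone
        return predicted
    x_p, y_p = predicted
    x_m, y_m = measured
    return (x_p + gain * (x_m - x_p), y_p + gain * (y_m - y_p))

# One step: odometry says we are near (10.0, 5.0); a GPS fix disagrees slightly.
estimate = fuse_position((10.0, 5.0), (10.4, 4.8))
print(estimate)   # roughly (10.08, 4.96)
```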

Another interesting challenge is that maps can change over time — sometimes quite quickly! Think about when you’re using Google Maps and it suggests an alternative route after detecting a new accident. It’s critical that a system can handle that type of complexity, and then let other systems or human responders know whether they need to worry about it in real time, or whether updates can happen asynchronously. 

Addressing these capabilities is often a major source of effort for new autonomy programs. This is why it’s promising to see several companies emerging to develop standardized approaches for location and mapping, including auto-extraction of maps from unstructured data, better ways of sharing location information across heterogeneous fleets in real time, and different approaches to “synthetic GPS.”

Planning + coordination

Task planning is another common challenge for companies building autonomous systems. The actions a system needs to take can often be split between reflexes and tasks. For example, driving from one location to another might involve planning out a route, but if you come upon a large boulder blocking the road, you’re going to slam on the brakes without thinking about it. Finding this balance between local reflex-like actions and broader task-based plans is a common concern across most autonomy companies, and there are several considerations and challenges that span most domains:

  • Compute: This involves determining what processing happens locally on the device and what is offloaded to a remote system (e.g., a more powerful GPU located nearby or in the cloud), and how the models communicate.
  • Fleet coordination: It may be easy enough to send a single drone to a target, or to operate a single chiller in an HVAC system at a set temperature, but operating multiple agents that interact with each other synchronously is a much more complex task. Utilizing multiple agents as part of a larger system, to accomplish a greater mission, requires individual agents that can coordinate in real time to decide how to act.
  • Updating the plan: Sometimes, an autonomous agent must decide to change its original plan. This can be because of local sensors (e.g., discovering a boulder blocking the highway) or because another agent provided new information from a remote source (e.g., there’s an even bigger landslide a mile up the road).
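
A stripped-down sketch of that reflex-versus-plan split might look like the following; the obstacle threshold, alert names, and routes are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Observation:
    obstacle_distance_m: float        # from local perception
    remote_alert: Optional[str]       # e.g., another agent reports a hazard ahead

def reflex(obs: Observation) -> Optional[str]:
    """Fast, local reactions that never wait on the task planner."""
    if obs.obstacle_distance_m < 5.0:                 # illustrative threshold
        return "emergency_brake"
    return None

def replan(route: List[str], obs: Observation) -> List[str]:
    """Slower, deliberate updates to the task-level plan."""
    if obs.remote_alert == "landslide_ahead":         # hypothetical remote report
        return ["detour_via_b"]                       # hypothetical alternate route
    return route

def step(route: List[str], obs: Observation) -> str:
    """Reflexes take priority; otherwise follow (and possibly update) the plan."""
    action = reflex(obs)
    if action is not None:
        return action
    route[:] = replan(route, obs)
    return "follow:" + route[0]
```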

Task planning and coordination is an active area of research and development for many of our portfolio companies, and we’re observing a convergence to the same set of core challenges across very different contexts. 

Although the commercial market is still early in putting these principles into practice, we’re seeing promising trends across a number of companies and categories, including energy, industrials, water treatment, chemicals, mining, and defense. In particular, LLMs combined with deep reinforcement learning have emerged as a critical unlock to solving these challenges in rapidly changing, complex environments where there are too many variables to enumerate deterministically. 

The way traditional autonomous systems are designed, using control theory, is that you take the full dynamics model of a system and find an equilibrium based on an objective function. You then apply forces to the system to steer it toward that equilibrium. When AI enters the picture, you don’t assume access to the underlying model; instead, you have historical states and actions over time, and the system learns an optimal policy from them. This works particularly well when there are too many variables to capture in a single model, or when it’s not possible to model how different variables interact with each other — which is the case for almost all complex, real-world autonomous systems operating at scale.
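
As a toy illustration of the model-free approach, the sketch below learns action values directly from logged state-action transitions with tabular Q-learning, a simplified stand-in for the deep reinforcement learning described above; the states, actions, and rewards are hypothetical.

```python
import random
from collections import defaultdict

# No dynamics model is assumed: values are learned from logged transitions.
ALPHA, GAMMA = 0.1, 0.95
q_values = defaultdict(float)   # maps (state, action) -> estimated long-run value

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step over a historical (state, action, reward, next_state) record."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (td_target - q_values[(state, action)])

def policy(state, actions, epsilon=0.05):
    """Act mostly greedily on learned values, with a little exploration."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[(state, a)])

# Replay a (hypothetical) log of plant states and control actions.
actions = ["open_valve", "close_valve", "hold"]
history = [("pressure_high", "open_valve", +1.0, "pressure_ok"),
           ("pressure_ok", "close_valve", -0.5, "pressure_high")]
for s, a, r, s2 in history:
    q_update(s, a, r, s2, actions)
print(policy("pressure_high", actions))
```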

Controls

The final — and arguably most critical — part of the autonomy software stack is the control system, which is what actually allows an agent to interact with the real world. Whether it’s manipulation, like a robot picking things out of a bin to pack a box, or a valve that controls the flow of reactant in a chemical process, this is the piece of the stack that is usually the least standardized today. In many use cases, it will remain the most bespoke for the foreseeable future. 

The sophistication and standardization of the control stack are partly driven by hardware replacement cycles in any given industry. In industries where there is a high turnover of hardware — military drones, for instance — we see new hardware systems being developed with autonomy in mind as a first-order use case. We also see a somewhat standard set of hardware components (e.g., motors and flight controllers for drones), so integrating with a new model is a relatively straightforward exercise. A similar pattern is emerging with many robotics use cases.

However, in many industries — think large-scale HVAC systems, chemical plants, heavy machinery, deepwater oil wells, etc. — hardware represents massive capex and replacement cycles are much longer. In these cases, autonomy companies have to get more creative about how they integrate with the control system, and they often experience a significant “controls gap.” In other words, there is a big difference between what the human operator thinks they can control and what the system is actually configured to do. This has its pros and cons: It can take a lot of time and effort to get these systems up and running, but they are often extremely sticky once they are.

The last step of handing over the keys to an autonomous system — establishing trust with human operators — is often the most challenging part, and it’s more of a product problem than a technical one. Whenever you have AI systems making decisions — whether that’s what route to take to a destination, which control settings to adjust, or how to plan a tactical mission — it’s understandably very tricky for humans to give up control. Hence, almost every autonomy company we meet must develop some notion of a human-in-the-loop or read-only version of the system to help operators understand the decisions an autonomous system would make under a given set of real-world conditions.
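
One minimal version of that read-only mode is simply logging what the system would have done next to what the operator actually did, so the two can be compared over time; the sketch below assumes hypothetical field names and a local log file.

```python
import json
import time

def log_shadow_decision(recommended_action: dict, operator_action: dict, path="shadow_log.jsonl"):
    """Record the system's recommendation alongside the operator's actual action.

    Reviewing where the two diverge (and what happened afterward) is one way to
    build operator trust before granting the system any real control authority.
    """
    record = {
        "timestamp": time.time(),
        "recommended": recommended_action,   # what the autonomous system would do
        "operator": operator_action,         # what the human actually did
        "agreed": recommended_action == operator_action,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: the system recommends a chiller setpoint; the operator chose differently.
log_shadow_decision({"setpoint_c": 6.5}, {"setpoint_c": 7.0})
```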

Often, an autonomous system will not do things exactly the way a human might, because the system can hold state across many more variables at once than a human can. On the other hand, autonomous systems occasionally develop emergent behavior that closely mimics how the most highly skilled operators instinctively work, behavior those operators might struggle to explain or teach because it’s based on years, or decades, of learned intuition. Therefore, developing a clear understanding of how to benchmark a system’s performance from the outset, and offering in-depth analytics, is critical to gaining trust. It’s definitely a product problem that autonomy founders need to prioritize early in development.

Let’s talk

Although we are starting to see common patterns emerge in how companies across different industries are approaching autonomy — especially as building blocks like sensor technology and AI models have matured — these are still complex challenges. True autonomy spans software and hardware, and we are in the early innings of designing and building consistent, reusable infrastructure to enable it at scale. If you’re working on infrastructure at any layer of the autonomy stack, or building a full-stack autonomy product to serve industrial use cases, we’d love to talk!
