AI is finally beginning to fulfill its massive transformative potential, as evidenced by the spate of new AI-enabled products across text, images, video, audio, and more. But as far as production-ready products go, one modality has thus far been notably absent from this ongoing Cambrian explosion of AI: physical actions.
The types of physical actions generally performed by robots have largely remained trapped within the confines of Moravec’s paradox, advancing nowhere near as quickly as other modalities. There are, of course, understandable reasons for this, ranging from the difficult unit economics of physical automation solutions to the challenge of delivering correctness across the long tail of physical tasks.
Over the last two years, however, there has been a meaningful acceleration in the talent, capital, and research progress in the robotics domain. Overlapping bodies of research appear to be heading in the direction of more general robots — toward the promise of generalist embodied AI agents. This research progress includes, among other things, the pursuit of scaling-laws hypotheses for robotics and the emergence of vision-language models applied to robot actions; advances in methods around co-training and cross-embodiment that increase the leverage of robotics data; and progress toward bridging neural nets to low-level controls for truly end-to-end robot learning.
Moreover, leading researchers in the field are spinning up commercial research efforts and new companies, and billions of dollars in capital have been allocated towards robotics startups in 2024 alone. The confluence of talent, capital, and technology in the field suggests we are in the midst of a robotics and embodied AI upswing that could eventually enable the development of a horizontal robotics platform, thus giving more developers the opportunity to innovate in this field.
One way to think about robots is that they are computers that, to date, have lacked maturity compared to other computing platforms. They largely have yet to develop the depth in operating systems, developer platforms, and other components of the ecosystem that help accelerate developer activity.
Historically, robotics companies have been built in a somewhat siloed fashion. They typically have defined a constrained automation problem and built a tailored solution for it from the ground up. Companies often collected their own data, configured their own hardware, and built their own intelligence layer, as well. While there exists middleware, like ROS (Robot Operating System), and a standard set of hardware OEMs that provide commercial, off-the-shelf hardware for robotics solution providers to build on, these products don’t really constitute a robust, horizontal platform. Instead, they function more as discrete developer tools and components of a robotics solution (for instance, despite its name, ROS is not actually an operating system under any strict definition).
The issue with this siloed approach to development is that, while tools and infrastructure for robotics developers exist, there has been less of an apps > infrastructure > apps > infrastructure cycle than in other categories of computing. These cycles accelerate the building of key components that can constitute a platform upon which developers can build a diverse set of applications. A move towards generality, where there is a horizontal base for robotic intelligence, can enable a common foundation to accelerate this developer ecosystem.
At the core of the current general robotics wave is the hypothesis that the scaling laws observed in other AI modalities will apply to robotic actions. The idea is that the bitter lesson extends to robotics — that advances in AI are a function of scaling data and compute. Rather than having specific models of robotic intelligence for specific use cases, we could have robotic foundation models that extend across environments and tasks. There is some debate as to whether scale can “solve” robotics (see here for a breakdown of arguments for and against), but the general direction of research — especially among newer commercial research teams — is towards scaling robot data as a means of building a large base model for robot actions (what is typically meant by “foundation model for robotics”).
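To make the scaling-laws hypothesis concrete: the usual form is a power law, where some error metric falls as a power of dataset size, error ≈ a·n^(−b). A minimal sketch of how one might check whether logged robot-learning results follow such a law is to fit a line in log-log space. The data points below are synthetic and purely illustrative; whether real robot policies actually obey a clean power law is exactly the open question referenced above.

```python
import numpy as np

def fit_power_law(dataset_sizes, errors):
    """Fit error ~ a * n^(-b) by linear regression in log-log space."""
    log_n = np.log(dataset_sizes)
    log_e = np.log(errors)
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return np.exp(intercept), -slope  # (a, b)

# Synthetic "policy error vs. dataset size" points generated from a
# known power law (a = 2.0, b = 0.3), standing in for real eval results.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, errors)
print(round(a, 2), round(b, 2))  # recovers a ≈ 2.0, b ≈ 0.3
```

If robot-learning curves do behave this way, the fitted exponent b tells you how much additional data buys; if the points bend away from a straight line in log-log space, that is evidence against the scaling hypothesis for that task distribution.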
If data scaling laws do hold, this move toward generality may constitute the beginnings of a cohesive robotics platform. Widely available, and horizontal, robotic hardware platforms and intelligence layers can drive down the cost of developing robotics applications. We could finally have a unified platform for computers to have read/write access to the physical world, which is currently scattered across various sets of capabilities for vision, sensing, manipulation, and locomotion.
Naturally, progress in AI and robotics has raised the prospect of a “ChatGPT moment” for robotics — an inflection point when the technology experiences a mass-market product breakthrough. The physical world, however, is highly variant — much more so than virtual or human-created domains — and involves an enormous number of parameters that may make a singular product solution difficult. As such, it is possible that the breakthrough robotics moment will look less like a single consumer-grade product and more like a common operating system enabling an ecosystem of devices, developer tools, and applications — more like Android than like an iPhone.
We think a functional market structure for this new robotics wave will look something like this:
However, any meaningful discussion of commercial robotics needs to address the elephant in the room: As a category, the commercial outcomes for robotics companies have not been good, and founders, technologists, and investors in the field have a fair amount of scar tissue.
Commercially successful outcomes such as Kiva Systems were few and far between over the last two decades, and even those were typically moderate when compared to successes in other contemporaneous markets. Failure was much more common. Some common challenges historically faced by robotics companies include:
These challenges are surmountable, though, and — as noted above — there are reasons to be optimistic about current developments in the space.
One driving force for optimism is a new infusion of talent into the commercial robotics category. Over the last two years, many of the field’s leading researchers have moved from academic and large commercial labs to start their own companies. The authors of many leading robotics papers are now in-house at places like Physical Intelligence, Skild, and a number of other newly formed companies aiming to build the robotic intelligence layer. Additionally, top AI researchers have moved into a number of companies building full-stack robotic applications, such as 1X, Figure, and more.
We are also seeing the emergence of more teams focused on robotics efforts inside other AI and developer-facing companies, such as Nvidia’s GR00T or Hugging Face’s robotics team.
Another reason for cautious optimism is the surge in capital directed toward horizontal robotics efforts over the last year. While capital surges alone certainly do not guarantee the success of this robotics wave, capital is a necessary ingredient given how expensive the efforts that form the foundation of a horizontal robotics platform are to build. Moreover, a long-term view could characterize these capital patterns as the early-to-middle stages of a Perezian financial cycle, with varying relationships between financial and production capital ultimately leading to a technological revolution in robotics: one in which the promise of general, intelligent robots is eventually realized.
As one might expect, however, research rules the day. A number of related areas have made significant progress over the last two years, suggesting new open questions and avenues of research:
The use of large language models for robotic task planning, vision-language models for reasoning over visual inputs and perception, and related applications of language models to embodied intelligence represent an opportunity to greatly improve the reasoning capabilities of robots, particularly when multimodal models are applied to reasoning about physical environments and task planning. Spatial and world models are another emerging category of large models that could greatly improve the physical reasoning abilities of robots.
Under the hypothesis around the applicability of scaling laws to robotics, this area of work addresses what is perhaps the key bottleneck in the field. In contrast with available data sources for text, video, and images, there is no such thing as internet-scale robotics data. However, because one of the core goals of this current robotics wave is to solve general robotics via scaling, unlocking different methods of scaling robot data is a key research area. These methods include improved teleoperation and human-behavior cloning, simulation-based methods, learning from video data (especially egocentric video), and hardware setups that enable low-cost robot data collection (sometimes without robots at all). In all likelihood, we will need data collected via multiple, if not all, of these methods.
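Of these methods, teleoperation followed by behavior cloning is the most established: demonstrations are logged as (observation, action) pairs, and a policy is fit to them by supervised learning. A deliberately minimal sketch of that idea, using a linear policy and synthetic data in place of real teleoperation logs (real systems use neural networks and far messier data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic teleoperation log: 500 demonstrations of 10-dim observations,
# with the operator's actions generated by a hidden linear mapping.
obs = rng.normal(size=(500, 10))       # logged sensor readings
true_w = rng.normal(size=(10, 4))      # stand-in for operator behavior
actions = obs @ true_w                 # demonstrated 4-dim actions

# Behavior cloning as supervised learning: argmin_W ||obs @ W - actions||^2
w_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

mse = np.mean((obs @ w_hat - actions) ** 2)
print(mse)  # near zero: the cloned policy reproduces the demonstrations
```

The hard part in practice is not the fit but the left-hand side of the pairs: collecting enough diverse, high-quality demonstrations, which is why the data-collection methods listed above are the active research frontier.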
While language models can provide greater capabilities for high-level reasoning and planning, the promise of a robotic foundation model is that this intelligence will extend to the low-level robotic controls, as well. Over the last decade or so, neural networks have taken on more and more of the robotics stack, from perception through task- and motion-planning. Bridging to the low level, so that robots can rely on neural networks end-to-end (i.e., from sensor input through to action output), makes robots significantly more capable and general, and remains a key area of work in fully realizing the potential of robotics foundation models.
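“End-to-end” here means a single learned function from raw sensor input to motor commands, with no hand-engineered perception or planning modules in between. A toy sketch of that interface (untrained random weights and arbitrary dimensions, purely to illustrate the shape of the mapping, not any real architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

class EndToEndPolicy:
    """Toy end-to-end policy: one network maps a raw sensor vector
    directly to low-level motor commands, with no separate perception,
    planning, or control stages. Dimensions are illustrative."""

    def __init__(self, obs_dim=64, hidden=32, action_dim=7):
        self.w1 = rng.normal(0, 0.1, (obs_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, action_dim))

    def act(self, observation):
        h = np.tanh(observation @ self.w1)  # learned features, not hand-coded perception
        return np.tanh(h @ self.w2)         # joint-velocity commands, bounded in [-1, 1]

policy = EndToEndPolicy()
obs = rng.normal(size=64)      # stand-in for camera features + proprioception
action = policy.act(obs)
print(action.shape)            # (7,): e.g., one command per arm joint
```

In a classical stack, the steps between `obs` and `action` would be separate perception, state-estimation, planning, and control modules; the end-to-end bet is that a single trained network can subsume them.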
Some research on cross-embodiment learning has shown indications of robot policies that transfer across different embodiments, or of data collected on one embodiment being useful for robots of another. This research plays an important role in the emerging robotics platform for two main reasons. First, it improves the efficiency of scaling robot data: if data collected on one embodiment is applicable to robots of a different embodiment, it becomes more feasible to reach the data volumes at which scaling effects appear. Second, given the diversity of environments and tasks in the physical world, there is likely no single optimal embodiment for every environment-task combination. As such, robotic intelligence that generalizes across embodiments can greatly improve the horizontal nature and capabilities of the robotics platform, as opposed to requiring different models for different embodiments, or relying on one general embodiment as the universal hardware platform.
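One common recipe for cross-embodiment learning is to condition a single shared policy on an embodiment descriptor and pad every robot’s action space to a common maximum dimension. The sketch below is hypothetical: the embodiment names, dimensions, and one-hot conditioning are illustrative choices, not any particular paper’s method.

```python
import numpy as np

# Hypothetical embodiments with different action-space sizes (DOF counts).
EMBODIMENTS = {"arm_7dof": 7, "quadruped_12dof": 12, "gripper_2dof": 2}
MAX_DOF = max(EMBODIMENTS.values())  # shared, padded action space

def shared_policy(obs, embodiment, weights):
    """One set of weights serves every robot: the input is conditioned on
    which embodiment is acting, and the padded output is truncated to
    that embodiment's actual degrees of freedom."""
    one_hot = np.zeros(len(EMBODIMENTS))
    one_hot[list(EMBODIMENTS).index(embodiment)] = 1.0
    x = np.concatenate([obs, one_hot])        # embodiment-conditioned input
    padded = np.tanh(x @ weights)             # action in the shared padded space
    return padded[:EMBODIMENTS[embodiment]]   # keep only this robot's DOF

rng = np.random.default_rng(1)
weights = rng.normal(0, 0.1, (16 + len(EMBODIMENTS), MAX_DOF))
obs = rng.normal(size=16)

print(shared_policy(obs, "arm_7dof", weights).shape)        # (7,)
print(shared_policy(obs, "quadruped_12dof", weights).shape) # (12,)
```

The payoff of this structure is that gradient updates from the quadruped’s data also move the weights used by the arm, which is precisely the data-leverage argument above.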
Advances like reductions in sensor costs, improvements in dexterous robot hands, novel custom actuators, and new humanoid robot efforts all contribute to the improvement of hardware capabilities of the robotics platform. In general, the more dexterous and mobile a robot is, the less it is limited by hardware constraints. However, hardware advances require not just improved capabilities, but also improved economics and manufacturability of these systems.
Given the activity in this robotics cycle, we think there are a number of interesting opportunities in which entrepreneurs can build important pieces of the emerging robotics platform. Here’s a non-exhaustive list:
Robotics is certainly a difficult category, and many teams have been thwarted by the inherent difficulties of building for the physical world. However, the emerging robotics wave suggests a number of reasons for optimism. We are seeing multiple trends across talent, capital, and research come together to form the beginnings of an emerging robotics platform.
As we embark on this new robotics cycle, it remains to be seen whether the development arc of the robotics and embodied AI ecosystem looks more like autonomous vehicles (highly siloed and centralized development over long time horizons) or language models (a variety of competing base models that support an ecosystem of decentralized developer activity). We believe the latter can lead to the emergence of a robust robotics platform that kickstarts a flywheel in the category.
If you’re building a piece of the emerging robotics platform, please reach out.