Language models changed how we interact with computers because they enabled software to speak and understand natural languages. In fact, what may be most remarkable about the language model revolution is how general the solution is. In just a few years, it has changed how we think about and use computers. They now fill roles in our lives that they could not before: as conversational partners, coding helpers, sources of creative composition, teaching aids, ideation buddies, and so much more.
But language is just one way we humans reason and communicate. From the moment we’re born, the way we understand the physical world is spatial. Long before language develops, we first understand how the world works and learn to reason by seeing and interacting with it. We learn to think and communicate with pictures and gestures. Civilizations are built around spatial understanding, from urban planning to scientific discovery of fundamental structures such as DNA.
Given this, it’s no surprise that alongside language, there has also been a “pixel” revolution in generative AI, allowing us to create images and videos from text descriptions of what we want to see. Unlike the text that language models produce, these generated images are reflections of the physical world. But while these sorts of models are tremendously useful for creating images and videos, none are quite as general as LLMs are for language. For example, none can create a true 3D representation of a world you can walk through, interact with, and build upon: something that isn’t just confined to a screen, but is an actual representation of the real world (or perhaps a generated virtual world) that sits within a computer and allows for reasoning about physics and cause-and-effect.
One of the reasons such a world model has not been built is that doing so is really, really hard. It requires overcoming key problems in adjacent but disparate areas such as data, graphics, and AI. And it is this problem that brings us to World Labs.
We’ve known World Labs cofounder Fei-Fei Li for nearly two decades. She was joining Stanford as a new professor while Martin was just leaving to do his own startup, and she has advised the firm on AI many times in the past. Among her great body of work, she is known for a key contribution in computer vision that unlocked the field and resulted in a paradigm shift for training and benchmarking AI models. So when she approached us last year saying now was the time to solve the world-model problem, we were hooked!
Most critically, Fei-Fei, now known widely as the “godmother of AI,” had put together a founding team that was exceptionally positioned to crack the spatial intelligence problem. Of course, Fei-Fei has spent her career as a pioneer in this space. And one of the key problems in building 3D world models is data, for which she’s inarguably one of the world’s foremost experts.
Alongside her are giants in their respective fields that are critical to cracking this problem:
The founders are joined by an equally formidable team of world-class engineers and researchers across infrastructure, graphics, computer vision, generative models, and systems.
So when World Labs asked if we would like to be involved, we were thrilled to partner with this world-class team. So much so that Martin has taken up part-time residence in the World Labs offices, learning from an elite AI/graphics team and helping the company walk the line between deep research and product, something he had to do himself when he stepped out of his PhD and into a company.
At this point, the company is hard at work pushing the frontiers of the spatial intelligence problem. They’re building a foundation model that can generate 3D interactive worlds, which are the basis for spatial reasoning. There are immediate applications for such a model, ranging from design to gaming and from visual effects to AR/VR. And, in the longer term, robotics, as the hardware catches up to software!
We’re so very honored to be involved in this, and firmly believe that what this team produces will have as deep an impact as the LLM revolution. And perhaps worlds more 🙂