While large language models have captivated the public mind recently, another equally exciting domain has been steadily progressing in parallel: visual generative models. These models can understand, reason about, and, most importantly, create worlds, powering all kinds of generative AI applications, including photorealistic video generation, gaming, interactive 3D worldbuilding, and robotics.
It was actually success in computer vision, not language, that kicked off the deep learning revolution circa 2012 with relatively simple image classification. In 2022, 2D image generation had its magical moment, and today we have popular consumer products such as Ideogram, DALL-E 3, and Midjourney. And while these 2D products and models continue to improve and add new capabilities, other areas have been progressing as well. Specifically, generative 3D models are on the cusp of their own breakthrough into the mainstream.
The first signs of serious possibility in 3D came in 2022, with papers such as GET3D, Point-E, and DreamFusion. This led to a swell of work in the field last year, with more than 20 papers on the topic submitted in September 2023 to a single conference (ICLR). Researchers have finally cracked some of the main challenges around fidelity of generations, such as the infamous “Janus problem,” in which a generated object sprouts duplicate faces or features when viewed from different angles, although often with methods requiring up to an hour for a single generation. The name of the game now is increasing generation speed while improving visual quality, and going beyond single objects to generating entire worlds.
This is where Luma AI shines: building fast, high-quality, and deployable foundation models. Its first research preview of a generative 3D model, Genie (released in November), enabled anyone to generate interactive 3D assets from any text prompt they can imagine (e.g., “Gold ring with ruby Gemstone”) and set a high standard for fast, high-quality text-to-3D generation. It had tens of thousands of users on Day 1 alone. Today, Luma is making Genie generally available for anybody to access and use.
We place particular emphasis on teams when investing in early-stage companies, especially when those companies are translating research into real-life applications. The Luma team exemplifies what we’re looking for. Before ultimately dropping out of a PhD program to start Luma, Cofounder and CTO Alex Yu authored pioneering papers on real-time neural rendering of 3D scenes and 3D generation from a single image. Complementing him on the technical team are Chief Scientist Jiaming Song, who led work on diffusion models (e.g., DDIM) that significantly improved state-of-the-art performance, and Matthew Tancik, who helped create Neural Radiance Fields, one of the most impactful new approaches to 3D neural rendering. Bringing the team together is Cofounder and CEO Amit Jain, whose work on the Apple Vision Pro and Apple’s computer vision team presaged Luma’s maniacal focus on delightful product design and engineering.
We’re excited to lead Luma AI’s Series B round and to work even more closely with this world-class team. And as Luma scales up Genie and builds the next generation of multimodal models, it’s looking for generative AI researchers and software engineers with skills in training large models, working with vast data, infrastructure, web graphics, and design to join the team.