AI Revolution

Leveling Up

David Baszucki and Jonathan Lai

This conversation is part of our AI Revolution series, which features some of the most impactful builders in the field of AI discussing and debating where we are, where we’re going, and the big open questions in AI. Find more content from our AI Revolution series on www.a16z.com/AIRevolution.

Roblox cofounder and CEO David Baszucki and a16z’s Jonathan Lai discuss the ways AI is changing games and virtual worlds, and how we create them.

  • [00:40] Generative AI in games
  • [04:37] Fully personalized playthroughs
  • [07:30] Roblox’s generative model
  • [11:08] Future of interfaces
  • [13:05] Impact on creators
  • [16:16] NPCs as copilots
  • [18:29] VR & spatial computing

Generative AI in games

Jon: We just had a session talking about how generative AI is potentially transforming all types of media. Tell us a bit about how AI is transforming the way folks are building games on Roblox.

Dave: We go expansively beyond gaming and think about the ultimate form of the way humans communicate at a distance. What we see a lot on our platform, in addition to playing, is: how do kids connect with their friends in the middle of COVID? How do people go to a concert with their friends when they can’t go together in person? How do we, in our own office, simulate serendipity when some of us are remote? We have a simulation of the Roblox office that we’re playing around with inside of Roblox.

If we take a step back and say immersive 3D will, with market forces, follow the phone system, video communication, and SMS, then the influences are just massive. I think there are 3 categories. There is one category where people on our platform don’t even think of it as AI, even though it’s been going on for 2 or 3 or 4 years, which is quality of personalized discovery, quality of safety and civility, voice and text monitoring, asset monitoring, and quality of real-time natural translation. How good is our translation versus others—that’s the one that people don’t notice.

The next one that is really exciting right now is generative. Code generative, 3D object generative, avatar generative, game generative, which is very interesting. The future one, which is really exciting, is how far do we get to a virtual doppelganger or a general intelligence agent inside of a virtual environment that’s very easy to create by a user? You want George Washington in your 12-year-old’s school project? I’m not on Tinder, but if someday Tinder has a Roblox app, can I send my virtual doppelganger for the first 3D meeting? I think going all the way from the things we don’t notice to the things that are exciting around generative, to future then general intelligence, these are all going to change the way this works.

Jon: When you think about the parts that go into building a game, there are just so many pieces—the concepting, the storyboarding, the writing, the creation of the 2D images, the 3D assets, and then there’s the code and the physics engine. Roblox has built many of these pieces into its own studio and its platform. What parts do you think will be most affected by the new generation of generative models you just spoke about?

Dave: It’s almost worth stating the antithesis, which is what will not be affected. Ultimately, there will be acceleration on all of these. We have a bit of an optimistic viewpoint right now because, of the 65M people on Roblox, most are not creating at the level they would want to. For a long time, we imagined a simulation of Project Runway. In the early days of Roblox, that Project Runway is just skeuomorphic: you have sewing machines and fabrics and it’s all 3D simulated.

But when we think about it, even that’s kind of complex for most of us. I think now when Project Runway shows up on Roblox, it will be text prompt, image prompt, voice prompt, whatever you want as if you’re sitting there. If I was helping you make that, I’d say, “I want a blue denim shirt, I want some cool things, I want some buttons, make it a little more trim, fitted.” I actually think we’re going to see an acceleration of creation.

Fully personalized playthroughs

Jon: One of the things you might have mentioned earlier in a podcast is the idea that people can create experiences inside of other experiences. Even when they’re inside the game they have a fully personalized avatar or cosmetic.

Dave: That’s right. Classic 3D creation uses these heavyweight tools: Blender, 3D Studio, or Maya. Roblox Studio is in that same category. For a while at Roblox, we were thinking, “How do we make Roblox Studio easier to use? Two million people use it.” But there is another level above that. We saw early signs of this as our users tried to build “Tycoon Builder” and “Roller Coaster Builder.”

The more access they have to 3D objects and the more persistence they have, all of a sudden, you get better creation. As AI starts being a service available within any experience, we would like Project Runway to make the ultimate Project Runway experience: dynamic, persistent 3D clothing, not in Maya or 3D Studio, but in the experience itself, accelerated by cloud AI tools. Then there’s so much leverage. We’re not making Roblox Studio better; we’re providing a lot of facilities for them in that very vertically integrated experience, and we would see the same for cars and all kinds of other things.

Jon: One of the things you’re getting at here is just the power of the concept of a personalized playthrough. The idea is that maybe every user on Roblox in the future has a fully unique experience that’s tailored to their preferences and the choices they make.

Dave: I think you’re touching on an even further extension, which takes a lot of compute horsepower. It’s completely personalized generation in real time, backed by massive inference. You can imagine, “I’m making the super Dungeons and Dragons thing,” but as it watches you play and knows your history, you’ll be playing a 3D experience that no one’s ever seen before. I think we will see, in certain types of play, real-time procedural generation, where you need enough inference in real time to generate what you would like.

One of the good things we’ve done, and I can see some Roblox people in the audience here, is that for a long time we’ve really focused on building our own infrastructure. We have hundreds of thousands of servers, many, many edge data centers, and terabytes of connectivity that we’ve traditionally used for 3D simulation. The more we can run inference jobs on these, the more we can offer super-high-volume inference that’s high quality at a low cost—and make it freely available so the creators don’t have to worry about it.

Roblox’s generative model

Jon: That makes a lot of sense. On the topic of infrastructure, in your latest shareholder letter, you mentioned that Roblox is working on your own multimodal generative model that will encompass many of the things that we just chatted about. Can you tell us a bit more about what you’re building there?

Dave: In any company like Roblox, there are probably 20 or 30 end-user vertical applications that are very bespoke. Natural language filtering is very different from generative 3D. At the end-user point, we want all of those running, and we want to use all of the data, in an opt-in fashion, to help make them better. As we go down a layer, there’s probably a natural clustering into 2 or 3 bigger, more general models in a company like ours. There’s definitely one around safety and civility, natural language processing, and natural language translation, and generally one more multimodal one around 3D creation: just like in Project Runway, some combination of text, image, and generated avatar.

There’s probably a third area, which gets into virtual humans: how would we take the 5B hours of opted-in human data—what we’re saying, how we’re moving, where we go together, how we work in a 3D environment—and use that to inform a better 3D simulation of a human? I would say yes, we’re looking at large models in those 3 areas. Then there are going to be these super big, massive LLM-type companies. I think we are probably a layer below that: very fine-tuned for the disciplines we want, with the ability to train and run massive inference for them.

Jon: There’s a massive proprietary data set that you guys are working off of. All of the 3D assets that have been created in Roblox over the years…

Dave: There’s a lot of data in any kind of 3D immersive environment. The communication from people can inform safety and civility, 3D motion, eye tracking, and how humans interact. One area we’re really watching that’s a very difficult problem right now is true high-quality 3D generation, as opposed to 2D generation. There’s lots of wonderful 2D generation stuff out there, so we’re really doubling down on 3D generation.

Jon: How do you think about which parts of the tech stack you want to partner on with external partners versus build yourself in-house, with all of the data and the compute resources that you have available?

Dave: The areas we’re interested in: there are companies exploring the future of training cheaply at mega-volume. What’s the future of running inference cheaply at mega-volume? What types of technology abstract away different hardware devices? How can you run a mixed CPU/GPU environment over time? We’re very interested in that, and we’re watching those types of tech stacks a lot.

Future of interfaces

Jon: Do you think you’ll need a new user interface or discovery mechanism as the sheer quantity of games and experiences grows, if we’re moving towards the metaverse?

Dave: Whether it’s Amazon or Netflix or TikTok or YouTube, I think for us, the difficult discovery problem is that you’re trying to optimize 10-year enterprise value along with your stock price every year along the way. A lot of short-term discovery mechanisms do not optimize long-term value if some amazing creation gets suppressed when it could have added 5% to the platform.

There are a lot of opportunities in discovery, and it’s not a solved problem, although many companies are solving it in really interesting ways right now. I think there are ways the user interface can change as well. There’s a lot of opportunity in thinking of this not just as content but also as your real-time social graph and what other people are doing.

Jon: It’s fascinating, because one of the examples of AI being used by big companies that my team just talked about was Netflix, and TikTok as well, with their very personalized feeds and recommendations. You could imagine a future where a user that onboards into Roblox doesn’t actually see a library or a catalog of games but is just presented with a feed, and it’s almost like you’re just going from one end to another…

Dave: This is really right. We are constantly testing the new user experience. Should that be 2D? Should that be 3D? What’s the weighting between creating your digital identity versus discovery? What’s the weighting between connecting with your friends and optimizing all that? We may find that has to be personalized.

Impact on creators

Jon: I want to shift briefly to the player perspective. Now that you have a couple of AI tools, things like the code assist and the material generator, what has been the reception from the creators or the players that have encountered this? Any feedback?

Dave: Whenever we give more capabilities to the user or the player base, we have historically seen a lot of acceleration, even if it’s very early. I think it’s still super early. The things we have launched right now are just a very light layer. Code generation and material generation, super exciting. Potentially prompt-driven avatar creation, even more exciting. Potentially, general-purpose 3D asset creation, even more exciting.

Jon: Do you think the types of games that leverage these sorts of technologies might themselves change in terms of gameplay mechanics, like the genres that you might see emerge?

Dave: Yes. I’m optimistic that more people will have control of in-experience creation. For example, going back to the Project Runway or Karlie Kloss-type example, experiences where creation among 65M to 70M people is just part of the way it goes haven’t been possible before. But imagine an experience where millions of people are acting as fashion designers, voting and picking who’s got the best stuff, then possibly some of that going off and being produced in real life, or some of those designers being picked up by Parsons as “the future designer.”

Imagine a racecar driver who got picked because they were so good in a simulator, or other genres like this, fashion design, other types of creation, where you actually create on the platform and then get identified as a future star.

Jon: I’m just really excited about the notion of taking the creativity that you find in kids doing imaginative play and then turbocharging that with AI tools. What if this happened? What if there was a natural disaster in the middle of the room? Right now, it’s very difficult for those creators to make those experiences. But with a text prompt, you could imagine that there are games that just materialize, and just the diversity and the wildness of those experiences could be…

Dave: I think generative prompting will show up not just at the developer level, building the big, heavy game, but as a text or voice prompt that’s naturally part of any experience wherever you go. Rather than the traditional ways of building things with voxels in 3D experiences, it just becomes natural to use a text prompt. The same goes for the traditional avatar editor: rather than sliders and radio buttons, it will move to a more interactive, text-prompt-driven interface.

NPCs as copilots

Jon: Something we’ve been excited about is the notion of NPCs as teachers and copilots. Let’s say the first time you’re onboarding to Roblox, you could have a guide that basically shows you where to go next. Then, if you’re interested in building, it helps you put your first few experiences together, and so on. That feels like it’s potentially very powerful: the concept of NPCs as copilots instead of co-players.

Dave: I think there’s a lot of interesting stuff going on in various ecosystems around this copilot notion. There’s one copilot where we’re all wearing our little earbuds all day long and that copilot is talking to us. That’s maybe more consumer real-time copilot. But there’s obviously many companies trying to build a copilot that you hook up to your email, your texts, your Slack, your web browser, and whatever, and it starts acting for you. I’m really interested in the notion that copilots will talk to other copilots using natural English, which I think will be the universal interface of copilots.

You could imagine NPCs being created by prompts: “Hey, I’m building a historical constitutional thing. I want George Washington there, but I want George Washington to act at the highest level of civility, usher new users through the experience, tell them a little about constitutional history, and go away when they’re done.” I actually do think you will see those kinds of assistants.

Jon: That’s an incredible example because what better way to learn history than actually talking to the folks who were a part of that historical scene or time period? You could actually see how this advances the educational…

Dave: We’ve been talking a lot about the communication aspect of going to ancient Rome with your friends. You’ll go to ancient Rome, and most of the people you interact with will be historical representations, as good as we can create them, to enrich that experience. It’ll be pretty fascinating.

VR & spatial computing

Jon: That’s pretty powerful. A couple of weeks ago, you tweeted that the Roblox app on Meta Quest had actually hit 1M downloads in just the first 5 days in its beta form. It wasn’t even out on the actual Oculus Store. What are your thoughts on VR and spatial computing?

Dave: Our thesis has been this: when the iPhone shipped, all of a sudden we had 2D HTML consumable on a small screen rather than a large screen, with pinch and zoom, and now we take it for granted. My kids probably don’t realize there was some cheesy mobile web thing 10 years ago, pre-iPhone, when browsers were large-screen things. Now we just assume 2D HTML is everywhere. I think 3D is the same: it’s immersive, multiplayer, cloud-simulated 3D. Because of that, every device is optimized for its own camera and user interaction and has a different level of immersiveness. Your phone is not as immersive as your VR headset, but your phone is more spontaneous.

We think the market ultimately figures out which device you consume this with. We’re not trying to move the market, but we do want to be in every place so we can watch the market move with our players.

I would say one of the historical things we’ve done is have all of our 3D engineers work on crappy Android devices as their primary test device, which helped with Oculus. I’m not saying the Quest is a crappy Android device, but it is rendering 2 eyes rather than 1, which is technically challenging. Having a foundation of working at high performance, rather than counting on harvesting 16 cores, helped us do that.

Jon: It’s exciting to think about potentially different types of experiences being optimized for different devices. Maybe you play the Tycoon game on your PC where you have a mouse and keyboard and then you play a first-person shooter in the VR headset. You’re picking the right device.

Dave: That’s exactly right. It’s impossible to abstract all the way across hand controllers down to a phone.

Jon: Makes sense. So, a quick lightning round of fun questions to close. What is your favorite book?

Dave: “Infinite Game.”

Jon: What is your favorite game?

Dave: Chess.

Jon: What is the best advice that you have received as a young person from a mentor or a friend or someone?

Dave: I feel I didn’t get much good advice when I was young, actually. The worst advice I got from my dad was, “You should go into accounting and business.” My general advice to everyone would be: I thought my first 2 jobs were a disaster, but life is long. If you find something you’re really good at and you love at the same time, that’s obviously the optimal Venn diagram. It’s not just to follow your passion, and it’s not just to follow what you’re good at. If you can overlap those Venn diagrams, that is the sweet spot.

Jon: What is one wild prediction for how you think the world will evolve over the next 5 years?

Dave: I have a Turing test question for AI. If we took AI in 1633 and trained it on all the available information at that time, would it predict the earth or the sun is the center of the solar system, even though 99.9% of the information said the earth is the center? I think 5 years is right at the fringe, but if we were to run that AI Turing test, it might say the sun.

Jon: Do you have a different answer if it was 10 years?

Dave: 10 years, I think it’ll say the sun.

Jon: Awesome. Thank you, Dave. This has been a pleasure.