AI Revolution

Universally Accessible Intelligence

Noam Shazeer and Sarah Wang

This conversation is part of our AI Revolution series, which features some of the most impactful builders in the field of AI discussing and debating where we are, where we’re going, and the big open questions in AI. Find more content from our AI Revolution series on www.a16z.com/AIRevolution.

Noam Shazeer, Character.AI CEO and cofounder, talks to a16z’s Sarah Wang about the dawn of universally accessible intelligence, the compute it will take to power it, and his pursuit of AGI’s first use case: AI friends.

  • [00:39] Real Noam vs. AI Noam
  • [05:17] Next unlocks & scaling laws
  • [07:13] AGI’s first use case
  • [10:08] Building a full-stack LLM
  • [12:35] Noam calculates global compute capacity
  • [14:26] Universally accessible intelligence

Real Noam vs. AI Noam

Sarah: Noam, earlier today, David Baszucki at Roblox said that, in the future, it would be very difficult to distinguish between a real person and their AI version. So to kick off this session, we thought it would be fun to interview you and the Character.AI version of yourself to see how you both answer the same questions. This is live, so we don’t know how AI Noam will answer. Are you ready?

Noam: Sounds good.

Sarah: I am going to start with a softball personal question. I know that you went to Duke for undergrad. Can you tell us a little bit about why you decided to go to Duke?

Noam: I went on a basketball…no, I’m joking. They recruited me because they wanted to win the Putnam math competition, so I ended up going on scholarship.

Sarah: Great. Let’s get a little more career-related. You were at Google for almost 20 years. Why did you decide to leave?

Noam: I’ve always been excited about AI and I just wanted to push technology forward. I saw this large language model thing…well, at the time, it wasn’t large language models, but neural networks. Language modeling was just the most promising thing on earth. It’s the best problem ever. All you have to do is predict the next word and, once you do it, you get something that can just talk to you about anything.

I know that we can make this technology better in a lot of ways. We can improve it with model architecture, distributed algorithms, quantization, and all of these things.

I was working on that, but then it struck me that the biggest thing is just scale. Can you throw $1B or $1T at this thing instead of $1M? What you need for that is to show a massively valuable application. There are a few options: coding looks super useful, but so does dialogue—it’s the world’s number one pastime.

Back in 2019 or 2020, my cofounder, Daniel DeFreitas—he’s been on a lifelong mission to build chatbots and dialogue systems—had built this system as a 20% project. He called me in, and it was really good. So then it was like, “We have to help Daniel, let’s build something amazing.” We got it running on giant models and it became an internal viral sensation at Google.

At some point we realized there’s just too much brand risk in large companies to ever launch anything fun. Let’s do a startup, and let’s maximally accelerate.

Sarah: I would guess that answer is also probably true. By the way, you’re the third person, in addition to Mira at OpenAI and Dario at Anthropic, who has said they haven’t seen any limit to the scaling laws. We’ll get more into that later.

We’re going to actually look toward the future now. This is our last question for AI Noam as well. Noam, are you afraid of AGI destroying the world?

Noam: Well, I think we just need a global pause of about 4 months until we get enough H100s online to train our model.

Sarah: I’ll take that as a no.

Noam: No. Not yet. I think there are a lot of potential benefits. We’re going to work on it as the technology improves.

Sarah: Noam, I don’t know if you’ve got to read all of AI Noam’s answers, but how did AI Noam do? How would you score his answers?

Noam: Oh, that’s pretty good. Yeah, that’s better than I would do.

Next unlocks & scaling laws

Sarah: We talk about getting better, but what does better mean? In some cases, it’s correctness. But for Character.AI, it’s not always about correctness. How do you see AI Noam getting better? What does better mean?

Noam: Some of the big unlocks we’re working on are to just train a bigger, smarter model. The scaling laws are going to take us a pretty long way. The model we’re serving now cost us about $2M worth of compute cycles to train last year. We could probably repeat it for $500K now.

We’re going to launch something tens of IQ points smarter, hopefully, by the end of the year. Smarter, and more accessible, meaning multimodal. Maybe you want to hear a voice and see a face and also just be able to interact with multiple people. When a virtual person is in there with all your friends, it’s like you got elected president. You get the earpiece and you get the whole cabinet of friends or advisers. Or, it’s like you walk into Cheers and everyone knows your name, and they’re glad you came.

There’s a lot we can do to make things more usable. Right now, the thing we’re serving uses a context window of a few thousand tokens, which means your lifelong friend remembers what happened for the last half hour. Still, there are a lot of people who are using it for hours a day. Extending that context will make things way better, especially if you can just dump in massive amounts of information. It should be able to know like a billion things about you. The HBM bandwidth is there. It just needs to do it.
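For reference, the half-hour figure follows from simple arithmetic. Here is a minimal sketch, where the chat pace and tokens-per-word ratio are assumed round numbers rather than Character.AI’s actual figures:

```python
# Back-of-envelope: how much chat fits in "a few thousand tokens"?
# Every rate below is an assumed round number, for illustration only.
words_per_minute = 100   # assumed pace of a back-and-forth text chat
tokens_per_word = 1.3    # rough tokens-per-word ratio for English text
context_tokens = 4000    # "a few thousand tokens"

minutes_remembered = context_tokens / (words_per_minute * tokens_per_word)
print(f"~{minutes_remembered:.0f} minutes of chat fit in the window")  # ~31
```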

AGI’s first use case

Sarah: Let’s talk a little bit more about Character.AI, explicitly. Since launch, you’ve seen more than 20B human messages sent on the platform. Even though you now have millions of daily active users, they’re still, on average, spending 2 hours daily on the platform. Is that right?

Noam: I think the way to understand it is this: entertainment is like a $2T a year industry. The dirty secret is that entertainment is imaginary friends that don’t know you exist. People interact with TV and all these other things through what are called parasocial relationships—your relationship with TV characters, or book characters, or celebrities—and everybody does it. There are billions of lonely people out there. It’s actually a very, very cool problem. It’s a cool first use case for AGI.

There was the option to go into lots of different applications, and a lot of them have a lot of overhead and requirements. If you want to launch something that’s a doctor, it’s going to be a lot slower because you want to be really, really, really careful about not providing false information. But friends you can do really fast. It’s just entertainment; it makes things up. That’s a feature.

Essentially, it’s this massive unmet need. It’s very important that the thing kind of feels human and is able to talk about anything. That matches up very well with the generality of large language models. One thing that’s not a problem is making stuff up. I want to push this technology ahead fast because it’s ready for an explosion right now, not in 5 years when we solve all the problems.

Sarah: Absolutely. That’s a big contrast with self-driving cars. That’s a different standard versus your AI friend.

Noam: Your AI friend, or something you view as an AI character or AI entertainment. What standard do you hold a comic book you’re reading to?

People like that human experience of very mixed use cases that talk about everything. It’s not that we want to fine-tune to some particular domain or some particular use case. People want this experience of everything, which is fine. It’s what the technology is perfect for.

Building a full-stack LLM

Sarah: From the a16z vantage point, we have seen startups come up and say, “Hey, I’m going to tackle the mental health use case,” or “I’m going to tackle the edtech use case.” They go much narrower than Character.AI. They go after a specific use case and say, “We’re going to train this model to be focused on that. It’s going to be better than a generalized model.”

I think you got into this a little bit with the mixed use cases, but can you share more about why you decided not to take that approach? Why do you think having a single model served across a number of use cases is the best approach?

Noam: The more you get to a particular use case, the more you get tempted into writing particular rules and doing things that will not generalize well. It was important to stay away from that.

Our goal is to be an AGI company and a product-first company. The way to do that is by picking the right product that forces us to work on the right things: things that generalize, make the model smarter, make it what people want, and serve it at massive scale and serve it cheaply. I think this was the right product for the right goal.

Sarah: You’ve also chosen this approach when building what we call a vertically integrated model-and-app company. There are advancements on the open source model side, with folks building products on top of a fine-tuned LLaMA tuned for chat. How do you think about that kind of competition entering the market, and the differences versus the approach you’ve taken?

Noam: I love being a full-stack company. It means we get to mess with every layer and do the co-design. If there’s something that’s going to affect something at the end, we get to mess with it at the beginning, and we get to pull in lots of user data as feedback.

Plus, a lot of us invented this stuff, so of course we’re going to do a full-stack company. A lot of us are motivated by launching. I think the people we’re attracting to work at Character.AI are people who love inventing stuff and love launching it.

Noam calculates global compute capacity

Sarah: Let’s go into the evolution of the underlying technology. I think there’s a recent finding around AI developing theory of mind, or the knowledge that others’ beliefs, desires, and intentions may be different from one’s own. Is this surprising to you, and what do you think it means for human-AI relationships?

Noam: Just make the thing smarter and it’s going to have a better theory of mind. I think that’s definitely something that’s massively important. It seems like one of these emergent properties that is just going to come with scale. I see this stuff massively scaling up—it’s just not that expensive.

I saw an article yesterday: NVIDIA is going to build another 1.5M H100s next year. That’s 2M H100s, so that’s 2 × 10^6 of them. Each can do about 10^15 operations per second, so that’s 2 × 10^21 operations per second, divided by 8 × 10^9 people on Earth. That’s roughly a quarter of a trillion operations per second, per person. Which means it could be processing on the order of 1 word per second on a 100-billion-parameter model for everyone on Earth. But not everyone on Earth, because some people are blocked in China and some people are sleeping. It’s not that expensive. This thing is massively scalable if you do it right, and we’re working on that.
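His numbers check out. Here is the same back-of-envelope as a sketch; the 2-operations-per-parameter-per-token rule of thumb for inference is an added assumption, not something stated above:

```python
# Noam's back-of-envelope on global H100 capacity, spelled out.
h100_count = 2e6      # ~2M H100s after next year's batch
ops_per_gpu = 1e15    # ~10^15 operations per second per H100
people = 8e9          # ~8 billion people on Earth

ops_per_person = h100_count * ops_per_gpu / people
print(f"{ops_per_person:.2e} ops/sec per person")  # 2.50e+11, a quarter trillion

# Assume generating one token on a dense model costs ~2 ops per parameter.
params = 100e9        # 100-billion-parameter model
tokens_per_second = ops_per_person / (2 * params)
print(f"~{tokens_per_second:.2f} words/sec for everyone on Earth")  # ~1.25
```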

Universally accessible intelligence

Sarah: I think you said once that the internet was the dawn of universally accessible information and we’re now entering the dawn of universally accessible intelligence. Building off of your last answer, what did you mean by that? Do you think we’re there yet?

Noam: It’s really a Wright Brothers first-airplane kind of moment. We’ve got something that works, is useful now for a large number of use cases, and looks like it’s scaling very, very well. Without any breakthroughs, it’s going to get massively better as everyone scales up to use it. And there will be more breakthroughs, because now all the scientists in the world are working on making this stuff better.

It’s great that all this stuff is accessible in open source. We’re going to see a huge amount of innovation. What’s possible only in the largest companies now will be possible in somebody’s academic lab or garage in a few years. As the technology gets better, all kinds of great use cases will emerge, pushing technology forward, pushing science, pushing the ability to help people in various ways. I would love to get to the point where you can just ask it how to cure cancer. That seems a few years away right now, but…

Sarah: Do you think we need another fundamental breakthrough, like the transformer technology, to get there or do you think we actually have everything that we need?

Noam: I don’t know. It’s impossible to predict the future, but I don’t think anyone’s seen the scaling laws stop. As far as anybody has experimented, stuff just keeps getting smarter, so we’ll be able to unlock lots and lots of new stuff. I don’t know if there’s an end to it, but at least everybody in the world should be able to talk to something really brilliant and have incredible tools all the time. I can’t imagine that it will not be able to build on itself.

The moral of the story is that, at the core, the computation isn’t that expensive. Operations cost something like 10^-18 dollars each these days. If you can do this stuff efficiently, even talking to the biggest models ever trained should cost way, way less than the value of your time—or most anybody’s time. There’s the capacity there to scale these things up by orders of magnitude.
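Taking the 10^-18 dollars-per-operation figure at face value, here is a quick sketch of what that implies for the cost of a conversation; the ops-per-token figure is the same assumed rule of thumb as above:

```python
# What talking to a 100B-parameter model costs at ~$1e-18 per operation.
dollars_per_op = 1e-18
ops_per_token = 2 * 100e9   # ~2 ops per parameter per token (assumption)

cost_per_token = dollars_per_op * ops_per_token
print(f"${cost_per_token:.1e} per token")  # $2.0e-07

# An hour of chat at the pace assumed earlier (~8,000 tokens):
print(f"${cost_per_token * 8000:.4f} per hour of conversation")  # $0.0016
```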

Sarah: I’d like to end on that note. Thank you so much, Noam, it was awesome.

Noam: Thank you.