16 Minutes on the News #37: GPT-3, Beyond the Hype

Frank Chen and Sonal Chokshi

Posted July 29, 2020

In this special “2x” explainer episode of 16 Minutes — where we talk about what’s in the news, and where we are on the long arc of various tech trends — we cover all the buzz around GPT-3, the pre-trained machine learning model that’s optimized to do a variety of natural-language processing tasks. The paper about GPT-3 was released in late May, but OpenAI (the AI “research and deployment” company behind it) only recently released private access to its API or application programming interface, which includes some of the technical achievements behind GPT-3 as well as other models.

It’s a commercial product, built on research; so what does this mean for both startups AND incumbents… and the future of “AI as a service”? And given that we’re seeing all kinds of (cherry-picked!) examples of output from OpenAI’s beta API being shared (articles, press releases, screenplays, Shakespearean poetry, business advice, “ask me anything” search, webpage designs, plug-ins that turn words into code, and even some arithmetic), how do we know how good it really is or isn’t? And when we see things like founding principles for a new religion or other experiments that are being shared virally (like “TikTok videos for nerds”), how do we know the difference between “looks like” a toy and “is” a toy (especially given that many innovations may start out that way)?

And finally, where are we, really, in terms of natural language processing and progress towards artificial general intelligence? Is it intelligent, does that matter, and how do we know (if not with a Turing Test)? Finally, what are the broader questions, considerations, and implications for jobs and more? Frank Chen (who’s shared a primer on AI/machine learning/deep learning as well as resources for getting started in building products with AI inside and more) explains what “it” actually is and isn’t; where it fits in the taxonomy of neural networks, deep learning approaches, and more in conversation with host Sonal Chokshi. And the two help tease apart what’s hype/ what’s real here… as is the theme of this show.

Show Notes

What is GPT-3, how do developers access it [3:56], and how is it different from other machine learning tools? [6:52]

Discussion of how to categorize GPT-3, how it learns [9:57], and where it fits into the AI big picture [13:43]

Real-world applications and scalability [16:20]

How existing technology companies may respond to tools like GPT-3 and further iterations [19:16]

Discussion of how people may work with tools like GPT-3 in the future, and how it could transform the workplace [20:54]

Ethical concerns around stereotyping and racism in AI [24:34]

The need for a new Turing test for AI [26:47] and predictions for the future [30:18]

Transcript

Sonal: Hi, everyone. Welcome to this week’s episode of “16 Minutes.” I’m Sonal, your host, and this is our show where we talk about the headlines, what’s in the news, and where we are on the long arc of tech trends. We’re back from our holiday break, and so this week we’re covering all the recent and ongoing buzz around the topic of GPT-3, the natural language processing-based text predictor from the San Francisco research and development company, OpenAI.

They actually released their paper on GPT-3 in late May, but only released their broader commercial API a couple of weeks ago. So, we’re seeing a lot of excitement and activity around that, in particular, although it’s all being called GPT-3. So, we’re going to do one of our explainer episodes. It’s a 2x explainer episode going into what it really is, how it works, why it matters, and broader implications and questions while teasing apart what’s hype, what’s real, as is the premise of the show. But before I introduce our expert, let me just quickly summarize some of the highlights.

So, while GPT-3 is technically a text predictor, that actually reduces what’s possible because, of course, words and software are simply the encoding of human thought — to borrow a phrase from Chris Dixon — which means a lot more things are possible. So we’re seeing, and note these are all cherry-picked examples — believable forum posts, comments, press releases, poetry, screenplays, articles, someone even wrote an entire article headlined “OpenAI’s GPT-3 may be the biggest thing since Bitcoin,” and then revealed midway that he didn’t actually write the article, but that GPT-3 did.

We’re also seeing strategy documents, like for business CEOs, and advice written entirely in GPT-3. And not just words: we’re seeing people use words to write code for designing websites and other designs. Someone even built a Figma plugin. Again, all of it shows the transmutability of thoughts to words, to code, to design, and so on. And then someone made a search engine that can return answers and URLs in response to “ask me anything,” which, as anyone who’s been in the NLP space knows, has always been sort of a holy grail of question-answering. I was at PARC when we spun off Powerset, back in the day, and you know all about this too, having worked in this world, Frank.

And now, let me introduce our expert in this episode. Frank Chen has written a lot about AI, including a primer on AI, deep learning, and machine learning; a pulse check on AI (what’s working, what’s not); a microsite with resources for how to get started practically and do something with your own product and your own company; and a reflection on jobs and humanity and AI working together. You can find all of that on our website.

Frank, to start things off, what’s your favorite example of GPT-3 so far? Mine is founding principles for a religion written in GPT-3. I’d love to hear your favorite and also your quick take on why the excitement — to start us off before we dig in a bit deeper.

Frank: My favorite out of the whole thing is it’s doing arithmetic. So, if you ask it what’s 23 plus 67, like just arbitrary two-digit arithmetic, it’s doing it. This is a natural language processing model. And so, basically, it got trained by feeding it lots and lots of text. And out of that, it’s figuring out — we think — how to do arithmetic, which is very, very surprising, because you don’t think that, like, exists in text. The excitement potentially is promising signs of, you know, progress towards general artificial intelligence.

So, today, if you want to do very highly accurate natural language processing, you build a bespoke model. You have your own custom architecture, you feed it a ton of data. What GPT-3 shows is that they train this model once and then they throw it a whole bunch of natural language processing tasks, like fill in the blank, or inference, or translation. And without retraining it at all, they’re getting really good results compared to fine-tuned models.

What actually is GPT-3?

Sonal: Before we even go into teasing apart what’s hype, what’s real, let’s first talk about the “it.” What is GPT-3?

Frank: So, we have two things. One, we have a machine learning model. GPT is actually an acronym; it stands for Generative Pre-trained Transformer. We’ll go through all those in a sec. But thing one is, we have a pre-trained machine learning model that’s optimized to do a wide variety of natural language processing tasks, like reading a Wikipedia article and answering questions from it; or guessing what the ending of a story should be; or so on and so on. So we have a machine learning model. Thing two, the thing that people are playing with, is an API that allows developers to essentially ask questions of that model. So, instead of giving you the model and you program it to do what you want, they’re giving you selective access via the API.

One of the reasons they’re doing this is that most people don’t have the compute infrastructure to even train the model. There have been estimates that if you wanted to train the model from scratch, it would cost something like $5 to $10 million of cloud compute time. That’s a big, big model, and so, like, they don’t give out the model. And then two, the controversy around this thing when they released the first version was they were worried that if they just gave the raw model out, people would do nefarious things with it, like generate fake news articles and just saturation-bomb the web with them. And so they were like, look, we want to be responsible with this thing, and so, we’ll gate access via API so then we know exactly who’s using it. And then the API can be a bit of a throttle on what it can and can’t do as well.

Sonal: Right. Well, while helping them learn. And just as a reminder, APIs are application programming interfaces. We’ve talked a lot about them on the podcast, and people who want to learn more can go to a16z.com/api to read all our resources, explainers. There’s so much we have on this whole topic. But the key underlying idea — and this goes to your point about the cost of what it would take if you were trying to build this from scratch — is APIs give developers and other businesses superpowers because they lower the barrier to entry — in this case, for anyone being able to use AI who doesn’t necessarily have a whole in-house research team, etc. And so, that’s one of the really neat things about the API.

But I do want to correct one misconception that folks out there may not be aware of when it comes to GPT-3. What they’re describing as GPT-3 is actually OpenAI’s API, which is not just GPT-3. Obviously, some of the technical achievements of GPT-3 are in the API, of course, but it’s a combination of other things. It’s like a set of technologies that they’ve released and it’s their first commercial product, in fact. So, that’s just to give people a little context on what the “it” is and isn’t there. Let’s go ahead and go a level deeper into explaining what it is. In their paper, they describe it simply as an autoregressive language model. Can you share what it is and kind of the category this fits in?

What categories does GPT-3 fit into?

Frank: Yeah. So, the broad category of things it fits into — it is a neural network, or a deep neural network. And architectures basically talk about the shape of those networks. At the highest level, visualize it as something comes in on the left, and then I want something to shoot out on the right side — and in between is a bunch of nodes that are connected to each other. And the way in which those nodes are connected to each other and then the connection weights, that’s essentially the neural network. GPT-3 is one of those things. Technically, it’s called a transformer architecture. This is an architecture for neural networks that Google introduced a few years ago. And it’s different than a convolutional neural network, which is great for images. It’s different than a recurrent neural network, which is good for simple language processing. The way the nodes are connected to each other results in it being able to do, essentially, computations on large sentences <Yes.> filled with different words and doing it concurrently instead of sequentially. So, RNNs, which were the former state-of-the-art on natural language processing, they’re very sequential. So, they’ll kind of go through a sentence a word at a time…

Sonal: Recurrent, right?

Frank: Exactly. These transformer networks can basically, sort of, consider the entire sentence in context while it’s doing its computations. One of the things that you classically have to do with natural language processing is you have to disambiguate words. “I went to the bank” — that could mean I want to go withdraw some money, or it can mean I went right up to the edge of the river — because we have ambiguity in these words. The natural language processing system needs to figure out, well, which sense of bank did you mean? And you need to know all the other words around that sentence in order to disambiguate it.
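To make the “whole sentence in context” idea concrete, here is a minimal sketch of the attention step at the core of a transformer, in plain Python. It is an illustration under loud assumptions, not GPT-3’s implementation: the two-dimensional word vectors are invented for the example (axis 0 loosely means “finance,” axis 1 “nature”), and the learned query/key/value projections that real transformers use are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Return one context-mixed vector per input position.

    Each position scores itself against every position in the sentence,
    so no step depends on the result of a previous step. That is why the
    positions could be computed concurrently rather than one word at a time.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
    return outputs

# Hypothetical embeddings, made up for this sketch only.
EMBED = {
    "the": [0.0, 0.0],
    "money": [2.0, 0.0],
    "river": [0.0, 2.0],
    "bank": [1.0, 1.0],  # ambiguous: equal parts finance and nature
}

def contextual_bank(sentence):
    """Return the context-mixed vector for the word 'bank' in a sentence."""
    words = sentence.split()
    mixed = self_attention([EMBED[w] for w in words])
    return mixed[words.index("bank")]

finance = contextual_bank("the money bank")
nature = contextual_bank("the river bank")
print("money context:", [round(x, 2) for x in finance])
print("river context:", [round(x, 2) for x in nature])
```

The same ambiguous vector for “bank” leans toward the finance axis next to “money” and toward the nature axis next to “river,” which is the disambiguation described above in miniature.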

And so, these transformers consider large chunks of text in trying to make that decision all at once instead of sequentially. So, that’s what the transformer architecture does. And then what OpenAI has been doing is basically training this type of neural network, with the transformer architecture, on larger and larger datasets. Conceptually, think of it as you’ll have it read Wikipedia, and think of that as generation one. Generation two is, I’m going to have it read Wikipedia and all of the open-source textbooks that I can find. This generation, they trained it on what’s called Common Crawl. It’s kind of the same thing that Google uses to search and index the internet. There’s an open-source version of that. Think of it this way: robots go onto every webpage, they gather the text, and now we’re using that as the training set for GPT-3.

Sonal: Yeah. Something like half a trillion words, I believe.

Frank: Yeah. It’s a crazy number of words. And then this thing has two orders of magnitude more parameters than the previous attempts; that’s something like 175 billion parameters. For the purposes of this conversation, think of parameters as a way of measuring the complexity of a neural network.

Sonal: Right. GPT-2 had 1.5 billion.

Frank: And in between GPT-2 and 3, Microsoft did one that was 17 billion, right? So, like, there is a bit of an arms race here going on, which is, like, how big are your neural networks?
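The “two orders of magnitude” remark can be checked with a few lines of arithmetic. The parameter counts below are the round figures quoted in this conversation; the dictionary labels are just names chosen for the sketch.

```python
import math

# Parameter counts as quoted in the conversation (round figures).
PARAMS = {
    "GPT-2": 1.5e9,
    "Microsoft model": 17e9,
    "GPT-3": 175e9,
}

def orders_of_magnitude(larger, smaller):
    """How many powers of ten separate two sizes."""
    return math.log10(larger / smaller)

ratio = PARAMS["GPT-3"] / PARAMS["GPT-2"]
gap_vs_gpt2 = orders_of_magnitude(PARAMS["GPT-3"], PARAMS["GPT-2"])
gap_vs_microsoft = orders_of_magnitude(PARAMS["GPT-3"], PARAMS["Microsoft model"])

print(f"GPT-3 is about {ratio:.1f}x the size of GPT-2")
print(f"That is {gap_vs_gpt2:.2f} orders of magnitude")
print(f"Versus the 17-billion model: {gap_vs_microsoft:.2f} orders of magnitude")
```

So GPT-3 is a bit over a hundred times GPT-2’s size and roughly ten times the size of the 17-billion-parameter model.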

How GPT-3 learns

Sonal: What does it mean, because the paper’s called “Language Models are Few-Shot Learners.” And I remember this movement in one-shot learning where you can learn on very few examples, but honestly, what you just described to me sounded like almost a trillion examples, when you think about what it’s ingesting as an input. So, can you actually explain what few-shot even means in this context?

Frank: Yeah. So, first, they trained this model on the internet. Basically, what came in as input on the left side was reams and reams and reams of text (all the text they could get their hands on), and they cleaned it a little. And so, this is very traditional deep learning. It is not itself a zero-shot or a few-shot approach. It’s deep learned, which means I have incredible amounts of input text. What they mean in the context of this paper around zero-shot and few-shot is, the model can perform a variety of natural language processing tasks. So, a good example of it is analogies: king is to queen, as water is to what, right?

In the context of the system, what you can do is you could give it an example of that, and they call that one-shot — which is, I’m going to give you an example of an analogy that’s completely filled out, and then I want you to fill out more analogies. Another task would be — pick the right ending of a story, and I will give you one example with the correct answer. So, I’m just going to give it to you once. Now, typically what happens when you do traditional neural network learning — you take an example, you give it to the system, and you tell the system the right answer. The system uses that right answer to basically readjust the neural net. It’s called backpropagation. And the theory is that, as it adjusts the weights inside the neural network, it will get that answer more correct the next time it sees it.
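The traditional loop described above (show an example, reveal the right answer, adjust the weights) can be sketched with a single weight. This is a deliberately tiny stand-in: real backpropagation applies the chain rule through many layers, while here the model is just `w * x`, so the chain rule collapses to one derivative. The learning rate and the numbers are arbitrary choices for illustration.

```python
def predict(w, x):
    """A one-weight 'network': the output is just w times the input."""
    return w * x

def loss(w, x, target):
    """Squared error between the prediction and the right answer."""
    return (predict(w, x) - target) ** 2

def gradient(w, x, target):
    """Derivative of the loss with respect to w: 2 * (w*x - target) * x."""
    return 2 * (predict(w, x) - target) * x

def train_step(w, x, target, learning_rate=0.1):
    """Nudge the weight in the direction that reduces the error."""
    return w - learning_rate * gradient(w, x, target)

w = 0.0
x, target = 1.0, 2.0          # one training example and its right answer

before = loss(w, x, target)   # error before the update
w = train_step(w, x, target)  # the weight is adjusted using the right answer
after = loss(w, x, target)    # error on the same example afterwards

print(f"loss before: {before:.2f}, loss after: {after:.2f}, new weight: {w:.2f}")
```

After one update the weight has changed and the error on that example is smaller, which is the sense in which the system “will get that answer more correct the next time it sees it.”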

And so, everything up until this point has basically been — if I give you enough examples, I’m going to be able to tell whether that picture has a hot dog in it or not. I will be able to generalize the features of a hot dog, and I will basically deduce hot-dogness if you just give me enough pictures and you tell me, hot dog or not. What’s going on here is they train this model once, and then they give it one example — that example doesn’t adjust the weights of the model. It really just primes the system to basically prepare it to answer this type of question. So, you basically tell it, look — I want you to work on, fill in the blank, and I’m gonna give you one or a few examples (few-shot) of this, and then we’ll go from there. But those examples that you give it don’t adjust the weights of the model. It’s one model to rule them all. And this is kind of how humans learn. They don’t need to see 1,000, 10,000, 100,000 examples of hot dogs before they can start reliably telling whether it is a hot dog or not.
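By contrast, priming changes only the text that goes into the model. Here is a minimal sketch of how zero-shot, one-shot, and few-shot prompts might be assembled. The `Q:`/`A:` layout and the `build_prompt` helper are assumptions made for this example, not a format taken from the paper or the API, and no call to any real service is shown.

```python
def build_prompt(task_description, examples, query):
    """Assemble a text prompt from a task description, solved examples, and a new question.

    Nothing here touches a model's weights: the examples are just more text
    placed in front of the question.
    """
    lines = [task_description, ""]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)

TASK = "Add the two numbers."
QUERY = "What is 23 plus 67?"

zero_shot = build_prompt(TASK, [], QUERY)
one_shot = build_prompt(TASK, [("What is 12 plus 30?", "42")], QUERY)
few_shot = build_prompt(
    TASK,
    [
        ("What is 12 plus 30?", "42"),
        ("What is 55 plus 21?", "76"),
        ("What is 40 plus 19?", "59"),
    ],
    QUERY,
)

print(one_shot)
```

The only difference between the three prompts is how many solved examples appear before the final question; the model that would read them is the same in every case.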

Sonal: It’s like how children learn language.

Frank: Yeah, exactly. Babies, before they can say cat and dog, can recognize the difference between cats and dogs — they didn’t see a million of them, right? In fact, they can’t say the words dog and cat yet. And so, maybe something like this is going on in the brain, which is you have this sort of general processor, and then it instantly knows how to adapt itself to solve a lot of different problems, including problems it had never seen before. And so, I’m going to go back to my favorite example of, like, what GPT-3 was used for. Like, how in the world did it deduce the rules for two-digit arithmetic by reading a lot of stuff? And so, maybe this is the beginnings of a general intelligence that can rapidly adapt itself. Now, look, I don’t want to get ahead of myself. It falls apart on four-digit arithmetic. And so, it’s not generally smart yet, but the fact that it got all of the two-digit addition and subtraction problems right by reading text, like, that’s crazy to me.

Fitting GPT-3 into the AI big picture

Sonal: The general takeaway is that it does some complicated things really well, and some really easy things really badly, and this is actually true of most AI. The researchers have a huge section on limitations where, “GPT-3 samples can lose coherence over sufficiently long passages, contradict themselves, and occasionally contain non sequitur sentences or paragraphs.” Now, of course, as an editor, that made me laugh because that’s also true of human writing. <laughter> So, I was like — okay, this is also true about the writing I’ve seen and edited, so I don’t know who’s talking here. Help me tease apart where we really are in this long arc. I’m having a hard time knowing what’s real, what’s not. Like, help me kind of understand what is this thing, really, at this moment in time.

Frank: So, we have the most sophisticated natural language processing pre-trained model of its kind. The natural language processing community has basically divided the problem of understanding language into dozens and dozens of sub-tasks. And task after task after task, GPT-3 goes up against the state-of-the-art, the best performing system. And basically what the paper does is lay out, okay, here’s where GPT-3 is approaching state-of-the-art, here’s where it’s far away from state-of-the-art. And that’s basically all we know, is — compared to state-of-the-art techniques for solving that particular natural language processing task, how does it perform? We’re really in the research domain. <Right.> So, if you were to ask me, can I build a startup on it? Can I build the world’s best chatbot on it? Can I build the world’s best customer support agent on it?

Sonal: I was going to ask you that.

Frank: Yeah, I think it’s really too early to tell whether you can build any of those things. The hope is that you could, and long-term, really, the hope is, having built a model like this and exposed an API, you could take any Silicon Valley startup that wants to solve a text problem — chatbots, or pre-sale support, or post-sales customer support, or building a mental health app that talks to you. All of those things will get dramatically cheaper and faster and easier to build on top of this infrastructure.

If this works, you have this generally smart system that’s already been trained, then you show it a couple examples of problems that you want to solve, and then it will just solve them with very high accuracy. All you have to do, as a startup or a programmer, is to say, “Hey, look, I’m going to give you a couple of examples of the type of problem that I want solved.” And then that priming is going to be enough for the system to get very accurate results. And, in fact, sometimes better results than if you had built the model and fed it the data sets yourself. So, that’s the hope, but we just don’t know yet.

Use and scalability

Sonal: That’s a really good reminder because they themselves are like, this is early days, it’s research, there’s a lot of work to be done. But it’s also really exciting, as you’re saying, because this is one of the most advanced natural language models we’ve seen. So, the question I have then, on the startup and building side: what are the kinds of considerations to make it more practical and scalable? I mean, for one thing, the size. You described how the transformer has this ability to sort of comprehend so much at once without doing it in kind of this RNN model, but the trade-off of that is that it’s so slow, or may not be able to fit on a GPU. So, I’d love to have a quick take from you on, what are the things that need to happen to make something like this more usable, etc.

Frank: I think what’s going to need to happen is that the OpenAI product team is going to have conversations with dozens and dozens of startups that are using their technology. And then they successively refine the API and improve the performance, and set up the security rules and all of that, so that it becomes something as easy to use as, say, Stripe or Twilio. Stripe or Twilio are very straightforward: send a text message or process this payment. This is a lot more amorphous, which is, “Hey, I can do SAT analogies. How’s that relevant for my startup?” Well, there’s a bit of a gap there, right? You have a startup that’s like, “Hey, I need my documents summarized,” or, “I need you to go through all of the complaints we’ve ever gotten and give me product insight for product managers.” And so, there’s basically a divide there that needs to be closed over time.

Sonal: Right. So, what does this mean with the data world? Because one really interesting [thing] to me is, on one hand, APIs give you superpowers — kind of democratizing things. On the other hand, it kind of makes things a bit of a race to the bottom then, because then you have to differentiate — kind of private, proprietary, these other elements. So, do you have thoughts on what that means?

Frank: Yeah. I mean the hope for something like a GPT-3 is that it’s going to dramatically reduce the data gathering, cleansing, cleaning process — and, frankly, building the data model as well, your machine learning model. So, let me try to put it in economic terms. Let’s say we put $10 million into a Series A company, and then $5 million of it goes to getting data and cleaning it and hiring your machine learning people, and then renting a bunch of GPUs in Amazon or Google or Microsoft, wherever you do your compute. The hope is that if you could stand on the shoulders of something like GPT-3 — and it’ll be a future version of it — you would reduce those costs from $5 million to $100,000.

You’re basically making API calls and the way you program “this thing” is you just show it a bunch of examples that are relevant to the problem that you’re trying to solve. So, you show it texts where you had a suicide risk and you don’t need to show it a bunch because it’s pre-trained — and you show it a new text that it hasn’t seen before and you ask it, “What is the risk of suicide in this text exchange?” The hope is that we can dramatically reduce the costs of gathering that data and building the machine learning models. But it’s really too early to tell whether that’s going to be practical or not.

Sonal: So we know what it means for startups, but how do the incumbents respond in that kind of a world? But it seems almost inevitable that the big players — there might be an AWS potentially, right, that could, you know, make this a given in their services — like this kind of bigger question around this business model of AI as a service.

Frank: Yeah. So, the first thing I’ll say is this is OpenAI’s first commercial product, which is interesting, right? Recall that OpenAI started as a research institution, so we’ll sort of see what the pricing is. If this works, the scenario that I described earlier, which is — dramatically reduce the time it takes to build a machine learning inside product — then all of the public cloud providers and other startups will offer competing products because they don’t want to let OpenAI just take all of the, sort of, text understanding ability of the internet, right?

Google Cloud and Microsoft and Amazon and Baidu and Tencent, like they’re all gonna say, “Hey, look, I can do that too — build your application on me.” Now, I will say that because of the large costs of training the model — so I’d mentioned estimates ranging from $5 million to $10 million to train this thing once — and obviously, they didn’t train it once to get to where they were, they trained it multiple times as they did the research process. And so, this is not going to be for the faint of heart. It’s going to come on the back of a lot of money with very skilled scientists using enormous infrastructure. But to the extent that this product works, then you’re going to have very healthy competition among all of the incumbents. You might even have new players who’ll figure out a different angle on it.

Working with machine learning

Sonal: You know, it’s really fascinating watching the people who have access. And basically, the recurring theme is that it’s not like plug and play, it’s obviously not built and ready for that yet. The prompt and the sampling hyper-parameters matter a lot. Priming is an art, not a science. So, I’m curious for where you think the knowledge value is going to go in the future. What are the sort of — the data scientists of the future going to look like for people who have to work with something like this? Now, granted the models are going to evolve, the API will evolve, the product will evolve — but what are the skills that people need to have in order to really do well in this world coming ahead?

Frank: It’s really too early to tell, but it is a fundamentally different art of programming, right? So, if you think of programming to date, it’s basically — I learn Python, and I learn to be efficient with memory, and I learn to write clever algorithms that can sort things fast. That’s well-understood art, thousands of classes, millions of people know how to do that. If this approach works, basically, there is this massive pre-trained natural language model, and the programming technique is basically I show you a couple of examples of the tasks that I want you to perform — it’ll be about what examples do I show you, and in what form? And do I show you the outliers, or do I show you some normal ones, right? And so, if this approach works, it’ll all be about — how do you prime the model to get the best accuracy for the real-world problems you actually want your product to solve? Programming becomes — what examples do I show you, as opposed to how do I allocate memory and write efficient search algorithms? It’s a very different thing.
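The point that sampling hyper-parameters matter can be illustrated with temperature, one commonly used sampling knob. Whether and how the API exposes it is not stated in this conversation, so treat the following as a generic sketch: the token list and scores are invented, and the functions are illustrative helpers rather than any library’s interface.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probabilities = apply_temperature(logits, temperature)
    return rng.choices(tokens, weights=probabilities, k=1)[0]

# Invented next-word candidates and scores, for illustration only.
tokens = ["dog", "cat", "walrus"]
logits = [2.0, 1.0, 0.0]

cold = apply_temperature(logits, 0.5)  # low temperature: favors the top choice
hot = apply_temperature(logits, 2.0)   # high temperature: spreads probability out

print("temperature 0.5:", [round(p, 3) for p in cold])
print("temperature 2.0:", [round(p, 3) for p in hot])
print("one sample at temperature 1.0:", sample(tokens, logits, 1.0, random.Random(0)))
```

With the same scores, a low temperature makes the output more predictable and a high temperature makes unusual words more likely, which is one reason the same prompt can give very different results under different settings.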

Sonal: Vitalik Buterin, the inventor of Ethereum, put it this way when he was observing some of this buzz around GPT-3: “I can easily see many jobs in the next 10 to 20 years changing their workflow to ‘human describes; AI builds; human debugs.’” There’s a lot of speculation about how this might affect jobs: that it could displace customer support, sales support, data scientists, and legal assistants, and that other jobs like that are at risk. Do you have thoughts on the labor and jobs side of this, like just sort of the broader questions and concerns here?

Frank: The way I think about this generally — and informed a lot by Erik Brynjolfsson and other people — so if you think about a job as a set of tasks, some tasks will get automated, and then some tasks will be stubbornly hard to automate, and then there’ll be new tasks. And so, think of jobs as sort of an ever-changing bundle of tasks, some of which are performed by humans today, some of which will get automated, and then there are new tasks. And so what Vitalik describes — if this AI stuff works, being able to prime the AI system with the right examples, and then being able to debug it at the end — those are two new tasks. No human on the planet gets paid to do that outside of AI researchers today. But that could be mainstream knowledge work in 10 years, which is — you pick good examples, and then you debug it at the end. So, you have these brand new tasks that are generating economic value and people get paid for them, that didn’t exist before.

Sonal: I find it very fascinating what you said, by the way, because what it also means to me is it becomes more inclusive for more people to enter worlds that might have been previously restricted to a certain class or type of programmers, or people who have certain technical skills. Let’s say you’re very good at describing things, and it’s more of an art than a science, and you’re very good at sort of fiddling with and hacking at things: you might be better at tuning something than someone who went through years and years of elite Ph.D. education.

Frank: I think the machine learning algorithms will invite more people, who would otherwise be discouraged, into pursuing careers they wouldn’t have naturally risen to the top of. So I think you’re right.

Ethical concerns and safeguards

Sonal: What do you make of the concern — there was concern that GPT-3, these answers that it gave, that it predicted, were rife with racism or stereotypes. What do you make of the data issues around that?

Frank: Okay. We’re going to feed it every piece of text on the internet and then we’re going to ask it to make generalizations. What could possibly go wrong? A lot could possibly go wrong. If you look at the heart of this system, it’s basically, I’m trying to guess the next word. And the way I make my guess is, I go look at all the documents that have been written ever and I ask, what words are most likely to have occurred in those documents, right? You’re going to end up with culturally offensive stereotypes. And so, we need to figure out — how do we put the safety rails? How do we erect the APIs? I’m glad the OpenAI researchers and the community around them are being very careful about this because we obviously have to. How do we basically teach it the social norms we want it to emit, as opposed to the ones that it found by reading text?
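The “guess the next word from what was written before” idea can be shown with a simple counting model. This is a much cruder stand-in than GPT-3, which is a neural network rather than a table of counts, and the three-sentence corpus is invented for the example. It does show how a skew in the text becomes a skew in the prediction.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for every word, which words followed it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the word that most often followed `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Invented corpus with a deliberate imbalance after the word "said".
corpus = [
    "the nurse said she was tired",
    "the nurse said she would help",
    "the nurse said he was late",
]

counts = train_bigrams(corpus)
print(predict_next(counts, "said"))     # the majority pattern wins
print(dict(counts["said"]))             # the counts behind that guess
```

The model has no opinion of its own: it returns “she” after “said” only because that pattern was more frequent in what it read, which is how stereotypes in the source text can resurface in the output.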

Sonal: Another whole philosophical sidebar, but a really important one: if you think about the internet as the sum total of human knowledge, then it also reflects many of the realities in the world, which are atrocious and awful in many cases. The flip side of it is, it’s a lot harder to change the real world and people and behavior and society and systems, but probably a hell of a lot easier to change a technical system and be able to do certain things. So, to me, what’s implicit in what you said is that there’s actually a solution (I don’t mean to be solutionistic) within the technology that you don’t necessarily get from IRL, in real life.

Frank: Yeah, that’s exactly right. And if it were in algorithm land, so to speak, where we are, right, GPT-3 and its descendants — let’s say GPT-17 gave you a text document, right? It wrote a text document for you. You could take that document and put it through whatever filter that you wanted, right, to filter out sexism or racism, and that layer could be inspectable and tuneable to everybody. You didn’t know how GPT-17 came up with its recommendations, but you have this safety net at the end, which is you can filter out things that you don’t want. So, you have the second step that you can actually put into your system. You don’t have to depend just on the first thing, you can catch that at a subsequent stage.
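The “second step” described above can be sketched as a separate, inspectable filter applied to whatever text the model produced. Detecting sexism or racism is far harder than matching keywords, so this shows only the shape of the pipeline: the blocked terms are placeholders, and `make_filter` is a hypothetical helper written for this example.

```python
import re

def make_filter(blocked_terms, replacement="[filtered]"):
    """Build a filter function from a visible, editable list of terms."""
    if not blocked_terms:
        raise ValueError("blocked_terms must not be empty")
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in blocked_terms) + r")\b",
        re.IGNORECASE,
    )

    def apply(text):
        """Return the cleaned text and the number of replacements made."""
        cleaned, hits = pattern.subn(replacement, text)
        return cleaned, hits

    return apply

# Placeholder terms; a real list would be chosen and reviewed by people.
BLOCKED = ["badword", "slur"]
output_filter = make_filter(BLOCKED)

generated = "This draft contains a Badword and more."  # stands in for model output
cleaned, hits = output_filter(generated)
print(cleaned)
print("replacements:", hits)
```

The generator stays a black box, but the list and the rule in this stage are plain to read and change, which is the safety-net property being described.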

Updating the Turing test

Sonal: Right. And you can have sort of a system of checks and balances. So a broad meta question: one of my favorite posts was from Kevin Lacker, and he basically gave GPT-3 a Turing test, and he tested it on these questions of common sense, obscure trivia, logic. And one of the things he observed is that, “We need to ask it questions that no normal human would ever talk about.” And so, he said, if you’re ever a judge in a Turing test, make sure you ask some nonsense questions and see if the interviewee responds the way a human would. Because the system doesn’t know how to say “I don’t know,” and this gets at the question of what does a Turing test tell us? And there’s been a lot of work, as you know, over the years about the modernization of the Turing test. Like in 2016, our friend Gary Marcus, Francesca Rossi, and Manuela Veloso published an article “Beyond the Turing Test” in “AI Magazine.”

Barbara Grosz of Harvard wrote a piece called “What Question Would Turing Pose Today?” in “AI Magazine” in 2012. And she basically starts by saying that in 1950, when Turing proposed to replace the question “can machines think?” with the question “are there computers which would do well in the imitation game?”, computer science wasn’t a field of study. You know, Claude Shannon’s theory of information was just getting started. Psychology was only just starting to go beyond behavior. And so, what would Turing ask today? He’d probably propose a very different test. And so, the question I really wanted to ask you is, how do we know if the thing is measuring what it’s supposed to measure, or answering what it’s supposed to answer, or that it’s getting smarter, I guess?

Frank: This is more a philosophical question than an engineering question. So, why don’t I say what we know, and then I’ll wildly speculate on the other stuff?

Sonal: That’s great. That’s life and science, so go for it.

Frank: Exactly. So, basically, if you read the paper, you’ll see that it compares GPT-3’s performance against various other state-of-the-art techniques on a wide variety of natural language processing tasks. So, for instance, if you’re asking it to translate from English to French, there’s this thing called the BLEU score. The higher the BLEU score, the better your translation. And so, every test has its measure. And so, what we do know is we can compare GPT-3 performance versus other algorithms, other systems. What we don’t know is, how much does it really understand? So, what do we really take away from the fact that it aced two-digit arithmetic? Like, what does that mean? What does it understand of the world? Does it get math? Let’s say you had a system that was 100% accurate on every two-digit arithmetic problem that you ever gave it. It’s perfect at math, but it doesn’t get it. Like, it doesn’t know that these numbers represent things in the real world, but what does that mean to claim that it doesn’t get it? That’s a philosophical question.
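To make "the higher the BLEU score, the better your translation" concrete, here is a simplified sketch of the idea behind BLEU: count how many word sequences in the candidate translation also appear in a reference translation, then penalize candidates that are too short. It is a teaching version under stated assumptions, not the official metric. It uses one reference and only 1-grams and 2-grams, whereas standard BLEU typically goes up to 4-grams over a whole corpus. The function names and example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clipping: an n-gram earns credit at most as often as the reference has it.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-sentence BLEU: geometric mean of precisions times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_average = sum(math.log(p) for p in precisions) / max_n
    # Candidates no longer than the reference are penalized for brevity.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_average)


reference = "le chat est sur le tapis"
print(simple_bleu("le chat est sur le tapis", reference))  # exact match scores highest
print(simple_bleu("le chat sur tapis", reference))         # partial match scores lower
print(simple_bleu("bonjour tout le monde", reference))     # no shared word pairs
```

Note that a score like this only measures overlap with a reference answer. It says nothing about whether the system understands French, which is the gap Frank goes on to describe.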

Sonal: Right. It’s philosophical because the question then becomes — does it even matter if it comes to applying things practically? Because I think about this from the world of education, you know, there’s a big focus on metacognition and the awareness of knowing what you know and don’t know. But at a certain point, if the kid is doing well on the test and the test is applicable to the world, and they can basically survive and do well, does it even matter if they really understood what arithmetic really means, as long as they can solve the problem when you go to the store, that I give you a dollar, I get 5 cents change back? You know what I mean?

Frank: That’s exactly right. And if you generalize that out to other tasks that humans solve in the real world, imagine you just got good at 100 and then 1,000 and then 10,000 of these tasks that you have never seen before. Let’s say descendants of GPT-3 got that good at a wide variety of language tasks — what does it mean to insist, but it doesn’t get the world, it doesn’t get language, right? <laughter>

Predictions about future implications

Sonal: Yeah. That’s fantastic. I’d love to get sort of your perspective on how we think about this broader arc of innovation that’s playing out here. Daniel Gross called GPT-3 screenshots the TikTok videos of nerds. And there’s something to that — it’s kind of created this inherent virality. So, I’m curious for your take on that. On the one hand, some of the most important technologies start out looking like a toy. Chris Dixon paraphrased a really important idea from Clayton Christensen about how disruptive innovation happens. But a lot of the people who are researchers really emphasize — this is not a toy, this is a big deal.

Frank: There are a lot of TikTok-ish-like videos that are coming out of the whole playground, which is basically a place where you can try out the model. And on the one hand, people are saying it’s a toy because they’re in the sandbox and they’re basically having fun feeding it prompts. Some of those examples are actually really good, and some of those are, like, comically bad, right? So, it feels toy-like. The tantalizing prospect for this thing is that we have the beginnings of an approach to general intelligence, something we haven’t seen this much progress on before. Today, if you wanted to build a specific system for a specific natural language processing task, you could do that: custom architecture, lots of training data, lots of hand-tuning, and lots of, like, Ph.D. time.

The tantalizing thing about GPT-3 is, it didn’t have an end-use case in mind that it was going to be optimal for, but it turns out to be really good at a lot of them, which kind of is how people are. You’re not tuned to, like, learn polka or double-entry bookkeeping, or learn how to audio-edit a podcast — like, you didn’t come out of the womb with that, but your brain is this general-purpose computer that can figure out how to get very, very good at that with enough practice and enough intentionality.

Sonal: Well, it’s really great that you use the word tantalizing because if you remember the Greek myth root behind it, Tantalus was destined to constantly get this like tempting fruit dangling above him as punishment. And it was so close yet so out of reach at the same time. So, bottom line it for me, Frank.

Frank: It’s tantalizing, right? Now, look, there’s a limit to how big these models can get and how effective the APIs will be once we sort of, you know, unleash them to regular programmers. But it is surprising that it is so good across a broad range of tasks, including ones that the original designers didn’t contemplate. So, maybe this is the path to artificial general intelligence? Now, look, it’s way too early to tell. So, I’m not saying that it is, I’m just saying it’s very robust across a lot of very different tasks, and that’s surprising, and kind of exciting.

Sonal: Thank you so much for joining this episode of “16 Minutes,” Frank.

Frank: Awesome. Thank you, Sonal, for having me.