There are many reasons why we’re in an “A.I. spring” after multiple “A.I. winters” — but how then do we tease apart what’s real vs. what’s hype when it comes to the (legitimate!) excitement about artificial intelligence and machine learning? Especially when it comes to the latest results of computers beating games, which not only captures our imaginations but has always played a critical role in advancing machine intelligence (whether it’s AI winning Texas Hold’em poker or beating the world human champ in the ancient Chinese game of Go).
But on learning that Google DeepMind’s AlphaGo can master the game of Go without human knowledge — or more precisely: “based solely on reinforcement learning, without human data, guidance, or domain knowledge beyond game rules” — some people leap too far towards claims of artificial generalized intelligence. So where can we then generalize the findings of such work — unsupervised learning, self-play, etc. — to other specific domains? What does it mean for entrepreneurs building companies (and what investors look for)? And what does it mean for how we, as humans, learn… or rather, how computers can also learn from how we learn?
Deal and research operating team head Frank Chen and a16z board partner Steven Sinofsky ponder all this and more, in conversation with Sonal Chokshi, in this episode of the a16z Podcast. We ended last time with the triumph of data over algorithms and begin this time with the triumph of algorithms over data … is this the end of big data?
Discussion of how AlphaGo’s algorithm learned Go [0:45], and the potential limits of machine learning [9:41]
Questions of humans interacting with AI, and our expectation of accuracy [18:46]
Potential bias in algorithms [29:46] and final thoughts about the future of AI/ML [34:21]
Sonal: Hi, everyone, welcome to the “a16z Podcast.” I’m Sonal. And I’m here today, bringing back the band together again — the AlphaGo band, I guess — I don’t know how else to describe us. But we have the head of our Deal and Research Team Operation, Frank Chen. We have Steven Sinofsky, a16z board partner.
Just to give people some quick context, you don’t have to have heard our previous podcast on AlphaGo. But when AlphaGo, the algorithm produced by DeepMind, beat the world champ in Korea in playing Go, which is an ancient Chinese game, we had a podcast where we discussed a lot of the themes and some of the broad things around that. And what we’d like to talk about today is their latest paper — but not just only specifically, but more broadly — what this means for where we actually are. What’s hype, what’s real in AI or artificial intelligence. So, welcome, guys.
Steven: Welcome. So, like one thing that, like — you read this paper, and the paper is published in “Nature.” It’s pretty dense. It has 17 authors on it. And so, it’s quite the force. But the thing that sort of jumps out is the paper, the blogs, everybody — is, the first thing you read is, “one step closer to creating general purpose AI.” And immediately, like, my AI winter, fears of hype antennae pop up, because, like, everything is one step closer. But like, you could take very, very, very tiny little steps, or you could overhype them.
Frank: And we’ve heard these promises throughout history, and especially around board games. So, we solved checkers, and people were like, “Oh, one step away from general intelligence.” Then we solved chess, same thing, “we’re one step away from general intelligence.” Then we did the, “Oh, I can find bacteria that causes infection; therefore, we must be one step away.”
And I think the fallacy is that because we’re doing these things that are considered, sort of, high cognition — like the smart people play chess, smart people figure out…
Sonal: Yeah, strategy.
Frank: …then solving one thing that a smart person does must lead us to the next thing that a smart person does, which will lead us to the next. And that’s always been the fallacy, which is, it actually hasn’t quite generalized.
Steven: Well, in fact, they don’t even really compound all that much. They’re all fairly discrete. The one thing that’s different now is that they are all building on these new artificial intelligence or machine learning techniques, and taking a whole bunch of data, training a model on it, and developing solutions that beat all their algorithmic ones. And that was a big thing about the first AlphaGo was, like, “This is data over algorithms.” And all of a sudden, here we are…
Sonal: Algorithms over data.
Steven: Or, a model over data. But you know, Frank, you have an interesting way of looking at this, because, you know, you play Go, and you understand that…
Sonal: Oh, you play Go? I didn’t know that.
Frank: Yeah. Well, enough, yeah.
Steven: More than me.
Sonal: I only play chess. I don’t even know…
Frank: Yes, Chinese chess and Go. All right, I play a few games.
Steven: But, like, it’s not like a generalized problem. You know, it comes with, like, a whole bunch of constraints and things that make it solvable by algorithm.
Frank: Yeah. So, what are some traits about Go that are not like the real world? All the rules are completely well known. The state of play is completely well known, right? In the real world, mostly, we live in a “fog of war” situation, where we know some things, and we don’t know other things. It would cost us something to go figure it out. In Go, and in particular the representations they chose, the entire board state is known to both players, right? So, there’s a lot of things that aren’t like the real world at all when we do these board games. Now, having said that, why are people getting breathless, yet again, around one step closer that AI…
Sonal: I think, because the authors don’t claim generalized intelligence. They’re talking about being able to do — apply some of these techniques in other domains.
Frank: Yeah, so to give credit where credit’s due. Let’s, sort of, review what they actually achieved. And they are, like, very impressive achievements. So, what they achieved was, they now have a Go player that beat all of their other Go players.
Sonal: AlphaGo Zero.
Frank: Well, they named them by the people they beat. So, they had AlphaGo Fan, AlphaGo Lee. This one’s AlphaGo Zero. So AlphaGo Zero, their latest one, has beat every iteration, which has beat all of the best human players, right? So, they have the best Go-playing algorithms in the world. And the difference in how they got to this one is, there was no training data from human games. So, all the other ones had been bootstrapped by, “Let me go watch what the human players do and see if I can mimic that.” And that, very broadly speaking, is the approach to machine learning called supervised learning.
Steven: And not a trivial amount. I mean, they had 100,000 games that they started from to train it on, like, 48 of Google’s TPUs for days on end.
Frank: That’s exactly right. So, a fleet of machines, a ton of data. This one started from nothing, which is, it didn’t even really know the rules. It had a loss function, which is reinforcement learning’s way of sort of improving the algorithms over time. So, let me just give a quick intuitive explanation of reinforcement learning. So, it’s exactly like the game that you used to play when you were a kid and called hotter or colder. Somebody would go hide something, and then you would try to get to discover it.
Sonal: Are you getting hotter? I’m getting hotter, hotter, cold, cold again.
Frank: Exactly. And then, the hotter or colder is the loss function. That is the thing that tells you if you’re getting closer or not.
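A toy sketch of the hotter-or-colder intuition, in Python. The names `hotter_or_colder` and `seek`, and the one-dimensional hiding place, are assumptions made for illustration; this is not how AlphaGo Zero is trained. The seeker never reads the hidden position directly and only reacts to the feedback signal.

```python
def hotter_or_colder(candidate, current, target):
    """Feedback signal: +1 if the candidate spot is closer to the target, else -1."""
    return 1 if abs(target - candidate) < abs(target - current) else -1


def seek(target, start=0, max_steps=10_000):
    """Find a hidden integer position using only hotter/colder feedback."""
    position, direction, misses = start, 1, 0
    for _ in range(max_steps):
        candidate = position + direction
        if hotter_or_colder(candidate, position, target) > 0:
            # Hotter: keep the step and keep going the same way.
            position, misses = candidate, 0
        else:
            # Colder: turn around. Colder in both directions means we are there.
            direction, misses = -direction, misses + 1
            if misses == 2:
                break
    return position


print(seek(7))            # walks right from 0 until both neighbours feel colder
print(seek(-5, start=3))  # first step is colder, so it turns around
```

The point of the sketch is only that a yes/no style signal, repeated over many trials, is enough to steer a search.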
Sonal: It’s basically trial and error, simply put.
Frank: That’s exactly right. And so, that’s the fundamental approach in AlphaGo Zero, which is, they have this loss function that describes, are you more likely to win the game having made this move or not, right? So, that’s all they had. And then they didn’t have human input. They didn’t have human games. They basically said, “I have this loss function that tells you whether you’re more likely to win this game or not.” And then it played itself.
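To make "it played itself" concrete, here is a minimal self-play sketch in Python on a much smaller game than Go: a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. Everything here (the game, the table `q`, the function names, the parameter values) is an assumption chosen for illustration. It uses a simple lookup table, whereas AlphaGo Zero combined a neural network with tree search, so treat this as an analogy and not a description of the paper's method. The learner gets only the rules and a win/loss signal.

```python
import random

MAX_TAKE = 3  # rule of this toy game: remove 1 to 3 stones per turn


def legal_moves(pile):
    """The codified rules: the only human input the learner receives."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def train_by_self_play(max_pile=10, episodes=20_000, epsilon=0.3, seed=0):
    """Learn move values from win/loss feedback alone, playing both sides."""
    rng = random.Random(seed)
    # q[pile][move]: estimated outcome for the player to move (+1 win, -1 loss).
    q = {p: {m: 0.0 for m in legal_moves(p)} for p in range(1, max_pile + 1)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)            # explore: try something
            else:
                move = max(moves, key=q[pile].get)  # exploit: best known move
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next; their best outcome is our worst.
                target = -max(q[remaining].values())
            q[pile][move] = target  # the game is deterministic, so overwrite
            pile = remaining
    return q


def best_move(q, pile):
    """Greedy move under the learned table."""
    return max(q[pile], key=q[pile].get)


q = train_by_self_play()
print({pile: best_move(q, pile) for pile in (5, 6, 7, 9, 10)})
```

With no example games, the table ends up preferring moves that leave the opponent a multiple of four stones, which is the known winning strategy for this toy game.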
Steven: One other important thing, I think, it has — it also knows — it has the codified rules of the game.
Sonal: That is the one human input it actually got.
Steven: Well, right. And it makes it a very, very constrained problem. Because there’s a whole bunch of the decision trees — like, hot or cold, you could climb up the sofa, you could leave the house, you could go all over the place for days on end. Whereas, this just tells, “No, it’s gonna only be on the floor.”
Sonal: These are the rules of the game.
Steven: Like, this is not gonna be under the sofa.
Frank: It’s a 19 by 19…
Sonal: Very little of real life is actually, like, you get the real Shogi that way.
Frank: Yeah, it’s a 19 by 19 board, it’s black and white pieces, there are rules.
Steven: Oh, yeah, you switch turns, like all of that stuff matters a lot.
Sonal: It does. And one more thing to add, by the way, too, because you mentioned the 48 TPUs — it’s really significant that they got it down to 4 in this case. That’s a huge, like — on the power side energy, like — you know, it’s a simpler architecture.
Frank: It’s a simple architecture, and thinking from the point of view of a software developer, now you’re back to one machine. You’re not like, “Oh, my God, I need to go rent this massive cloud with massive storage and massive interconnects. And, like, I need to figure out how to provision the cluster and manage the cluster.” You’re back down to one machine, right?
Steven: So, this stuff was pretty impressive. It’s, you know, they did four TPUs. It was three days of playing.
Sonal: Three days total.
Steven: Like, it’s very achievable, you know, like, on your Amazon credits.
Frank: That’s exactly right. And if you think about sort of the approach that most startups take to artificial intelligence today, they basically take the supervised learning approach. And step one, raise money so that you can go get a data set that’s annotated, train your neural network, make recommendations, right? And you could be $1, or $5 or $10, or $50 million in getting that data set, depending on how complicated the data set is. All right, so the reason people are so excited about this is, look, this had no data, aside from rules of the game. It basically played itself. And by day three, it was better than everything that had been trained before it.
Sonal: Nearly an order of magnitude.
Frank: Yeah, it was, orders of magnitude on this…
Sonal: And the results would be a hundred to zero.
Frank: …48 TPUs versus 4. It was 3 days versus 40. It was 30 million training games versus 4.9 million. So, order of magnitude improvement on all of those dimensions, so, like, let’s give credit where credit is due: this is a very impressive technical achievement. And then, the question that we sort of entered the session with, “Okay, does this make us more likely to be able to create an artificial general intelligence, where the learning algorithm is generalized across domains?” In other words, can I take the breakthroughs here, and make a better pick and pack robot for Amazon? Or make a better healthcare predictor to discover whether you have cancer?
Steven: Or Salesforce forecasting, code generation, or a whole bunch of stuff. And the interesting thing is that — also the techniques here — this is just proof of something that people have been talking about for a long time.
Sonal: Oh, yeah, reinforcement learning has been around for ages.
Steven: And so, part of what’s interesting to me is that reinforcement learning has been around for a long time. Obviously, the modeling has been around for a long time…
Sonal: Self-play.
Steven: …self-play, all of these things. And, like, so many times these steps are, like, somebody new to the domain looking and pulling together a bunch of unrelated things, and just coming up with a very elegant, incredibly elegant solution. And it’s super impressive, but it’s not clear it generalizes. And I think that was one of the things that jumped out for me in reading about it, you know. When you hear, “Oh, and then the next step, on this train of generalized intelligence is drug discovery, and protein folding, and quantum chemistry, and material…” And it’s like, all of a sudden, I’m trying to figure out — protein folding. Like, what are the constraints on protein folding? Well, we know they’re amino acids, and we know that they have to be in three dimensions. But actually, nobody else knows…
Sonal: There’s no codified rules.
Steven: Like, nobody has the rules of protein folding.
Frank: So, there’s no rules, there’s no perfect understanding of the search space. There’s like — what’s the loss function? Like, how would you even write a loss function?
Sonal: I do want to push back a little bit, though, because far be it from any of us in here to hype this up. But there’s something unique happening here, which at least — I perceive this in the paper. Which is, that I was struck by the analogy to evolution. Like, this is how human beings have evolved. This is evolution — that we learn by trial and error on a massive million scale. So, I don’t want to completely dismiss the idea that we can get to some kind of generalized intelligence. I mean, of course, I understand and agree, but what are the limits? And what are the possibilities that can actually take us there? And where are we constrained? Just to break that down a bit more.
Frank: Yeah. So, I love the evolution analogy, right? Because in the paper, they talked about how, you know, when it started out, it was making, sort of, naïve moves. It was, sort of, greedy and trying to capture all the tokens. And then, it got to very sophisticated patterns that humans have discovered over thousands of years playing this game, teaching this game, codifying it in books. And it figured that out. And not only did it figure it out, it figured out things that humans haven’t quite codified, right?
Sonal: Right. New ways of playing.
Frank: If you play more games, sort of, it developed an intelligence that humans haven’t yet, because it’s going through its own reinforcement thing. And thousands of generations of games playing each other, sort of, arrive at places that maybe humans would have gotten to if we played another thousand years. But like, you know, it figured out in three days. So, I think there is something incredibly profound going on here, where you’re, basically — you’re accelerating natural selection cycles in computers.
Now, where I think the analogy breaks down is, “Oh, and therefore we can apply this approach to every other problem. And it’s just going to be a straightforward application of the set of ideas to those problem sets. And then, we’re going to have evolution at that scale.” In other words, eating different problems, right, in exactly the same way. And I think that’s where Steve and I have a little skepticism.
Steven: Yeah, well, and I think also like the key with all of these, if you just take an abstract view of it, you know, you have this awesomely elegant solution to an intractable problem, which is just, on any measure, super cool. But that doesn’t make it generalizable to any other space. There are many, many super cool things that surface. And you have to be careful as, like, an engineer, or a founder, or a person applying this to [go], “Okay, well, what are the elements of this solution that one would need to have as a precursor to applying it.”
And we saw the same thing with all of the work on supervised learning. Like, first you need a data set that is clean, and then it has to be, like, labeled really well. And then, you have to have a neural network model. And you have to do all the weights and all this modeling. And so, there are all these things where you couldn’t just say, “Hey, I’d like to make the most money in our Q4 sales. Let’s machine learn our way there.” And you, like, you can’t just show up on Thursday and do that. Every year, salespeople — they develop a model for how they want to sell, how the customers — what the prospects are. They know the rules — like, there are all these rules. Like, there are this many salespeople, they can only call so many people per day. They know which customers to call. They know what the quotas are going to look like. They know what the product — and you start to think, well, maybe there is something here, because it’s a space where, like, actually, the history might not be as well as applicable to supervised learning as you would like. And so, if there was a way to look at this through the lens of, like, what would be the optimal system to navigate? Or what would be the sales matter? And I think that there are more things like that than fewer. I don’t know how many, you know, properties of nature are amenable to this. Because, for the most part, we don’t understand them.
Sonal: Right. It’s limited by what the human knows, right? And we can’t codify those rules. What are the domains that you see some transferability of these types of things?
Frank: Well, I love the idea of, sort of, sales forecasting, right? Because, essentially, what you’re doing — the intuition is that if I could play act my salespeople doing a million different things, in a million different situations, against a million different set of prospects, you know, I could sort of simulate that, then maybe best practices would just come out of that, right? Because, essentially, that’s what I’m doing in real life. I raise a ton of money, I hire a sales leader, that sales leader hires a bunch of people, gives them a playbook, and has them call a bunch of prospects. I can only run so many experiments, right? Because every call is an experiment. So, the idea would be, if I could simulate what happens inside these calls, and simulate 10, 100, 1000, a million times more, then I’d get much better best practices emerging from that. So, super intriguing idea.
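A rough Python sketch of the "simulate a million calls" idea. The playbook names and close rates in `ASSUMED_CLOSE_RATE` are invented placeholders, not data; in practice those numbers are exactly what nobody knows, which is the hard part of the idea. The sketch only shows that, given some simulator of how prospects respond, many cheap simulated calls make the better playbook stand out.

```python
import random

# Hypothetical close rates per playbook. These are made-up stand-ins for a
# simulator of prospect behaviour; real values would be unknown.
ASSUMED_CLOSE_RATE = {
    "demo_first": 0.05,
    "pricing_first": 0.08,
    "referral_intro": 0.12,
}


def simulate_call(playbook, rng):
    """One simulated call: True if the simulated prospect buys."""
    return rng.random() < ASSUMED_CLOSE_RATE[playbook]


def compare_playbooks(calls_per_playbook=100_000, seed=0):
    """Run many simulated calls per playbook and report observed close rates."""
    rng = random.Random(seed)
    observed = {}
    for playbook in ASSUMED_CLOSE_RATE:
        wins = sum(simulate_call(playbook, rng) for _ in range(calls_per_playbook))
        observed[playbook] = wins / calls_per_playbook
    return observed


observed = compare_playbooks()
print(max(observed, key=observed.get))  # playbook with the best observed rate
```

A real sales team can only run a handful of such experiments per day; the simulation runs hundreds of thousands, but it is only as good as the assumed model of the prospect.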
Steven: Off the top of my head, you start wanting to think about, like, “Okay, what about, you know, cybersecurity.” And you think, you’re looking at your code, and you know at any given moment, like, the type of code that can be in a specific place. And you know, the rules and the syntax, and it’s well understood, and you know patterns that are also bad. And so, today, what you do — and the people that have tried to apply machine learning to code — they just have a lot of examples of, like, missing equal signs, or missing semicolons, or operators are wrong. But if you think about it, like, the syntax of the language is completely fixed. And so, you’re back at very much a…
Sonal: A rule based…
Steven: A rule based kind of — you know, rule base is a constraint of, like…
Sonal: Constraint, right.
Steven: These are the only ways you can put the…
Sonal: Definable constraint.
Steven: …symbols together. What you don’t have is right or wrong, win or lose with code.
Frank: Yeah, you don’t have a loss function that’s that objective.
Steven: They’re interesting things to me that, like, there’s just complexity of line that is a very common measure. Like, “Boy, something with three sets of parentheses in it, it’s likely to have a bug in it just because it has three sets of parentheses in it.” You know, something that’s missing, you know, some enclosure kind of rules, like using brackets, not using brackets. These kinds of things, you can actually flag, like — you can think of lint. Just flags them in C and C++ as just, sort of, high-risk behaviors.
So, there are things that you can put in to, like, sort of — and so then you start to think, “Wow, it’d be really interesting to have a tool that is able to look at code, and, sort of — just very different than previously, which was just literally looking at the syntax — but doing millions of examples of generating code and finding bad examples.” Would it be able to do a better job at finding bad examples in the next piece of code that falls into it?
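The kind of rule-based flag Steven describes can be sketched in a few lines of Python. This is a deliberately naive illustration, not how lint or any real analyzer works: the function name `risky_lines` and the threshold `max_paren_pairs` are assumptions, and the check counts parentheses inside strings and comments too.

```python
def risky_lines(source, max_paren_pairs=2):
    """Return (line_number, reason) for lines that trip either heuristic."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        opens, closes = line.count("("), line.count(")")
        if opens != closes:
            # A missing enclosure on a single line is worth a second look.
            findings.append((number, "unbalanced parentheses"))
        elif opens > max_paren_pairs:
            # Three or more pairs on one line: complexity as a bug signal.
            findings.append((number, "many parenthesis pairs"))
    return findings


sample = "x = a + b\ny = f(g(h(a)), (b))\nz = (a + b\n"
for number, reason in risky_lines(sample):
    print(number, reason)
```

Fixed rules like these say where code looks risky, not whether it is wrong, which is the missing win-or-lose signal the conversation points out.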
Sonal: Right. I mean, one of the points you guys made last time, in our last podcast is that, at the end of the day, these things aren’t working in isolation. It’s not like there’s one magic approach. You know, there’s always a combination of techniques that come together to actually build real products. How would this, sort of, fit into that? Because one of the thoughts that I had is that — clearly, this kind of approach, even if you don’t have clearly defined rules, will always be more beneficial in places where we don’t have any data. Like, any big data. Just like humans, like kids, learning from N equals 2, like their parents. Or N equals 1, if they’re a single parent.
Frank: I think if we think about what humans do, they have many different types of intelligences. They have many different types of strategies for solving problems. So, my guess is that the artificial intelligences that we create will be similar. Lots of different strategies. And one of the interesting research items is, which strategy should I employ to solve this problem? Because this is something the brain does effortlessly. The strategies it employs to understand conversation are different than planning a trip, than, you know, making sure you don’t fall down, versus, like, long-range planning. Like, how do I choose the best career, right? So, all of these are very different problems. And somehow your brain kind of picks a good strategy for each one. Pedro Domingos talks about this as, sort of, you know, in his book, “The Master Algorithm,” which is — we know that there’s all of these different techniques, but kind of what we’re missing is sort of the synthesizer. The thing that we’ll know…
Sonal: Yes, putting it all together.
Frank: …what strategy should I pick to solve this problem? And it’s something that the brain seems to just do.
Steven: What’s interesting about that observation is, that’s where we are with machine learning today to begin with.
Sonal: Which is?
Steven: Which is just, like, there are all of these different networks that you can model, with so many different layers and how many parameters you want to use. And there’s, like, this art to it right now. And I find that particularly interesting, because anytime there’s an art to something, there’s an opportunity to start a company around that art and to build out a product that surpasses, you know, the best practices in whatever field you’re going after, whether it’s analyzing traffic or figuring out how to drive a car. And so, that is the opportunity, because I don’t think, you know — there’s not, like, some path where in the next two years, there’s the meta-algorithm that knows what way to pick, like, that’s — so, the best thing to do now is to become well versed in all of them.
And my gut just tells me that anytime you’re at a point like this, the most interesting solutions are actually going to end up being a hybrid of, like, the thing that used to work, that everybody said doesn’t work well enough, plus the new thing that everybody says is gonna replace the old thing. And that’s basically been the entire history of AI.
Sonal: Computing in general.
Steven: Well, in computing, for sure, but AI specifically, which is, like, everybody always says, “The new thing, that’s it, we finally got it. Huge step, the old stuff is all done.” And the best example for me of that is how everybody said machine learning was going to replace all of natural language processing. But if you dig into any of the work that’s been going on, even the most state-of-the-art translation, which, you know, goes any language pair to any language pair, well, the input and the output all rely on the old school, like, from the 1970s, natural language stuff, just to do some very basic bookkeeping, very basic stuff.
Sonal: Right. You talk about, like, building a spell checker, whatever it is.
Steven: I mean, all the…
Frank: Parts of speech finder.
Steven: Right, and all the image stuff. Like, wow, you know, if you want to figure out features, it turns out, when you start doing features of image recognition, you’re using a whole bunch of old school edge detection and contrast and finding objects and all the stuff. And so, you can’t just, like, show up and say, “Now, we’re going to understand — we’re going to be the unsupervised learning company.” Because the question I was gonna ask is, well, how are you going to make whatever you’re doing practical?
Sonal: What I do remember about hearing the stories of the early days of NLP, and observing parts of this firsthand as well, is how the entire field and community — a lot of them had very strong opinions about, you know, there’s this whole phase of like, expert domain knowledge building. And really, that’s the only way to actually make NLP work at scale. There were all these things they had to do because it was before the days of big data. They couldn’t even conceive of the Google scale big data. And then, they went to this world where, “Oh, my God, we don’t even have to have these kinds of constraints and ways of doing things, because we have all this data.” And now, it’s sexy to me that you can flip that model again and almost say, “You don’t even need big data with the results like this kind of paper because you don’t have to have anything.”
Frank: This is the revenge of the algorithms movement, which is, it’s always been about the data. If we had more data, we’d have more accurate models. And what this experiment showed is, “Look, all I needed was the rules and a really good loss function, some very clever programming, and I get better performance than I had when I had the data set.” Right? So, that’s the tantalizing, “Oh, look, you could do it without data.” And like, look, as an investor, that’d be awesome. If I could fund AI experiments without this step of collect data, annotate data, train model, we could run a ton more experiments.
Sonal: Exactly. Because it’s age of abundance. You brought up Pedro Domingos, and it was funny because one of the comments he made on the heels of this announcement, which I thought was quite interesting, is, like, “Well, you know, AlphaGo Zero learned after, like, 5 million games, humans took only so many thousands of years.”
Steven: I love that.
Sonal: And I’m sort of like, “That’s the whole point of computers, is to learn it in three days.” You know, time is money. I don’t care if it took, like, you know, 5 million games, it took three days. But time is more important than amount.
Steven: You know, again, it’s an amazing accomplishment. It’s just, I always try to look at these in the context of how these innovations tend to happen. You know, if you just look, the same thing happened with search, like search itself. When people were inventing all the techniques of search, they were working on these tiny machines with all of these physics constraints about how much they could compute. And like, it was a whole thesis, just to be able to go get, like, the archives of “The New York Times” to search through it.
And the same thing happened with spell checkers. Like, wow, we’d love to have 50,000 words in the spell checker, but we don’t have that much memory. And then, there was a whole… well, now we have a lot of memory, and so now we’re spending all this money to try to compile all the words. And then, someone said, “Why don’t we just use all of the internet as the spelling dictionary?” Then there’s no spelling dictionary, or the IME if it’s an Asian language. And so, this curve just keeps repeating itself. And I think that it… neither is going to win out, because data is always going to be valuable, because it’s an input that you can’t just say…
Sonal: Yeah, it’s a great way of thinking about it.
Steven: …whereas, the flipside is, not everything has data. So, if there’s this opportunity — like, you know, a great example is, like, how many of your Lyft drivers also have Waze turned on, just because they know the maps are entirely accurate, but they don’t know about accidents, or emergency closures, or weather, or whatever. And so having that data, combined with the data plus algorithms, that hybrid…
Sonal: Is always gonna win, right.
Steven: …is gonna continue to win, because only if you really want to be practical and actually solve the problem.
Sonal: I have to say one thing about this, which — it’s so fascinating that you use the example of search, and you describe this curve that we’re on. And part of startups is, sort of, you know, getting at the right place and point in time along that curve and where you are in the moment. Because sometimes, I think a lot of academics, or even people who are really in love with their own ideas, sometimes lose a big picture. Like they’ve had this build up, this expertise in one area. And they don’t realize, like, practically speaking, the world has changed around you. Because the point that I think is fascinating, as well, in the innovation story, using the search example, is that Google was, like, the 15th search company to come around before it hit success. And that is kind of relevant to think about.
Steven: Right. And it’s super important too, from the company building perspective, which is — they have this algorithm, which we all talked about back then. PageRank, you know, appropriately named but not…
Frank: After Larry.
Sonal: Or web page ranking as well.
Steven: Right, right. But the interesting thing was very rarely is, like, an algorithm, like, this secret sauce for a company. Because you could look at what goes on from the outside and pretty much reverse engineer an algorithm. So, again, back to — but if you have, like, a data source that you can actually…
Sonal: And a monetization model.
Steven: And turns out, one of the things that Google built out — was, like — they were crawling the web faster than AltaVista. They made this bet because they were machine learning people before machine learning was cool. They made the bet that having the data was going to be — and, it turns out, that was the barrier to entering the search market. Even for Microsoft and Bing [it] was, like, sucking in the entire internet fast enough.
Frank: Step one, crawl the inner web.
Steven: Step one, crawl the internet while it’s growing at exponential rates. And so, you know, I actually want to bring up one more thing that I just think is, kind of, really interesting from a practical point of view.
Sonal: I love it, you’re like the practical person on this podcast.
Steven: I know. I feel, like, particularly practical today because, well, I can’t play Go, so I gotta…
Sonal: You got to add something. I’m just kidding.
Steven: Which is, the thing that I find the most fascinating about all of these solutions in the space, is the engineering of a product — that you can make a commitment to customers that works. And then, when it doesn’t work, you can figure out why. And so, one of the things that’s so interesting about all of this is debugging.
Sonal: Hmm, interesting.
Steven: And how does that really fit in? And sure, you know, with Go, you lost the game. And of course, while they’re building all of this, they’re figuring out, “Whoa, what did we do wrong to make this move repeatedly?” They’re doing all of that debugging over the past N months. But, you know, if you just all of a sudden apply this to the enterprise space, or to adjusting news feeds, or a zillion other things that you can think of — like, figuring out where it goes wrong, like, that’s actually really critical to a business. Like, you can’t put a product out there if — pick our sales forecasting example, and then it’s wrong, like, you can’t just go, “Whew, the machines make mistakes just like people.” Your VP of sales is kinda messed up too. Because nobody pays money to, like, a computer for it to be wrong. And so how do we think about that?
Frank: Yeah, this whole idea is, sort of, transparency behind these models. In other words, do we know why they’re behaving the way they’re behaving?
Sonal: Demystifying the black box.
Frank: You know, a super active area of research right now, right? Which is, how do I make the deep learning models more transparent so that I can debug them, I can verify them, I can make sure there’s no systematic bias in them, right? Because until that, you couldn’t do important things like, “Hey, can this person have a loan or not?” Because the government will say you cannot make that decision unless we understand why it is that you’re making that decision.
Sonal: But you’ve made the argument, Frank, that, you know, when it comes to, say, self-driving cars, it’s no better or different than what the human mind does. We can’t interrogate the black box of the human mind that’s driving that car. So, the counterpoint of that is that, well, you’re right, Steven, that, you know, you’re paying for this computer to be smarter. But the reality also is, this stuff is not that much smarter than humans anyway, so who cares?
Steven: Right. But the problem is first, that’s everybody else, not me. Yeah, like, I’m the best driver on the road. It’s all the other people. I mean, you know, I like to always go back to this — the wonderful, wonderful research on computers and society that Stanford professors Nass and Reeves did, because one of the things that they really realized, really back in the early ’80s…
Sonal: Oh, Clifford Nass. Oh, all right, rest in peace.
Frank: And Byron Reeves.
Steven: And Byron Reeves, which was that there’s something about a mechanical device that produces answers that makes the human brain ascribe way more authority to it than there necessarily should be.
Sonal: In fact, they even apply this to the world of voice recognition systems and the interaction.
Steven: They applied it to voice recognition, they applied it to chatbots. Before they were chatbots. I mean, so the interesting thing is that I don’t know how — it’s going to take a major change in society for people collectively to just go, “Ah, people make mistakes.” And that’s okay. But machines can’t. Like, we have, especially in the United States, a very, very low tolerance for devices making mistakes.
Sonal: Except when machines are doing a kind of alien intelligence that humans cannot do. Because, again, the most fascinating thing about our last talk about AlphaGo, this current talk — is that, at the end of the day, even though it opened — the AlphaGo Zero opened and closed with similar moves to what humans would do, it converged very quickly — there was a whole set of things in the middle that it did that were just things that humans would never have done. It’s actually, then, augmenting us in a very different way, because it’s adding a completely foreign intelligence. So, it’s not even comparable to our own to just judge…
Frank: Yeah, it’s a different type of intelligence, in the same way that animals have a different type of intelligence. And then, you make all kinds of category errors when you say giraffes aren’t intelligent, or bats aren’t intelligent, they’re just intelligent in a different way.
Sonal: Exactly. And we’re more for…
Frank: We can’t use the human yardstick to compare them.
Sonal: It’s the reason we are more forgiving when a dog pees all over your couch than when your five-year-old kid does the same freaking thing.
Steven: That’s definitely the case. The challenge that we have in technology is, just, the perception. Like, I mean, you can just see it in the discussion of news feeds, and algorithms, and how people are, like, they should just be right. I mean, like, and people should just have debugged this before it all happened. And actually, it’s not even all that crazy sophisticated, what’s going on. And, what’s weird is, of course, at the same time people are critical of what’s on the front page of newspapers, or what’s in the first five minutes of TV newscast — which is literally the same decision made by a human being, who’s just deciding what should we show first on CNN versus Fox News. Some human just made up their — with some black box in their brain, which is augmented by their title of, “in charge of production.” And yet infinitely forgiving of those choices. And so, I actually think that there’s a lot here. And I don’t have an answer, but I think it’s no different than any other software. Which is, if you’re going to make something and offer it to people in a commerce situation…
Sonal: It better damn work. And it better not be wrong.
Steven: It better be clear, like, how it does it to you. I mean, like — people, like — Excel was really great at doing math very quickly. And then, one day, I find myself at the Naval War College having to explain myself to a bunch of generals. Like, how do we know it’s right. I literally just had to sit there going, “I mean, it’s just right.”
Sonal: It just works.
Steven: And those are the moments that contribute to me feeling, like, a lot of empathy for what’s going on in the marketplace now about this, and why I’m so alarmed — not alarmed, that’s totally the wrong word — so focused on this, you know, know the outcome. Because, like, until you’ve just sat there with a bunch of people who control nuclear weapons, telling you we’re using a spreadsheet to calculate it, the weight of being right doesn’t really hit you. Because before that, we were just like, “Oh, my God, it doesn’t seem to make mistakes.” Then we were, like, super ecstatic. And it made, “Look, the charts are cool. Yay, ship it.” And then one day, they’re like, “Is it gonna work or not work?” You’re like, we couldn’t prove it. Like, that was the essence of it. Like, I couldn’t go to people at Boeing, or people at the Navy or wherever, and Wall Street, and prove that Excel works.
And then, of course, 30 years later, like, every time there’s a mistake in Excel, like, it’s a mistake in the human that typed it. Like, Ken Rogoff did a bunch of economic predictions that forecast the recession in 2008. And then, all of a sudden you find out his model was wrong, but the recession happened anyway. Did he make the right prediction or not the right prediction? And, like, how does that work? And I think that’s what’s going on here too, with these things.
Sonal: I have a philosophical question, then, for you guys.
Steven: That wasn’t philosophical enough.
Sonal: Well, one more philosophical. We’re being very philosophical here. You know, you said something about how people make judgment calls for what news to show on television, etc. And we have these expectations about algorithms. And one of the topics we’ve discussed on this podcast — Frank, you and I discussed with Fei Fei — is this idea of bias in algorithms, and how algorithms, by definition, can be biased — is one possibility of this type of work. Because they kept using the phrase tabula rasa, which of course I find so fascinating, because in human development, there’s an analogous world of this, where there was this theory that the human brain was also a tabula rasa or blank slate. And then they quickly learned, like, “No, we have millions of years of evolution and DNA, that’s actually — [we’re] actually coming in inheriting things.” Is there now a possibility that algorithms can write themselves, [in] a true tabula rasa-like way, given this type of work? Is that just way out there?
Steven: Oh, that one’s above my paygrade.
Sonal: These guys are throwing up their hands for our listeners, by the way.
Frank: I will give an example of, sort of, the things that are hardwired in your DNA. So, they’ve done a bunch of experiments that if you watch somebody’s hand getting pricked, your muscles in your own hand will involuntarily contract. Now, the big caveat is, if that person has a different skin color than you, your hand doesn’t contract.
Sonal: Really? I didn’t know that.
Frank: In other words, there’s something going on in your lizard brain that says, “That’s an ‘other’s’ hand.”
Sonal: Primal.
Frank: Therefore, it’s not relevant to me. Therefore, whatever reflexes caused your hands to contract in sympathy, when somebody jammed a needle into that hand, that’s an example of this, sort of, prewiring. You know, the question is, can algorithms be prewired that way? I mean, you could…
Sonal: Or un-prewired even.
Frank: Or un-prewired.
Sonal: Because that to me is where the opportunity lies.
Frank: I mean, you could definitely write a set of rules that said, “You know, treat other people in a different way,” right? That would be top down. The peril that most people talk about today is not so much that the algorithms are biased, but that you haven’t fed it enough data, so your prediction is biased. So, the classic example of this is in the early days of vision recognition. Some of the image classification algorithms were categorizing people with dark skin as gorillas. That happened because they didn’t feed it enough data of dark-skinned people. So, when people talk about bias in algorithms, they are mostly talking about this phenomenon…
Sonal: Limited by data.
Frank: The human researcher or the human programmer selected an incomplete data set, and therefore you got biased results, as opposed to somehow the architecture of the neural network is biased inherently.
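As a rough illustration of Frank's point, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the two groups, the coordinates, the labels, and the choice of a 1-nearest-neighbour classifier. It is not what any real vision system did. The same algorithm is trained twice, once with a single example from group B and once with fuller coverage of group B, and only the training data changes.

```python
from math import dist

def predict(train, point):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(train, key=lambda example: dist(example[0], point))
    return label

def error_rate(train, test):
    """Fraction of test examples that the classifier labels incorrectly."""
    wrong = sum(predict(train, features) != label for features, label in test)
    return wrong / len(test)

# Hypothetical toy data: ((x, y), label). Group A sits near x = 0..1,
# group B near x = 8..9, and each group has its own label boundary.
group_a_train = [((0, 0), 0), ((1, 1), 0), ((0, 1), 0),
                 ((0, 4), 1), ((1, 5), 1), ((1, 4), 1)]
group_b_few = [((9, 9), 1)]                       # one lonely example
group_b_full = [((9, 9), 1), ((8, 8), 1),
                ((8, 5), 0), ((9, 4), 0), ((8, 3), 0)]

group_a_test = [((0.5, 0.5), 0), ((0.5, 4.5), 1)]
group_b_test = [((8.5, 4.5), 0), ((8.5, 8.5), 1), ((8.5, 3.5), 0)]

skewed = group_a_train + group_b_few      # group B barely represented
balanced = group_a_train + group_b_full   # group B properly covered

print("skewed data,   group A error:", error_rate(skewed, group_a_test))
print("skewed data,   group B error:", error_rate(skewed, group_b_test))
print("balanced data, group A error:", error_rate(balanced, group_a_test))
print("balanced data, group B error:", error_rate(balanced, group_b_test))
```

With these made-up points, the skewed training set gets group A right but mislabels two of the three group B test points, and the balanced set gets both groups right. The classifier code is identical in both runs, which is the distinction being drawn: the bias comes from what was collected, not from the architecture.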
Steven: And I think that that’s a very important point that is being well studied, it’s well articulated. But particularly in the supervised learning case, the data that’s input, at least today — almost every data set that you have is going to be and have some inherent bias, because you weren’t aware of these factors when it was being collected in the first place. Because I think it’s fair to say, the awareness of all of these issues is at an all-time high relative to that. But then again, you look at all the medical studies, and you’re like, “Well, there haven’t been very many women in most of these studies, or there have been only women but studying a drug developed by…”
Sonal: That is true in biology and genetics research too. There’s a lot of limitations. Right.
Steven: But I do think that for anybody today embarking on using supervised learning, whether for all or part of the solution, that data set is implicitly challenged, and in particular, it’s the labels. Because even which labels you pick or which labels you omit is going to create some bias in the model that you’re unaware of.
Frank: That’s exactly right. Your cultural background — your history, the way you grew up — will lead you to label an image a certain way. That may be different than somebody who — right? So what is the ground truth?
Steven: Well, the best example of this is just looking at images and, like, sentiment analysis is such a big thing. But images, like facial expressions, have just been studied all around the world for decades, multiple decades, about, like, what is happy, what is sad, what is questionable? The same with speech and intonation, like, sometimes, you know, ending on a high note sounds like you’re asking a question, unless you’re in another culture where that’s making an exclamation.
Sonal: Or it can be vocal fry, and people are complaining that women shouldn’t speak that way.
Steven: Well, the vocal fry thing is a perfect example of that going on right now. And actually, the high note was the — sort of the ’80s version of that same thing.
Sonal: It goes in vogue, too, right?
Steven: And be super careful about the labels. And that’s — to Frank’s point, that’s a great place for transparency. Because if you can go to a future or a potential customer and talk about, “This is what our model is based on when we’re forecasting your sales,” or “telling you to optimize your assembly line.” That could really matter to that process.
Sonal: So, high level takeaways. Hybrid works. Always — this is a thing — this is, like, a refrain. I feel like I should get you a T-shirt that says, “Hybrid works, God damn it.”
Steven: Well, be careful, because I’m really against a lot of hybrids.
Sonal: Oh, yeah. Like, hybrid cloud computing.
Steven: Yeah, hybrid in terms of solutions between old and new for coding. The old stuff that works definitely does work.
Sonal: Old and new, academic and industrial, like, exactly — all the things that make it practical versus pure, so to speak, purist.
Frank: I think another big takeaway of this is this “revenge of the algorithms” moment. Where there’s so much momentum right now that says, basically, we’re one labelled data set away from glory. Right? And this result basically shows you, “Wow, there’s a lot of mileage that you can get out of reinforcement learning, where there’s no data, no labels.”
Sonal: One takeaway for me is just the element of surprise, because I’ve been thinking a lot about this. I mean, I used to work in the world of how humans learn. But what’s fascinating to me is that the system played itself at a level calibrated to itself at every level. And in human learning, that doesn’t happen. You’re taught by your parents, you learn from, you know, adults, you learn from people who have more experience. There’s all these different things that happen.
And the thing that’s so fascinating to me, is that to me, it is very evolution-like. It’s like a big bang moment. And while I wouldn’t hype and say that means we’re going to end up here, I do think that’s very amazing, especially when you think about the surprises. Like, in the paper, one of the things they talk about is that the system learned something that’s actually very easy for humans to learn, way later in the game. It’s actually, I think, like, I forgot, shicho, or some specific move, like, the ladder. The fact that the system took forever to learn something that comes very first for humans... I’m endlessly fascinated by this relationship between humans and computers, and what we can learn from computers and vice versa. And also, what it means for the field, when you can actually add how people learn to the field of artificial intelligence, machine learning.
Frank: Yeah, with reinforcement learning, one of the challenges has been, sometimes you get into this training epoch where there’s no improvement, right? Because you’re just playing these games over and over again. And, like, what is it that causes the next game to be smarter? And this one was, like, in three days, it got incredibly smart. So, yeah, hats off to these guys.
Sonal: Yes. And let’s also add the other takeaway, which is that you can do amazing things with simple architectures, because we’re at a point and a moment right now where you can have 4 TPUs instead of 48.
Steven: For me, one of the takeaways is, if you can frame your problem with a set of constraints and a set of rules and a fixed set of operations, that’s a very powerful concept that changes things. Because we’ve been so data-centric, people have stopped trying to think of their solutions algorithmically. And it’s entirely possible that there is an algorithm, which is the same thing that I think we would have said a year ago: everybody was so focused on machine learning that traditional algorithmic approaches might have been overlooked.
Sonal: And now…
Steven: And here we are again with proof. But still, now we have to go back and say, “Well, you know, there’s some basic machine learning that can work. There’s some basic algorithmic stuff. But that key is really the set of constraints.”
Frank: Yeah. For people who are interested, I highly recommend Andrej Karpathy’s YC talk on this. You can search for it. It’s on Google Slides.
Sonal: He works at Google, right?
Frank: The whole talk is basically about where will artificial general intelligence come from? And he basically compares and contrasts this, sort of, rule-based world of Go, to something messy — like how would you build a pick and pack robot for Amazon? And so, if you’re interested in this topic of what’s the difference between rule-based board games, versus the messy real world, it’s a great presentation…
Sonal: That’s great for the generalized intelligence side. Are there any parting messages for entrepreneurs building companies that have very specialized things? Because one thing we have argued in this podcast, including you, Steven, when we did a podcast on building product with machine learning — is that sometimes the best places to play are very specific domain focus, whether or not they have a clearly explicit set of rules or not. So, any thoughts on that?
Frank: Yeah, so one takeaway is, one of the ways we evaluate startups from an investor’s point of view is, have you picked the best techniques to solve the business problem that you’re trying to solve? And I think this paper basically opens up a frontier of exploration that you might not have thought about before. Because, I think if you started an AI-centric company today, you would definitely be on the “get data, get labels, train model, make prediction” path. And this opens up another area where you can try and figure out, “Am I solving a business problem where this reinforcement learning approach is going to lead me to glory, as opposed to supervised learning or unsupervised learning?”
Steven: I think [that] the most interesting thing for me in looking at different companies and just talking to founders is just that there’s this world going on of advances and new things all the time. And you could get on the treadmill of always trying to be the newest thing. And, you know, we knew that the next generation of companies, you know, two years ago, were just gonna be, you know, whatever you were doing before plus AI. But the thing was, it wasn’t clear that that was always the best solution. And then, you replaced the AI with “now it has to be data and machine learning and labeling.” And so what Frank was saying is just super important to internalize, which is that part of being a founder, and building a new product, is knowing the reason why you’re choosing the technologies you’re choosing, and not just because you think that’s what investors are looking for. Our jobs, on that side, are to actually ferret out, like, who actually has a handle on the problem they’re solving, and a line that they can draw from that problem to their chosen solution path. And that is too often overlooked.
Frank: That’s exactly right. My favorite joke that I tell is — when I ask entrepreneurs, what machine learning algorithms they’re using in their product, when they’ve claimed to be a machine-learning-based startup. The answer I am not looking for is, “I don’t know. I’m sure we’re using the good ones.”
Sonal: Well, on that note, you guys, thank you for joining the “a16z Podcast.”
Steven: Thank you.