A Little History
Let’s start with a little history. If you are anxious to get to the practical bits, feel free to skip this section. The perfect tl;dr version, courtesy of Jeff Bezos in his 2016 letter to Amazon shareholders, is this:
Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.
For starters, let’s just assume artificial intelligence is about making computers smart in the way that we consider each other smart: that is, we can talk and understand each other; we can see the world and work out plans to get from our home to a friend’s house; we can learn new behavior like knitting or playing basketball by watching other people, and so forth.
People have been trying to endow machines with these human capabilities since antiquity. The Wikipedia article on the history of artificial intelligence describes some of the early attempts. But most historians would date the beginning of AI as we know it today to the Dartmouth Summer Research Project on Artificial Intelligence, held over the summer of 1956.
In the 60+ years since that kickoff, researchers have tried many different techniques to program computers to mimic human intelligence. That six-decade history of AI divides reasonably well into “Classical AI” and “Modern AI”.
In Classical AI, researchers used logical rules to model intelligence. Building AI meant representing the world in a set of data structures (such as trees or lists or sets) and then using rules (such as and, or, if-then-else, and so on) to reason about that knowledge. For example, we could represent language as a set of words, and we could perform machine translation by translating those words from one language to another and then reordering them, since we know that languages put their nouns and verbs and adjectives in different places. Or we could try to solve visual recognition problems by describing cats as “four-legged animals with whiskers,” then decomposing that into a set of subproblems (find legs, find whiskers), and those subproblems into more detailed problems (find edges, separate foreground and background).
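To make the rule-based idea concrete, here is a minimal sketch in Python of the “four-legged animals with whiskers” rule. The `Animal` class, its fields, and the example animals are hypothetical, invented purely for illustration; no real Classical AI system is being reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    """Hand-built symbolic description of an animal (hypothetical fields)."""
    legs: int
    has_whiskers: bool


def is_cat(animal: Animal) -> bool:
    # The hand-written rule from the text: four legs and whiskers.
    return animal.legs == 4 and animal.has_whiskers


# Made-up examples, described by hand rather than extracted from images.
tabby = Animal(legs=4, has_whiskers=True)
sparrow = Animal(legs=2, has_whiskers=False)
otter = Animal(legs=4, has_whiskers=True)  # not a cat, but it satisfies the rule

print(is_cat(tabby), is_cat(sparrow), is_cat(otter))
```

The rule handles the tabby and the sparrow correctly but also labels the otter a cat. Patching that requires yet another hand-written rule, which hints at why this style of system struggled outside carefully constrained settings.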
Researchers enjoyed many early successes: so many, in fact, that Marvin Minsky famously said in 1967 that “within a generation…the problem of creating ‘artificial intelligence’ will substantially be solved.”
Unfortunately, efforts based on these approaches did not go far. While the classical approach yielded spectacular demos that worked in constrained environments (like the blocks world of SHRDLU or the simulated psychotherapy sessions of ELIZA), those systems failed to generalize. The cycle of hype followed by disillusionment was so severe that funding and startups would vanish for years at a time in a series of so-called “AI winters.”
If Classical AI was about very smart researchers creating rules attempting to understand the world, Modern AI techniques focus on letting computers derive their own “rules” using lots and lots of data. Rather than explicitly telling a computer how to find a cat, we’ll just show the computer a lot of examples of cats, and see if the computer can construct a cat detector by figuring out what differentiates cats from dogs or muffins or couches or motorcycles.
The Modern AI approach is to get a data set and then apply one of a family of machine learning techniques with cool names (e.g., logistic regression, decision trees, Gaussian Naive Bayes, random forests, k-nearest neighbors, or deep learning) to that data set. For the cat problem, that means gathering a bunch of pictures with cats and another set of pictures without cats, and feeding enough of these pictures to the algorithms. Given enough data, these machine learning algorithms can do a very good job (in many cases, better than humans) of telling which pictures contain cats.
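As a rough illustration of learning from examples rather than hand-written rules, the sketch below implements a tiny k-nearest neighbors classifier in plain Python. Everything in it is an assumption made for illustration: the two numeric features (`ear_pointiness`, `snout_length`), the toy training values, and the `classify` helper. A real system would learn from thousands of actual images, not six made-up points.

```python
import math
from collections import Counter

# Toy, invented training data: (ear_pointiness, snout_length) -> label.
TRAINING = [
    ((0.90, 0.20), "cat"),
    ((0.80, 0.30), "cat"),
    ((0.85, 0.25), "cat"),
    ((0.30, 0.80), "not cat"),
    ((0.20, 0.90), "not cat"),
    ((0.40, 0.70), "not cat"),
]


def classify(features, k=3):
    """Label a new example by majority vote among its k nearest training examples."""
    by_distance = sorted(TRAINING, key=lambda ex: math.dist(ex[0], features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((0.88, 0.22)))  # close to the "cat" examples
print(classify((0.25, 0.85)))  # close to the "not cat" examples
```

Note that nothing in `classify` mentions legs, whiskers, or any other property of cats. The behavior comes entirely from the labeled examples, so improving the detector means collecting more or better data rather than writing more rules.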
Researchers are excited about the Modern AI approach because it seems to work across many different domains; that is to say, Modern AI techniques such as deep learning seem to generalize to many different classes of problems. For example, page through Jeff Dean’s presentation on Trends and Developments in Deep Learning Research to see how Google uses this Modern AI approach in everything from photo recognition to Gmail’s smart replies to better search to disease diagnosis.
Note: For a more detailed discussion, we recommend the first chapter of Russell & Norvig’s Artificial Intelligence: A Modern Approach, 3rd edition.