Training Your Own Models
If you’ve played with the examples from the previous section on NLP, you’ve seen the power of APIs. Send in a picture, get a list of objects in that picture; send in a sentence, and get the emotional tilt of that sentence, or get the sentence back in another language. Magic.
Just as with other areas of software development, sometimes the API you need doesn’t exist and you’ll have to write your own code. A common starting point for AI programming is to select a machine learning algorithm, train it with data (see this Medium post for a set of strategies for getting data), and then expose that trained model via APIs to the rest of your code.
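As a minimal sketch of that workflow (not a production recipe), the example below picks a deliberately simple algorithm, a nearest-centroid classifier, trains it on a few made-up labeled points, and exposes the trained model behind a single function that the rest of an application could call. The class name, the `predict_api` function, the toy data, and the labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import dist


class NearestCentroidModel:
    """Toy classifier: predicts the label whose training mean is closest."""

    def __init__(self):
        self.centroids = {}

    def train(self, examples):
        # examples: list of (feature_vector, label) pairs.
        grouped = defaultdict(list)
        for features, label in examples:
            grouped[label].append(features)
        # Store the per-label mean of each feature.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }

    def predict(self, features):
        # Return the label with the nearest centroid (Euclidean distance).
        return min(
            self.centroids,
            key=lambda label: dist(features, self.centroids[label]),
        )


# Hypothetical training data: [hours of sunlight, rainfall in cm] -> label.
training_data = [
    ([9.0, 1.0], "desert"),
    ([8.0, 2.0], "desert"),
    ([3.0, 9.0], "rainforest"),
    ([4.0, 8.0], "rainforest"),
]

model = NearestCentroidModel()
model.train(training_data)


def predict_api(features):
    """The 'API' the rest of the code calls; it hides the model details."""
    return {"label": model.predict(features)}


print(predict_api([8.5, 1.5]))
```

In a real system the function would more likely sit behind an HTTP endpoint, but the shape is the same: callers send features and get a prediction back without knowing which algorithm produced it.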
There is a large set of machine learning algorithms with fun names, such as decision trees, random forests, support vector machines, and logistic regression. Each algorithm is best suited for a specific situation depending on how much data you have, how many “features” or dimensions of data you can feed the algorithms, and how sparse or dense the data set is. Sometimes it’s hard to figure out which algorithm to use, and you will have to try a few different algorithms (and combinations of algorithms) to see how they do.
Here are a few good starting points for picking the right ML algorithm to solve your specific problem:
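One way to “try a few different algorithms” is to train each candidate on the same data and score it on examples it has not seen. The sketch below does this for two intentionally tiny candidates, a majority-class baseline and a 1-nearest-neighbor classifier. The data, the function names, and the choice of candidates are assumptions made for illustration; in practice you would plug in the algorithms you are actually considering.

```python
from collections import Counter
from math import dist


def majority_baseline(train):
    # Always predicts the most common training label.
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def one_nearest_neighbor(train):
    # Predicts the label of the closest training point.
    return lambda features: min(train, key=lambda ex: dist(features, ex[0]))[1]


def accuracy(predict, test):
    # Fraction of held-out examples the model labels correctly.
    correct = sum(predict(features) == label for features, label in test)
    return correct / len(test)


# Hypothetical data: ([feature_1, feature_2], label).
train = [
    ([1.0, 1.0], "a"), ([1.5, 2.0], "a"), ([2.0, 1.0], "a"),
    ([6.0, 6.0], "b"), ([7.0, 5.5], "b"),
]
test = [
    ([1.2, 1.4], "a"), ([6.5, 6.0], "b"),
    ([5.5, 6.5], "b"), ([2.0, 2.0], "a"),
]

candidates = {
    "majority baseline": majority_baseline,
    "1-nearest neighbor": one_nearest_neighbor,
}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Keeping a trivial baseline in the comparison is a cheap sanity check: if a sophisticated model cannot beat it on held-out data, something is wrong with the data or the setup.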
- How to choose machine learning algorithms for Microsoft Azure machine learning
- Stack Overflow answer to “When to choose which machine learning classifier?”
- Scikit-learn documentation: Choosing the right estimator
Deep learning is a class of machine learning that has gotten a lot of well-deserved attention in recent years because it works well on a wide variety of AI problems in vision, natural language processing, and many other areas. Also, in contrast with many other machine learning algorithms, where data scientists or software engineers have to figure out which features will lead to good predictions, deep learning approaches figure out the features themselves.
For example, let’s say you were using linear regression to predict the price of a home, as Trulia does. With most machine learning approaches, you’d have to figure out the “features” yourself: the factors that drive price, such as how big the house is, when it was built, the price of nearby houses, and the number of bedrooms and bathrooms. With deep learning, you don’t pick the features. The algorithms essentially find the features for you in the data.
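To make the hand-picked-features approach concrete, here is a minimal sketch of a linear regression on home prices using NumPy’s least-squares solver. The listings, the prices, and the choice of three features (square feet, bedrooms, age) are invented for illustration; this is not how Trulia or any real pricing system works.

```python
import numpy as np

# Hypothetical listings. Each row holds hand-picked features:
# [square feet, bedrooms, age in years]
features = np.array([
    [1400.0, 3.0, 20.0],
    [1600.0, 3.0, 15.0],
    [1700.0, 4.0, 30.0],
    [1875.0, 4.0, 10.0],
    [1100.0, 2.0, 40.0],
    [2350.0, 5.0, 5.0],
])
prices = np.array([245000.0, 312000.0, 279000.0, 308000.0, 199000.0, 405000.0])

# Add a column of ones so the model can learn an intercept term.
X = np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: find weights minimizing ||X @ w - prices||^2.
weights, *_ = np.linalg.lstsq(X, prices, rcond=None)


def predict_price(square_feet, bedrooms, age_years):
    """Predict a price as intercept + weighted sum of the chosen features."""
    return float(np.array([1.0, square_feet, bedrooms, age_years]) @ weights)


print(predict_price(1500, 3, 25))
```

The point to notice is that a person decided which columns go into `features`. If the real driver of price were something not in that table, this model would have no way to discover it.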
Because (1) it’s working and (2) it figures out features on its own, we’ll spend the rest of the time in this playbook digging into deep learning. But before we continue, this tweet is spot on:
We’d encourage you to try potentially simpler, admittedly less glamorous algorithms before deep learning. Sometimes a linear regression is all you need.
Having said that, deep learning algorithms are incredibly powerful and getting amazing results across many different domains. Professor Christopher Manning, a longtime veteran of NLP research at Stanford, says in his introductory lecture for “CS Natural Language Processing with Deep Learning” that “in the length of my lifetime, I’d actually say it’s unprecedented [for] a field to progress so quickly”.
Deep learning data structures and algorithms were originally inspired by the way neurons in the brain work, but most researchers today will tell you that brains and neural networks used in software like TensorFlow are very different. But if you are interested in the history of how we got here, check out these excellent resources which we’ve ordered by depth, from most concise to most comprehensive, for your reading pleasure.
- Andrew L. Beam, Deep Learning 101
- Andrey Kurenkov, A “Brief” History of Neural Nets and Deep Learning
- Haohan Wang and Bhiksha Raj, On the Origin of Deep Learning
- Jürgen Schmidhuber, Deep Learning in Neural Networks: An Overview
By the way, why do we call it “deep learning”? Because the underlying algorithms work on a data structure that looks like a graph of connected nodes, with the nodes organized into layers. Data goes into the left-most nodes, and the output comes out of the right-most nodes. Between the input and output nodes, there are many layers of other nodes; hence, the network is “deep”. This diagram from Nvidia does a good job of illustrating the concept:
To learn more:
- Watch legend Jeff Dean from the Google Brain team lecture on Large-Scale Deep Learning for Intelligent Computer Systems
- Read Michael Nielsen’s excellent ebook and website Neural Networks and Deep Learning
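The layered structure described earlier, with data entering at the input nodes and flowing through hidden layers to the output nodes, can be sketched in a few lines of NumPy. The layer sizes, the random weights, and the ReLU activation are assumptions chosen for illustration; a real network would learn its weights through training, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical layer sizes: 4 inputs, two hidden layers of 8 nodes, 2 outputs.
layer_sizes = [4, 8, 8, 2]

# One weight matrix and bias vector per pair of adjacent layers.
# Weights here are random; training would adjust them.
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]


def forward(inputs):
    """Pass data through each layer in turn, input side to output side."""
    activations = inputs
    for index, (weights, bias) in enumerate(layers):
        activations = activations @ weights + bias
        # Apply a ReLU nonlinearity on hidden layers only.
        if index < len(layers) - 1:
            activations = np.maximum(activations, 0.0)
    return activations


output = forward(np.array([0.5, -1.0, 2.0, 0.1]))
print(output.shape)  # (2,)
```

Making the network “deeper” in this sketch just means adding more entries to `layer_sizes`; the loop in `forward` stays the same.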
You might be wondering why this revolution is happening now given that some of the original ideas date back to the 1950s. The short answer is a common one in technology: bountiful and inexpensive compute, storage, and data. Andrew Ng shares this conceptual graph illustrating how the effectiveness of deep learning improves as you feed it more data and more computing resources:
While the fundamental ideas are generally the same, the scale at which we are using them has changed, and that has brought quantitatively different (and better) results, in part because we can now test ideas we couldn’t test before. Scale constraints created a barrier to evolution. As cloud computing has made large-scale experiments possible, the techniques have evolved and improved significantly.
Professor Geoff Hinton (Google and University of Toronto) discussed why previous approaches failed. The first two reasons he identifies have to do with scale (“Our labeled datasets were thousands of times too small.”) and compute capabilities (“Our computers were millions of times too slow.”). The second clearly speaks not only to the speed of processors but to compute capacity in general (i.e., processor, memory, storage, and networking).