The fundamental data structure of a neural network is loosely inspired by brains. Each of your brain cells (neurons) is connected to many other neurons by synapses. As you experience and interact with the world, your brain creates new connections, strengthens some connections, and weakens others. A neural network’s data structure has many similarities: its “neurons” are nodes in a network connected to other nodes. Connections between nodes have a strength. Neurons activate (that is, generate an electrical signal) based on inputs they receive from other neurons.
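To make the idea of nodes, connection strengths, and activation concrete, here is a minimal sketch of a single artificial neuron in plain Python. The sigmoid activation and all the numbers are assumptions picked for illustration, not something any particular framework prescribes.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs, passed through an activation function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


# Three incoming signals; the weights play the role of connection strengths.
inputs = [1.0, 0.5, -1.0]
weights = [0.8, -0.2, 0.4]  # hypothetical values; normally learned during training
bias = 0.1

activation = neuron_output(inputs, weights, bias)
print(round(activation, 4))  # weighted sum is 0.4, so this prints 0.5987
```

That is the whole “neuron”: a weighted sum followed by a squashing function.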
But don’t get too carried away with the biological metaphor, or you will anger both neurobiologists and computer scientists — both will tell you that neurons in your brain behave very differently from the artificial neural networks in systems such as TensorFlow.
If you want to see some of this heat, read the IEEE interview with UC Berkeley professor Michael Jordan from October 2014.
Having said that, let’s walk through some of the common neural network architectures that you’ll come across. “Architecture” is a fancy way of describing the rules that govern how nodes connect to each other and what shapes they can form. The Wikipedia article on the types of artificial neural networks is a good reference for further exploration.
Feedforward Neural Networks (FFNs)
Feedforward networks were the first type of artificial neural network devised. In this network the information moves in only one direction: forward. Input nodes receive data and pass it along, as seen in Figure 1.
From the input nodes, data flows through the hidden nodes (if any) to the output nodes, without cycles or loops, and may be modified along the way by each node. These networks are called feedforward because processing moves forward from left to right. As Michael Nielsen points out, “hidden nodes” aren’t some mysterious philosophical or mathematical construct: they are simply nodes in the middle of the network that are neither an input nor an output. When feedforward networks have multiple layers, they are called multilayer networks.
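The one-way flow described above can be sketched in a few lines of plain Python. This is an illustrative toy, not how TensorFlow does it internally: the layer sizes, the weights, and the choice of a sigmoid activation are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each row of `weights` feeds one node in the layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Pass data through each layer in order; nothing ever flows backward."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# 2 input nodes -> 3 hidden nodes -> 1 output node (all values hypothetical)
hidden = ([[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.4, 0.9]], [0.05])

result = feedforward([1.0, 0.0], [hidden, output])
```

The hidden layer here is nothing more than the middle entry in the list of layers: its nodes receive values from the inputs and hand their own values on to the output node.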
Create your own feedforward neural networks with the browser-based TensorFlow playground. We did a brief demo in our AI primer (beginning at about 27:17), but get in there and click around on your own. The things that you can change (depth of the network, the activation function, the learning rate, etc.) are called hyperparameters. So by clicking around in the playground, you are “modifying the hyperparameters of a feedforward multilayer neural network”. Isn’t that something?
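If it helps to see hyperparameters written down, here is a small sketch that treats them as plain settings you pick before training, and then counts how many numbers the network would have to learn as a result. The class name, field names, and default values are hypothetical and do not mirror the playground's actual controls.

```python
from dataclasses import dataclass


@dataclass
class Hyperparameters:
    """Settings you choose before training (names are illustrative)."""
    hidden_layers: int = 2       # depth of the network
    nodes_per_layer: int = 4
    activation: str = "tanh"
    learning_rate: float = 0.03


def layer_sizes(n_inputs, n_outputs, hp):
    """Number of nodes in each layer, from input to output."""
    return [n_inputs] + [hp.nodes_per_layer] * hp.hidden_layers + [n_outputs]


def count_parameters(sizes):
    """One weight per connection between consecutive layers, plus one bias per node."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


hp = Hyperparameters()
sizes = layer_sizes(2, 1, hp)          # [2, 4, 4, 1]
n_learned = count_parameters(sizes)    # (2*4+4) + (4*4+4) + (4*1+1) = 37
```

The distinction to hold on to: you set the hyperparameters by hand, while the 37 weights and biases in this example are what training adjusts for you.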
Convolutional (Neural) Networks (CNNs or ConvNets)
Convolutional neural networks are a specific type of multilayer feedforward network typically used in image recognition and (more recently) some natural language processing tasks. Introduced by Yann LeCun, Yoshua Bengio, Leon Bottou, and Patrick Haffner in 1998, they were originally designed to recognize handwritten postal codes and check amounts. They are faster to train than traditional feedforward networks because they make simplifying assumptions about how nodes connect to each other and how many nodes you need in the network, drastically reducing how much math you have to do to train the model.
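To give a feel for the “simplifying assumptions”, here is a minimal sketch of the core operation: sliding one small filter across an image. The image, the filter values, and the function name `conv2d` are made up for illustration, and like most deep learning libraries this sketch does not flip the filter (strictly speaking it computes a cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output lights up only in the middle column, where the edge is. Notice that the same four weights are reused at all nine positions, whereas a fully connected layer mapping these 16 pixels to 9 outputs would need 16 × 9 = 144 weights. That reuse is where the savings come from.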
Here’s a visual from Andrej Karpathy’s Stanford class to get us going. Don’t worry about understanding the details just yet.
Some researchers point out that the design of these networks is inspired by biology, and in particular by the visual cortex. Hubel and Wiesel discovered in the 1960s that cats have cells that respond specifically to certain regions of the input coming in from the retina. They further found that cats have both so-called simple cells, which respond to lines and edges, and so-called complex cells, which respond to those same inputs even if they are rotated or in a slightly different place (that is, the response is “spatially invariant”). They hypothesized that cells were organized into a hierarchy exactly like the neural networks we’ve been discussing: simple cells would feed their output to complex cells in exactly the same fashion as nodes “to the left” feed their outputs to nodes further “to the right” in a multilayer network.
Others are more cautious about the comparison. As Yann LeCun puts it: “The neural inspiration in models like convolutional nets is very tenuous. That’s why I call them ‘convolutional nets’ not ‘convolutional neural nets’, and why we call the nodes ‘units’ and not ‘neurons’.”
Well-known implementations of ConvNets
Specific implementations that you might read about include LeNet, AlexNet, ZFNet, GoogLeNet, VGGNet, and ResNet. Some are named after people and companies, others by some property of the network. Most of these have won the gold in the Olympics of this space, namely the ImageNet ILSVRC.
These networks are very cool, and here is a set of resources for learning more about them, sorted by complexity (easiest first):
- Read a good beginner’s guide written by a UCLA computer science undergrad named Adit Deshpande
- Sample some of the answers to the Quora questions “What is a convolutional neural network?” and “What is an intuitive explanation of convolutional neural networks?”
- Read Andrej Karpathy’s class notes from Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition. Make sure to scroll down to see the cool animation that shows you what a convolution is.
- This 2013 paper by Matthew Zeiler and Rob Fergus provides some visual examples that help you understand the intuition behind the architecture.
Recurrent Neural Networks (RNNs), including Long Short-Term Memories (LSTM)
The third and last type of neural network we’ll discuss is the recurrent neural network, partly because they are widely used and partly because we suspect your eyes are glazing over. There are many other types of neural networks. If you are interested in learning more, we suggest a visit to the Asimov Institute’s Neural Network Zoo.
Remember how in a feedforward network, computation only goes forward, or if you’re looking at a diagram, “from left to right”? Also, we didn’t say it, but feedforward (and convolutional) networks take fixed-size inputs and outputs. Once you decide how many elements are in the input and output vectors, that’s it. You train the model and hope you get good results.
RNNs relax both those constraints.
First, RNNs allow loops in the data flow: as well as moving linearly from input to output, data from later processing stages is fed back to earlier stages. This diagram from Christopher Olah’s excellent overview article shows the shape of an RNN:
This architecture enables the RNN to “remember” things, which makes it great for processing time-series data (like events in an event log) or for natural language processing tasks (like working out the role each word plays in a sentence, where remembering which words came before helps you figure out the role of the current word).
Secondly, RNNs can process arbitrarily sized inputs and outputs by processing vectors in a sequence, one at a time. Where feedforward networks and CNNs only work on fixed-size inputs and outputs, RNNs can process vectors one after another and thereby work on any shape of input and output. Andrej Karpathy comes to the rescue with a diagram that shows this from his excellent blog post titled The Unreasonable Effectiveness of Recurrent Neural Networks:
Read Andrej’s whole blog post, which is a great explanation of the structure of RNNs. In it, he describes how to build a Paul Graham essay generator by training the system with the full 1M characters of his essays (alas, a very small corpus by AI standards) and building an RNN. You can tune one of the hyperparameters of the RNN to generate the sentence that Paul Graham is most likely to write, and that is an infinite loop of:
“is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same”
Who said neural nets weren’t fun?
Together, these two enhancements over feedforward networks have made RNNs incredibly powerful tools for solving many different types of AI problems including speech recognition, language modeling, machine translation, image captioning, recommendation systems, predicting the next word in a sentence for text generation systems, and others.
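Both enhancements fit in a very small sketch. The version below is deliberately stripped down: the input and the hidden state are single numbers rather than vectors, and the weights `w_x`, `w_h`, and `b` are hypothetical fixed values rather than trained ones.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.9, b=0.0):
    """Process a sequence of any length, one element at a time."""
    h = 0.0  # the hidden state is the network's "memory"
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # the loop: h feeds back into the next step
        states.append(h)
    return states


short = run_rnn([1.0, 0.0])
longer = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0])
```

The same function handles a sequence of two elements or five, which is the “arbitrary size” property. And although every input after the first is zero, the hidden state stays above zero and fades only gradually, which is the “memory”: the network is still carrying a trace of the first input.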
A specific type of RNN that you’ll see discussed is called the Long Short-Term Memory (LSTM). Bizarre, no? Is the memory short or long? Anyway, this type of RNN was introduced by Hochreiter and Schmidhuber in 1997 and does an even better job of remembering something from “further back in time”, compared to vanilla RNNs.
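For the curious, here is a rough sketch of one LSTM time step, using the commonly taught formulation with forget, input, and output gates. That formulation is an assumption on our part, since the details aren't covered here, and the sketch uses single numbers instead of vectors and hand-picked weights instead of trained ones.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar state. `p` maps gate name -> (w_x, w_h, b)."""
    def gate(name, squash):
        w_x, w_h, b = p[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # the new information itself

    c = f * c_prev + i * g             # long-term memory (cell state)
    h = o * math.tanh(c)               # short-term output (hidden state)
    return h, c


# Hypothetical weights: a large forget bias (keep the memory) and a very
# negative input bias (write almost nothing new).
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with something already stored in the cell state
for x in [0.3, -0.7, 0.5, 0.1]:
    h, c = lstm_step(x, h, c, params)
```

With the gates set this way, the cell state is still almost exactly 1.0 after four steps, no matter what the inputs were. In a trained LSTM the gates are learned, so the network decides for itself what to hold on to and what to forget.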
To learn more:
- Edwin Chen’s blog post, complete with cartoons of Snorlax and references to jelly donuts, does an excellent job of explaining the basic concepts and comparing RNNs with LSTMs.
- Read Christopher Olah’s blog post if you want to understand how LSTMs do their remembering and forgetting. It’s a beautiful piece of explanatory writing and illustration.
- Rohan Kapur’s Medium post is also great.
How do these architectures relate to the other Deep Learning frameworks I’ve heard of?
One last topic before we wrap up here is: how do these neural network architectures relate to the libraries or frameworks, such as TensorFlow and Caffe?
The quick answer is that you can implement most neural net architectures in each of the popular neural network libraries.
And if you can’t implement a specific model in one of the popular libraries, you can always write your own so your software can eat the world.
Here are 15 machine learning libraries to get you started. The indefatigable Andrej Karpathy posted a “Google Trends”-esque analysis showing what’s hot if you peek inside 28,303 machine learning research papers over the last 5 years.
Ok, that’s enough on neural network architectures. What have we learned?
- There are many different types of neural networks, each useful for solving specific AI problems.
- The field is evolving quickly; Ian Goodfellow invented GANs in 2014.
- You got here because you looked for but couldn’t find a higher-level API that does what your software needs, so you needed to train your own model.
- Reminder: neural networks aren’t the only machine learning algorithms. You might solve your problem with a clean data set and a simpler machine learning algorithm like a good ol’ linear regression.
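As a reminder of how little machinery that last option needs, here is a sketch of a one-variable linear regression fitted by ordinary least squares in plain Python. The data is made up and the function name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```

No layers, no activation functions, no hyperparameters to tune. If your data looks like this, you may not need a neural network at all.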