When approaching deep learning for the first time, there is a huge difference between what I consider foundational algorithms (those that power just about every neural network model that has ever existed) and architectures.
I think this distinction is important because it will help you determine how best to learn both. I would argue the foundational algorithms are more important to start with, and they are a prerequisite for the architecture types.
What do I mean when I’m referring to foundational algorithms? These include, but are not limited to, the following:
- Backpropagation. This algorithm is literally the engine that powers everything that a neural network is. Today there is no deep learning without backpropagation. It’s the elegant algorithm developed by Rumelhart, Hinton, and others back in the 1980s that determines how we train models. For one of the most intuitive explanations of backprop I’ve encountered, check out.
- Gradient descent. This is a super important algorithm for determining how we update the weights of a neural network. Vanilla gradient descent forms the core of all the other fancy stuff you see in papers, including AdaGrad, RMSprop, Adam, etc., so spend the time to learn it well. As a side note, though gradient descent is used extensively in deep learning, there’s nothing about the algorithm that restricts it to neural networks. In fact, it can be used for many different machine learning models, including linear regression, logistic regression, etc.
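To make these two ideas concrete, here is a minimal sketch in plain Python of how they fit together. The tiny one-hidden-unit network, the parameter names (`w1`, `b1`, `w2`, `b2`), the learning rate, and the single training example are all assumptions chosen for illustration, not anything prescribed above. Backpropagation computes the gradients with the chain rule, and vanilla gradient descent uses them to update the weights.

```python
import math


def forward_backward(params, x, y):
    """Return (loss, grads) for a tiny net: h = tanh(w1*x + b1), y_hat = w2*h + b2."""
    w1, b1, w2, b2 = params
    # Forward pass, keeping the intermediate h for the backward pass.
    h = math.tanh(w1 * x + b1)
    y_hat = w2 * h + b2
    loss = (y_hat - y) ** 2
    # Backward pass: apply the chain rule from the loss back to each parameter.
    d_yhat = 2.0 * (y_hat - y)
    d_w2 = d_yhat * h
    d_b2 = d_yhat
    d_h = d_yhat * w2
    d_z = d_h * (1.0 - h * h)  # derivative of tanh(z) is 1 - tanh(z)^2
    d_w1 = d_z * x
    d_b1 = d_z
    return loss, [d_w1, d_b1, d_w2, d_b2]


def gradient_descent(params, x, y, lr=0.1, steps=50):
    """Vanilla gradient descent: repeatedly step against the gradient."""
    params = list(params)
    losses = []
    for _ in range(steps):
        loss, grads = forward_backward(params, x, y)
        losses.append(loss)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, losses


# Hypothetical starting weights and a single (x, y) training pair.
start = [0.5, -0.3, 0.8, 0.1]
final, losses = gradient_descent(start, x=1.5, y=2.0)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The loss should shrink as training proceeds. Real frameworks automate the backward pass for you, but working through it by hand once on a network this small is a good way to see what they are doing.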
After you’ve got the foundational algorithms down, the architectures refer to model designs including:
Start by learning feedforward networks, and then you can learn the other two architectures in whatever order makes the most sense for what you are working on.
Finally, there are a few other algorithms that are used extensively in neural networks which aren’t foundational, but are important to know for practical deep learning applications. Learn these after the stuff above:
- Dropout. If you plan on using regularization for your neural network (and you inevitably will), this is the most important regularization technique. I have basically never built a model that didn’t use dropout.
- Weight initialization schemes. It turns out that when building neural networks, how you initialize your weights is crucial for determining whether or not the model trains successfully. Therefore, a number of different heuristics have been developed for initialization that you should learn eventually.
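As a rough illustration of both techniques, here is a small sketch in plain Python. The dropout shown is the common "inverted" variant, and I've picked Glorot (Xavier) uniform and He normal as two widely used examples of the initialization heuristics mentioned above; the function names and the `rng` parameter are my own assumptions for this example, not any library's API.

```python
import math
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout: zero each unit with probability drop_prob, rescale survivors."""
    if not training or drop_prob == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


def glorot_uniform(fan_in, fan_out, rng=random):
    """Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng=random):
    """He normal: N(0, std^2) with std = sqrt(2 / fan_in), often paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


rng = random.Random(0)  # seeded only so the example is repeatable
hidden = [1.0] * 8
print(dropout(hidden, drop_prob=0.5, rng=rng))
weights = glorot_uniform(fan_in=4, fan_out=3, rng=rng)
print(len(weights), len(weights[0]))
```

In practice you would use the versions built into your framework of choice, but the logic is about this simple: dropout randomly silences units during training only, and the initialization schemes scale the random starting weights by the layer sizes.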