Some very bright people then discovered how to do “back propagation”, which (in theory) allows multi-layer networks to solve any type of classification problem. The algorithm is so called because of the way it works: it compares the output of the network with the desired value and feeds small amounts of the error back through the network, modifying the weights as it goes.
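To make that concrete, here is a minimal sketch of back propagation for a tiny two-layer network learning XOR. Everything here – the network size, learning rate and data – is illustrative, not taken from any particular system described in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# The classic XOR problem: not linearly separable, so it needs a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Random initial weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(0, 1, (2, 4))
W2 = rng.normal(0, 1, (4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
lr = 1.0
for _ in range(5000):
    # Forward pass: compute the network's current output
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: feed the error back through the network,
    # nudging each weight in proportion to its share of the error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Run it a few times with different seeds and you will occasionally see the loss plateau well above zero – exactly the “stuck in local optima” behaviour discussed below.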
If you wanted to do something useful with a Neural Network, such as pattern recognition – identifying images that contain a car, say – you would start by converting raw pixel inputs into feature activations. These feature activations were often hand-crafted, designed to pick out something like an individual wheel or grille. The network would then learn how to weight the feature activations and decide what it’s seeing in the image.
However, using back propagation to solve these problems did not work well in practice, for several reasons:
- It’s really hard to hand-craft feature detectors
- It requires pre-classified (labelled) training data – almost all real-world data is unlabelled
- The learning time does not scale well, especially with very large networks
- The network can often get stuck in “local optima” – it stops learning before arriving at a good solution
But then, with the passage of time, the story slowly changed. The rise of the Internet and Big Data brought with it huge amounts of labelled data. Computers also got faster – by orders of magnitude – especially with the arrival of Graphics Processing Units (GPUs). And, most importantly, we learnt new and better techniques for initialising the networks.
The key difference between modern deep learning algorithms and the neural networks of old is that the network creates its own feature detectors – they are not hand-crafted. The only real limitation is computing power – and we have plenty of that!
Deep networks learn one layer at a time, using a generative model that connects the input (visible) data to a layer of latent (hidden) nodes. The hidden layer’s activations are then used to train a second generative model against the next hidden layer, and so on. One technique used to achieve this is the Restricted Boltzmann Machine (I’ll post some code next time).
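As a taster ahead of that fuller post, here is a bare-bones sketch of training one RBM layer with single-step contrastive divergence (CD-1). It is deliberately stripped down – the sizes and toy data are made up, and bias terms are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary training data: each row is one visible vector
data = rng.integers(0, 2, (20, n_visible)).astype(float)

lr = 0.1
for _ in range(100):
    # Positive phase: drive the hidden units from the data
    h_prob = sigmoid(data @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: reconstruct the visible layer, then the hidden layer
    v_recon = sigmoid(h_sample @ W.T)
    h_recon = sigmoid(v_recon @ W)

    # CD-1 update: difference between data-driven and model-driven
    # visible/hidden correlations
    W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)

# The learned hidden activations become the "visible" data
# for training the next layer up
hidden_layer = sigmoid(data @ W)
print(hidden_layer.shape)
```

Stacking these layers – each RBM trained on the activations of the one below – is exactly the one-layer-at-a-time scheme described above.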
Just like the human visual system, deep learning systems for image recognition process information in layers. For example, the first layer may learn correlations between pixels, forming tiny edge detectors. By the time you reach the third or fourth layer, the activations could represent complete wheels, hands, faces, and so on.
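A toy example shows what a first-layer edge detector amounts to: convolving the image with a small kernel that responds to vertical edges. In a deep network the kernel values are learned; here one is written by hand purely for illustration.

```python
import numpy as np

# 6x6 "image": dark on the left half, bright on the right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A Sobel-style kernel that responds to vertical edges
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def convolve2d(img, k):
    """Valid-mode 2D sliding-window correlation."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

response = convolve2d(image, kernel)
print(response)  # strongest where dark meets bright, zero elsewhere
```

Higher layers then combine many such edge responses into corners, textures and eventually whole object parts.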
One recent program learnt to play 49 different retro computer games and came up with its own strategies for winning. The research was carried out by DeepMind, the British company bought by Google last year for £400m, whose stated aim is to build “smart machines”.
Likewise, Microsoft believes that too much of the world’s Big Data is going to waste and has just launched a new initiative to help organisations process it all, build APIs and finally make some sense of it. The technology, called Azure Machine Learning (ML), is a cloud-based service that can be accessed via any web browser. It features a simple drag-and-drop interface aimed at data scientists and developers. The main aim of ML is to reduce the amount of work needed for organisations to deploy machine learning.
Not to be left behind, Facebook has a project known as Deep Face that can verify the identity of the person in a photo. The Deep Face AI system is now powerful enough to spot individual users among the 400 million photos uploaded to the social network every single day.
In the future, deep learning systems could power self-driving cars and personal assistants in smartphones, or conduct scientific research in fields from climate change to cosmology.