The back propagation algorithm has become the de facto training algorithm for artificial neural networks, and has been studied by the artificial intelligence community since the 1970s. It is used in commercial neural network software packages, such as MATLAB.
The principle of back propagation is actually quite easy to understand, even though the maths behind it can look rather daunting. The basic steps are:
1. Initialise the network with small random weights.
2. Present an input pattern to the input layer of the network.
3. Feed the input pattern forward through the network to calculate the output neuron's activation value.
4. Take the difference between the desired output and the activation value to calculate the network's activation error.
5. Adjust the weights feeding the output neuron to reduce its activation error for this input pattern.
6. Propagate an error value back to each hidden neuron, proportional to its contribution to the network's activation error.
7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for this input pattern.
8. Repeat steps 2 to 7 for each input pattern in the collection.
9. Repeat step 8 until the network is suitably trained.
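To make those steps concrete, here is a minimal, self-contained sketch of a single back propagation pass through a tiny 2-2-1 network with sigmoid activations. The pattern, weights and learning rate are illustrative assumptions, and bias weights are omitted for brevity; this is a sketch of the technique, not the full implementation given later in this post.

```csharp
using System;

class BackPropagationSketch
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // With a sigmoid activation, the derivative can be computed from the
    // activation value itself: f'(x) = x * (1 - x).
    static double Derivative(double activation) => activation * (1.0 - activation);

    static void Main()
    {
        const double learningRate = 0.5; // illustrative value
        double[] inputs = { 1.0, 0.0 };  // one illustrative input pattern
        const double target = 1.0;       // its desired output

        // Step 1: small random weights (fixed here so the sketch is repeatable).
        double[,] weightIh = { { 0.15, 0.20 }, { 0.25, 0.30 } }; // input -> hidden
        double[] weightHo = { 0.40, 0.45 };                      // hidden -> output

        // Steps 2 and 3: present the pattern and feed it forward.
        double[] hidden = new double[2];
        for (int h = 0; h < 2; h++)
        {
            double sum = 0.0;
            for (int i = 0; i < 2; i++) sum += inputs[i] * weightIh[i, h];
            hidden[h] = Sigmoid(sum);
        }
        double output = Sigmoid(hidden[0] * weightHo[0] + hidden[1] * weightHo[1]);

        // Step 4: the network's activation error.
        double error = target - output;

        for (int h = 0; h < 2; h++)
        {
            // Step 6: propagate an error value back to this hidden neuron,
            // using the hidden-to-output weight before it is adjusted.
            double hiddenError = error * Derivative(output) * weightHo[h];

            // Step 5: adjust the weight feeding the output neuron.
            weightHo[h] += learningRate * error * Derivative(output) * hidden[h];

            // Step 7: adjust the weights feeding this hidden neuron.
            for (int i = 0; i < 2; i++)
                weightIh[i, h] += learningRate * hiddenError * Derivative(hidden[h]) * inputs[i];
        }

        Console.WriteLine($"Output: {output:F4}  Error: {error:F4}");
    }
}
```

In a real run, steps 8 and 9 wrap all of this in two loops: one over the pattern collection and one over training epochs.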
The magic really happens in step 6, which determines how much error to feed back to each hidden neuron. Once the error value has been established, training can continue as described in my post about the single layer perceptron.
To illustrate how the error value is calculated, I will use this network diagram.
Then, if we use these variables:
- output_o = Activation value of the output neuron
- error_o = Error at the output neuron
- error_h = Error at a hidden neuron
- weight_ho = A weight connecting a hidden neuron to the output neuron
The error fed back to a hidden neuron is calculated as:
error_h = error_o * Derivative(output_o) * weight_ho
For an explanation of how to calculate the derivative value, see my post on the sigmoid function.
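To illustrate, here is that calculation in C#, assuming a sigmoid activation function, whose derivative can be computed directly from the activation value. The numbers in Main are made up purely for the example.

```csharp
using System;

class HiddenErrorExample
{
    // With a sigmoid activation, the derivative can be computed directly
    // from the activation value: f'(x) = x * (1 - x).
    static double Derivative(double activation) => activation * (1.0 - activation);

    // error_h = error_o * Derivative(output_o) * weight_ho
    static double HiddenError(double error_o, double output_o, double weight_ho)
        => error_o * Derivative(output_o) * weight_ho;

    static void Main()
    {
        // Illustrative values only.
        double error_h = HiddenError(error_o: 0.19, output_o: 0.73, weight_ho: 0.40);
        Console.WriteLine(error_h); // ≈ 0.0150
    }
}
```

Note that a hidden neuron feeding several output neurons would sum this value over all of its outgoing weights; with a single output neuron, as in the diagram above, the one term is all there is.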
Unlike the single layer perceptron, training a multi layer perceptron with back propagation does not guarantee a solution, even if one exists. This is because training can become stuck in a local error minimum. There are a number of strategies for overcoming this, which I shall cover another time. For now, restarting training is normally sufficient for small networks.
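As a sketch of that restart strategy, the control flow looks something like this. TrainToConvergence is a hypothetical stand-in for a real training routine, simulated here with a coin flip purely so the example runs; in practice it would return the network error after a bounded number of epochs.

```csharp
using System;

class RestartSketch
{
    static readonly Random Rng = new Random();

    // Hypothetical stand-in for a real training routine: returns the final
    // network error after at most maxEpochs epochs. Simulated here so the
    // sketch runs; roughly half the "runs" get stuck in a local minimum.
    static double TrainToConvergence(int maxEpochs)
        => Rng.NextDouble() < 0.5 ? 0.005 : 0.25;

    static void Main()
    {
        const double targetError = 0.01;
        int attempts = 0;
        double finalError;

        do
        {
            attempts++;
            // Each attempt starts from fresh small random weights (step 1),
            // so a restart explores a different region of weight space.
            finalError = TrainToConvergence(maxEpochs: 10000);
        }
        while (finalError > targetError); // still stuck? restart

        Console.WriteLine($"Converged after {attempts} attempt(s), error {finalError}");
    }
}
```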
I will continue to use the same classification problem from my last post.
The following input patterns represent this problem. Copy and paste them into a Patterns.csv file to use with the code sample below:
The code below is my implementation of back propagation in C#. In my opinion C# Generics are a beautiful thing, and you will see that I use them extensively. To run this code, create a new C# Console application and paste the code into the Program.cs file. Paste the input patterns into a file named Patterns.csv in your project directory. Include Patterns.csv in your project, and set its “Copy to Output Directory” property to “Copy always”.
When the application runs, it will use the back propagation algorithm to train the network, restarting if it gets trapped in a local error minimum. Once fully trained, you will be able to test the network against untrained points on the graph and observe that it generalises correctly.
It is simple to modify the code to accept more input dimensions and use more hidden neurons by changing the _inputDims and _hiddenDims variables. If you try this, make sure you add matching columns to the Patterns.csv file. With these changes, the network can be applied to much more complex problems.
In future posts, I will be using networks like this to solve real-world problems - I would welcome suggestions.