In my last post, I discussed the Single Layer Perceptron and its limitations regarding solving linearly non-separable problems. If we are to solve this type of problem, then we need a more sophisticated network, and that means adding layers. However, if we do that, how do we train the network?

Until now, we have used the error observed at the output neuron to adjust the weights between it and the input layer. But, what do we do if we add another layer in between, a hidden layer? Well, we still need to feed this error back through the network in order to adjust the weights, and this is a problem.

Currently, a neuron is either “on” or “off”, which is defined by the threshold activation function we are using.

- public int Function(double x)
- {
- return (x >= 0) ? 1 : -1;
- }

When graphed, it looks like this, with the blue line representing the function and the red line its derivative, which is zero everywhere except at x = 0, where it is effectively infinite (not very useful).
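To make that concrete, here is a rough sketch that estimates the slope of the threshold function numerically. The class name `ThresholdDemo`, the helper `Slope` and the step size `h` are my own illustrative choices, not part of the perceptron code from the previous post.

```csharp
using System;

public static class ThresholdDemo
{
    // The threshold activation function shown above.
    public static int Function(double x)
    {
        return (x >= 0) ? 1 : -1;
    }

    // Central-difference estimate of the slope at x.
    // h is an assumed step size chosen purely for illustration.
    public static double Slope(double x, double h)
    {
        return (Function(x + h) - Function(x - h)) / (2 * h);
    }

    public static void Main()
    {
        // Away from zero the slope is 0, so the error gives no hint
        // about which way to nudge a weight.
        Console.WriteLine(Slope(-2.0, 1e-3));
        Console.WriteLine(Slope(0.5, 1e-3));

        // At zero the estimate is about 1/h, so it grows without
        // bound as h shrinks.
        Console.WriteLine(Slope(0.0, 1e-3));
        Console.WriteLine(Slope(0.0, 1e-6));
    }
}
```

Everywhere except at the jump there is no gradient to follow, and at the jump the gradient is unusable, which is why a smoother function is needed.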

Any adjustment we make to the weights feeding the hidden layer is going to be rather coarse, so we need to smooth this out. We can do that by replacing our simple threshold function with something better. How about this? It is known as a sigmoid function, and this particular form is the hyperbolic tangent.

- public double Sigmoid(double x)
- {
- return 2 / (1 + Math.Exp(-2 * x)) - 1;
- }

And its derivative.

f'(x) = 1 - f(x)^2

- public double Derivative(double x)
- {
- double s = Sigmoid(x);
- return 1 - (Math.Pow(s, 2));
- }
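As a sanity check on the two methods above, the sketch below compares the function with the built-in `Math.Tanh` (the two are mathematically the same function) and compares the derivative with a numerical estimate. The class name `TanhSigmoidCheck`, the helper `NumericalSlope` and the sample points are assumptions of mine, added only for this check.

```csharp
using System;

public static class TanhSigmoidCheck
{
    // The -1 to 1 sigmoid from above.
    public static double Sigmoid(double x)
    {
        return 2 / (1 + Math.Exp(-2 * x)) - 1;
    }

    // Its derivative, written in terms of the function's own value.
    public static double Derivative(double x)
    {
        double s = Sigmoid(x);
        return 1 - Math.Pow(s, 2);
    }

    // Central-difference estimate of the slope, used only as a cross-check.
    public static double NumericalSlope(double x, double h)
    {
        return (Sigmoid(x + h) - Sigmoid(x - h)) / (2 * h);
    }

    public static void Main()
    {
        foreach (double x in new[] { -2.0, -0.5, 0.0, 0.5, 2.0 })
        {
            // The f and tanh columns should agree, as should f' and numeric.
            Console.WriteLine("x={0}  f={1:F6}  tanh={2:F6}  f'={3:F6}  numeric={4:F6}",
                x, Sigmoid(x), Math.Tanh(x), Derivative(x), NumericalSlope(x, 1e-5));
        }
    }
}
```

If the analytic and numerical columns ever disagree beyond rounding, the derivative code is the first place to look.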

When graphed, it looks like this.

Now, this looks much more useful. There is a smooth and continuous transition between -1 and 1, and the derivative goes nowhere near infinity. Whilst this function is very popular with many neural net developers, I would like to make one further refinement. I like my neurons to be working in the range of 0 to 1, and to do that I need to modify the function slightly.

- public double Sigmoid(double x)
- {
- return 1 / (1 + Math.Exp(-x));
- }

And its derivative.

f'(x) = f(x)(1 - f(x))

- public double Derivative(double x)
- {
- double s = Sigmoid(x);
- return s * (1 - s);
- }
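For anyone wondering where that derivative comes from, here is a short derivation. It uses nothing beyond the definition of f(x) given above and the chain rule.

```latex
\begin{aligned}
f(x) &= \frac{1}{1 + e^{-x}} \\
f'(x) &= \frac{e^{-x}}{(1 + e^{-x})^{2}}
       = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\
1 - f(x) &= \frac{e^{-x}}{1 + e^{-x}} \\
f'(x) &= f(x)\,\bigl(1 - f(x)\bigr)
\end{aligned}
```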

When graphed, it looks like this.

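Now, that suits my purposes well, and it has the added benefit of being computationally cheap: the derivative needs only one multiplication and one subtraction on a value the neuron has already computed, which will be important for larger networks.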
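One way to take advantage of this is to compute the derivative directly from a neuron's output rather than from its input. This is only a sketch under my own assumptions: the class name `LogisticSigmoid` and the helper `DerivativeFromOutput` are hypothetical, and it presumes the output has been kept from the forward pass.

```csharp
using System;

public static class LogisticSigmoid
{
    // The 0 to 1 sigmoid from above.
    public static double Sigmoid(double x)
    {
        return 1 / (1 + Math.Exp(-x));
    }

    // The derivative from above, which recomputes the sigmoid from the input.
    public static double Derivative(double x)
    {
        double s = Sigmoid(x);
        return s * (1 - s);
    }

    // Hypothetical helper: the same derivative, computed from an output
    // value that is already known, so no further call to Math.Exp is needed.
    public static double DerivativeFromOutput(double output)
    {
        return output * (1 - output);
    }

    public static void Main()
    {
        double x = 1.3;
        double output = Sigmoid(x); // would normally be stored during the forward pass

        // Both lines should print the same value.
        Console.WriteLine(Derivative(x));
        Console.WriteLine(DerivativeFromOutput(output));
    }
}
```

Whether the saving matters in practice depends on the size of the network, but it costs nothing to structure the code this way.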

In my next post, I will show you how to use the sigmoid activation function to build a Multi Layer Perceptron, trained with the Back Propagation Algorithm.

I welcome comments on this post, and suggestions for future posts.

John