In this post, I continue on the journey to solve linearly non-separable problems using the Multi Layer Perceptron. I will be using the same problem that I introduced last week.
The network I will be using to solve this problem is shown below. If you read my previous post, you will notice that I have introduced a bias to the network. Without a bias, any hyperplane must always pass through the origin, i.e. (0, 0), which is an unnecessary restriction.
The role of the two hidden neurons is divide input space into partitions. In order to achieve this, I have chosen two arbitrary hyperplanes and drawn them on the graph below.
The values in the equations map directly to the weights in the network. However, first the equations must be rearranged with all terms on the same side of the equals sign, i.e. 0 = y + 0.36x - 0.85, etc. Therefore, the weights in the network are as follows:
Using the sigmoid function introduced in my last post, this creates a feature space that looks like this, with h1 along the horizontal axis and h2 along the vertical axis.
As you can see, feature space is a distorted view of input space. Each of the points representing a class clearly fall within a distinct quadrant of the graph, which is evidence that the weights have done their job.
However, this feature space is still linearly non-separable, so the output neuron will still fail in its task of correctly classifying the points. To correct this, we need to modify the sigmoid function, and introduce another variable known as lambda (λ).
Lambda has the effect of steepening the gradient of the sigmoid function, which in turn further distorts input space. By using a lambda value of six, feature space now looks like this.
As you can see, this now results in a linearly separable pattern of points, which the output neuron can correctly classify. Again, the values in the equation map directly to the weights in the network. Therefore, the weights in the final layer of the network are as follows:
Therefore, we now have a network that can correctly classify the points of a linearly non-separable problem.
However, so far, we have done all the work ourselves. How do we get the network to establish correctly the required values of the weights automatically? In order to do that, we need a learning algorithm. In my next post, I will introduce the Back Propagation algorithm and provide a working example written in C#.
In the meantime, if you have any questions on this post, drop me a comment.