- Finding clusters
- Dimensionality reduction
- Finding hidden correlations
- Data compression
The training steps of a SOM are simple.
- Initialise the interconnecting weights with random values.
- Present an input pattern to the network.
- Choose the output neuron with the highest activation (the “winner”), that is, the neuron whose weights best match the input.
- Update the weights of neurons that are within the “neighbourhood” of the winner, using a relative learning factor.
- Reduce the learning factor monotonically.
- Reduce the size of the “neighbourhood” monotonically.
- Repeat from step two until only small updates are observed.
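The steps above can be sketched in C# roughly as follows. This is a minimal illustration, not the full listing for the food example. The names `SomSketch`, `Train` and `FindWinner` are hypothetical, and so are the choices of a square grid, squared Euclidean distance for picking the winner, a starting learning factor of 0.5, linear decay, and a fixed iteration count standing in for "until only small updates are observed".

```csharp
using System;

public static class SomSketch
{
    // Trains a size x size map and returns weights[x, y, dimension].
    // All parameter names and decay schedules here are illustrative assumptions.
    public static double[,,] Train(double[][] inputs, int size, int iterations, int seed)
    {
        int dims = inputs[0].Length;
        var rng = new Random(seed);
        var weights = new double[size, size, dims];

        // Initialise the interconnecting weights with random values.
        for (int x = 0; x < size; x++)
            for (int y = 0; y < size; y++)
                for (int d = 0; d < dims; d++)
                    weights[x, y, d] = rng.NextDouble();

        for (int t = 0; t < iterations; t++)
        {
            // Learning factor and neighbourhood both shrink monotonically
            // (linear decay is an assumption, other schedules are common).
            double progress = (double)t / iterations;
            double learningFactor = 0.5 * (1.0 - progress);
            double radius = Math.Max(1.0, (size / 2.0) * (1.0 - progress));

            // Present an input pattern to the network.
            double[] input = inputs[rng.Next(inputs.Length)];

            // Choose the winner.
            var (wx, wy) = FindWinner(weights, input);

            // Pull every neuron inside the neighbourhood towards the input.
            for (int x = 0; x < size; x++)
                for (int y = 0; y < size; y++)
                {
                    double gridDistance = Math.Sqrt((x - wx) * (x - wx) + (y - wy) * (y - wy));
                    if (gridDistance > radius) continue;
                    for (int d = 0; d < dims; d++)
                        weights[x, y, d] += learningFactor * (input[d] - weights[x, y, d]);
                }
        }
        return weights;
    }

    // Returns the grid position of the neuron whose weights are closest
    // to the input (smallest squared Euclidean distance).
    public static (int X, int Y) FindWinner(double[,,] weights, double[] input)
    {
        var best = (X: 0, Y: 0);
        double bestDistance = double.MaxValue;
        for (int x = 0; x < weights.GetLength(0); x++)
            for (int y = 0; y < weights.GetLength(1); y++)
            {
                double distance = 0.0;
                for (int d = 0; d < input.Length; d++)
                {
                    double diff = input[d] - weights[x, y, d];
                    distance += diff * diff;
                }
                if (distance < bestDistance)
                {
                    bestDistance = distance;
                    best = (x, y);
                }
            }
        return best;
    }
}
```

Because the learning factor decays towards zero, the updates become small by the final iterations. A more faithful stopping rule would track the size of each update and stop once it falls below a threshold.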
To show what to expect from a SOM, I have prepared a simple example that attempts to group twenty-five foods into regions of similarity, based on three parameters: protein, carbohydrate and fat.
The challenge for this SOM is therefore to reduce three-dimensional data down to two dimensions, whilst retaining meaning. It does this by automatically identifying the differentiating features that have the greatest effect.
The input data is as follows:
After running this data through the SOM, the foods were placed on a 10x10 grid representing their relative similarities. A graphical representation is shown below.
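Placing labelled items on the grid once training has finished amounts to finding the winning neuron for each item and recording its coordinates. The sketch below shows one way this could be done. `SomPlacement` and `PlaceItems` are hypothetical names, the `weights[x, y, dimension]` layout is an assumption, and squared Euclidean distance is assumed as the similarity measure.

```csharp
using System;
using System.Collections.Generic;

public static class SomPlacement
{
    // Maps each labelled input vector to the grid cell whose weights are
    // closest to it. The weights[x, y, dimension] layout is an assumption.
    public static Dictionary<string, (int X, int Y)> PlaceItems(
        double[,,] weights, IDictionary<string, double[]> items)
    {
        var positions = new Dictionary<string, (int X, int Y)>();
        foreach (var item in items)
        {
            var best = (X: 0, Y: 0);
            double bestDistance = double.MaxValue;
            for (int x = 0; x < weights.GetLength(0); x++)
                for (int y = 0; y < weights.GetLength(1); y++)
                {
                    // Squared Euclidean distance between item and neuron weights.
                    double distance = 0.0;
                    for (int d = 0; d < item.Value.Length; d++)
                    {
                        double diff = item.Value[d] - weights[x, y, d];
                        distance += diff * diff;
                    }
                    if (distance < bestDistance)
                    {
                        bestDistance = distance;
                        best = (x, y);
                    }
                }
            positions[item.Key] = best;
        }
        return positions;
    }
}
```

Two items that land on the same or neighbouring cells are ones the map considers similar, which is what produces the zones in the picture.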
How has the feature map grouped items together, whilst crushing three dimensions into two? Well, a number of zones have formed. Water, which contains no protein, carbs or fat, has been pushed to the bottom right. Directly above it, in the top right-hand corner, sugar, which is made almost entirely of carbs, has taken hold. In the top left corner, butter reigns supreme, being almost entirely fat. Finally, the bottom left is occupied by tuna, which has the highest protein content of the foods in my sample. The remaining foods live between these extremes, with a junk food zone occupying the centre ground.
Below is the C# code, which will allow you to recreate and modify this example. Try adding more input dimensions and see what results. It is not always obvious what is happening, especially with a high number of dimensions. However, you will normally be able to spot the correlation eventually, even if it is sometimes an unexpected one.
As always, let me know what you think.