Semantic Maps & Automated Text Generation

For the past few years I've been developing something I call Athena. Very simply, Athena can read natural language text documents and create a semantic map from the information they contain - this is the dehydration phase. Athena can also reverse this process and automatically generate (near) natural language text from a semantic map - the rehydration phase.

To explain why this is useful, imagine the following. In the real world, a cartographer would survey a natural landscape, taking measurements between landmarks and other key features, and compress this information into a two-dimensional representation (commonly known as a map). With this map you, as an explorer, are able to create an imaginary representation of that landscape without ever having been there. Sure, some information is lost during the mapping process, such as the exact species of fauna and flora or the weather you may encounter, but using generalisation you can infer what's likely to be there based on previous experience in similar surroundings.

Athena does something analogous to a real world cartographer. However, rather than creating a flat, two-dimensional map, Athena stores its knowledge in a large, hyper-dimensional vector array. Real world landmarks are replaced with key words, phrases and concepts. Unlike the real world map example, where prior knowledge of fauna and flora is required, Athena holds a generalised set of context information, which allows the rehydration phase to occur.

Athena combines information from thousands of web pages and documents to form a single semantic map. As a new document is added the existing map is modified to accommodate the new information. During this process the map is compressed to form the most efficient representation of knowledge. Similar and related concepts are squeezed together, and duplicate and redundant information squeezed out. And, this is where the magic happens... By reducing the footprint of Athena's semantic map, new knowledge is inferred. Athena will create links between concepts it has not seen in existing documents.

Behind the scenes I’m using SQL Server 2012. The full text engine has been very useful as it contains a stemming algorithm, a list of stop words as well as the obvious full text search capability. Most of the useful functions are contained in sys.dm_fts_parser. On top of SQL, there’s plenty of hand written C#, which implements the semantic map generation. Semantic concepts are represented as vectors and I use cosine similarity to measure similarity between vectors. To establish which terms are important, I use a combination of isolating title case nouns and calculating tf–idf (term frequency–inverse document frequency).
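To make those vector calculations concrete, here is a minimal sketch of the two measures mentioned above - cosine similarity between term vectors, and tf–idf for a single term. The class and method names are my own illustration, not Athena's actual implementation.

```csharp
using System;

public class SimilarityDemo
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // 1.0 means the vectors point the same way; 0.0 means they are orthogonal.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }

    // tf-idf: how often a term appears in a document, scaled by how rare
    // the term is across the whole corpus. Common words score near zero.
    public static double TfIdf(int termCountInDoc, int termsInDoc, int totalDocs, int docsContainingTerm)
    {
        double tf = (double)termCountInDoc / termsInDoc;
        double idf = Math.Log((double)totalDocs / docsContainingTerm);
        return tf * idf;
    }

    public static void Main()
    {
        double[] doc1 = { 1, 2, 0, 3 };  // Hypothetical term counts for two documents.
        double[] doc2 = { 1, 2, 0, 3 };
        Console.WriteLine(CosineSimilarity(doc1, doc2)); // Identical vectors score 1.
        Console.WriteLine(TfIdf(3, 100, 1000, 10));      // A rare term scores highly.
    }
}
```

Because cosine similarity only measures the angle between vectors, two documents of very different lengths can still score as highly similar, which is exactly the property you want when comparing concepts.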

Retrieving information from Athena is as simple as using a search engine.  Just type in a word or phrase and Athena will display the relevant portion of its semantic map. Back in the early days of development I used to play a weird game of two word association with Athena. One day when I typed in "sky tank" and Athena replied with "apache gunship", I knew I was onto something. One of the side effects of knowledge compression is that Athena performs inference between concepts.

The next step was to scale up. I'm now able to load in complete documents, which Athena will summarise, and then append relevant information that it already holds in its semantic map. In a slight twist on the above, another way to use Athena is to type in a "news headline", real or imagined, and Athena will prepare an appropriate news article.

In the example below, I typed in "Google DARPA robotics challenge", as I knew that Google had entered the recent DARPA robotics challenge. The result is a 2D overview graph representing Athena's knowledge.

Athena Semantic Map

Alongside this Athena generates a text file from which I was able to create a news article. The generated text is not perfect. It's largely a collection of loosely ordered words that have been compressed into Athena's dialect of English. However, it's not much trouble to convert this back into something more human readable.

This is the result after cleaning up manually for a couple of minutes.

  • Google acquires Redwood Robotics and Boston Dynamics.
  • Boston Dynamics makes the Atlas, Wildcat and Big Dog (LS3) robots.
  • Boston Dynamics employs scientists from Carnegie Mellon University.
  • Darpa is the Pentagon’s research and development unit.
  • The Darpa robotics challenge tests machines built to work in disaster areas like Fukushima.
  • The Darpa robotics challenge was held in Miami.
  • Google's Schaft robot wins Darpa challenge.
  • IHMC takes second place in the Darpa challenge.
  • Google Talk is an instant messaging service.
  • Google Reader was an RSS feed aggregator.

There's still a lot of work to do to make Athena into a viable product, but much of the hard work has already been done.


Linear Regression C#

When looking at time series data, such as a stream of prices, it can often be useful to establish a general trend and represent this with a single number. This can be achieved using a linear regression calculation.

Take this series of prices:
4.8, 4.8, 4.5, 3.9, 4.4, 3.6, 3.6, 2.9, 3.5, 3.0, 2.5, 2.2, 2.6, 2.1, 2.2

If you plot these values on an Excel graph and add a linear trend line, you should get something like this:

We can do the same thing in code:

using System;
class Regression
{
    static void Main(string[] args)
    {
        double[] values = { 4.8, 4.8, 4.5, 3.9, 4.4, 3.6, 3.6, 2.9, 3.5, 3.0, 2.5, 2.2, 2.6, 2.1, 2.2 };
        // Calculate the means of x (the index) and y (the price).
        double xAvg = 0;
        double yAvg = 0;
        for (int x = 0; x < values.Length; x++)
        {
            xAvg += x;
            yAvg += values[x];
        }
        xAvg = xAvg / values.Length;
        yAvg = yAvg / values.Length;
        // Least squares: slope = covariance(x, y) / variance(x).
        double v1 = 0;
        double v2 = 0;
        for (int x = 0; x < values.Length; x++)
        {
            v1 += (x - xAvg) * (values[x] - yAvg);
            v2 += Math.Pow(x - xAvg, 2);
        }
        double a = v1 / v2;
        double b = yAvg - a * xAvg;
        Console.WriteLine("y = ax + b");
        Console.WriteLine("a = {0}, the slope of the trend line.", Math.Round(a, 2));
        Console.WriteLine("b = {0}, the intercept of the trend line.", Math.Round(b, 2));
        Console.ReadLine();
    }
}

Now that you have the slope of the trend line (for this series the program reports a slope of about -0.21 and an intercept of about 4.82), it can be used as an input for neural networks analysing time series data. I use something similar in NNATS…

For a complete explanation of linear regression see Wikipedia.

John


Flocking Boids C#

Note: The code supporting this post can be found on GitHub.

In this post, I shall demonstrate an example of swarm intelligence, which is based on the Boids model. In my ecosystem, two different types of Boids exist, namely Regular Boids and Zombie Boids. The Regular Boids are very social, fast moving creatures, which like to flock together. The Zombie Boids are solitary souls, whose single reason for living is to chase after Regular Boids, albeit very slowly.

There is no real problem I am trying to solve here, other than to enjoy watching the carnage that emerges! It would be very easy to add nicer graphics, or even turn this simulation into a screen saver. As usual, the full C# source code to generate this application is listed below.

Before I go into detail about the code, some background on Boids. In 1986, Craig Reynolds developed a computer model that simulated animal movement, such as flocks of birds and schools of fish. He called the simulated flocking creatures Boids.

As with most artificial life simulations, Boids is an example of emergent behaviour. The complexity of Boids arises from the interaction of individual agents complying with a simple set of rules. Reynolds' basic flocking model consisted of three simple steering behaviours that determined how individual Boids should manoeuvre based on their velocity and position within the flock:

Separation: steer to avoid crowding local flock-mates
Cohesion: steer to move toward the average position of local flock-mates
Alignment: steer towards the average heading of local flock-mates

To provide even more lifelike effects, further rules could be added, such as obstacle avoidance and goal seeking.

The flocking algorithm requires that each Boid only react to flock-mates within its local neighbourhood. This neighbourhood is defined by the Boid’s field of view relative to the rest of the flock. Flock-mates outside of this neighbourhood are ignored.

The video below shows an example of the behaviour you can expect to emerge.

 

To implement Boids, I have used three classes, which are Boids, Swarm and Boid. The Boids class (the form) contains the entry point of the application, handles the graphics and serves as a container for the swarm. The Swarm class is responsible for creating and managing the entire collection of Boids, known as the swarm. The Boid class encapsulates all logic pertaining to Boid behaviours, which, in this simulation, include separation, cohesion, alignment, avoidance and hunting.

 

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
public class Boids : Form
{
    private Timer timer;
    private Swarm swarm;
    private Image iconRegular;
    private Image iconZombie;
    [STAThread]
    private static void Main()
    {
        Application.Run(new Boids());
    }
    public Boids()
    {
        int boundary = 640;
        SetStyle(ControlStyles.AllPaintingInWmPaint | ControlStyles.DoubleBuffer, true);
        FormBorderStyle = FormBorderStyle.FixedToolWindow;
        StartPosition = FormStartPosition.CenterScreen;
        ClientSize = new Size(boundary, boundary);
        iconRegular = CreateIcon(Brushes.Blue);
        iconZombie = CreateIcon(Brushes.Red);
        swarm = new Swarm(boundary);
        timer = new Timer();
        timer.Tick += new EventHandler(this.timer_Tick);
        timer.Interval = 75;
        timer.Start();
    }
    protected override void OnPaint(PaintEventArgs e)
    {
        foreach (Boid boid in swarm.Boids)
        {
            float angle;
            if (boid.dX == 0) angle = 90f;
            else angle = (float)(Math.Atan(boid.dY / boid.dX) * 57.3); // Radians to degrees (180 / pi).
            if (boid.dX < 0f) angle += 180f;
            Matrix matrix = new Matrix();
            matrix.RotateAt(angle, boid.Position);
            e.Graphics.Transform = matrix;
            if (boid.Zombie) e.Graphics.DrawImage(iconZombie, boid.Position);
            else e.Graphics.DrawImage(iconRegular, boid.Position);
        }
    }
    private static Image CreateIcon(Brush brush)
    {
        Bitmap icon = new Bitmap(16, 16);
        Graphics g = Graphics.FromImage(icon);
        Point p1 = new Point(0, 16);
        Point p2 = new Point(16, 8);
        Point p3 = new Point(0, 0);
        Point p4 = new Point(5, 8);
        Point[] points = { p1, p2, p3, p4 };
        g.FillPolygon(brush, points);
        return icon;
    }
    private void timer_Tick(object sender, EventArgs e)
    {
        swarm.MoveBoids();
        Invalidate();
    }
}
public class Swarm
{
    public List<Boid> Boids = new List<Boid>();
    public Swarm(int boundary)
    {
        for (int i = 0; i < 15; i++)
        {
            Boids.Add(new Boid((i > 12), boundary)); // The last two boids are zombies.
        }
    }
    public void MoveBoids()
    {
        foreach (Boid boid in Boids)
        {
            boid.Move(Boids);
        }
    }
}
public class Boid
{
    private static Random rnd = new Random();
    private static float border = 100f;
    private static float sight = 75f;
    private static float space = 30f;
    private static float speed = 12f;
    private float boundary;
    public float dX;
    public float dY;
    public bool Zombie;
    public PointF Position;
    public Boid(bool zombie, int boundary)
    {
        Position = new PointF(rnd.Next(boundary), rnd.Next(boundary));
        this.boundary = boundary;
        Zombie = zombie;
    }
    public void Move(List<Boid> boids)
    {
        if (!Zombie) Flock(boids);
        else Hunt(boids);
        CheckBounds();
        CheckSpeed();
        Position.X += dX;
        Position.Y += dY;
    }
    private void Flock(List<Boid> boids)
    {
        foreach (Boid boid in boids)
        {
            float distance = Distance(Position, boid.Position);
            if (boid != this && !boid.Zombie)
            {
                if (distance < space)
                {
                    // Create space.
                    dX += Position.X - boid.Position.X;
                    dY += Position.Y - boid.Position.Y;
                }
                else if (distance < sight)
                {
                    // Flock together.
                    dX += (boid.Position.X - Position.X) * 0.05f;
                    dY += (boid.Position.Y - Position.Y) * 0.05f;
                }
                if (distance < sight)
                {
                    // Align movement.
                    dX += boid.dX * 0.5f;
                    dY += boid.dY * 0.5f;
                }
            }
            if (boid.Zombie && distance < sight)
            {
                // Avoid zombies.
                dX += Position.X - boid.Position.X;
                dY += Position.Y - boid.Position.Y;
            }
        }
    }
    private void Hunt(List<Boid> boids)
    {
        float range = float.MaxValue;
        Boid prey = null;
        foreach (Boid boid in boids)
        {
            if (!boid.Zombie)
            {
                float distance = Distance(Position, boid.Position);
                if (distance < sight && distance < range)
                {
                    range = distance;
                    prey = boid;
                }
            }
        }
        if (prey != null)
        {
            // Move towards closest prey.
            dX += prey.Position.X - Position.X;
            dY += prey.Position.Y - Position.Y;
        }
    }
    private static float Distance(PointF p1, PointF p2)
    {
        double val = Math.Pow(p1.X - p2.X, 2) + Math.Pow(p1.Y - p2.Y, 2);
        return (float)Math.Sqrt(val);
    }
    private void CheckBounds()
    {
        // Steer back towards the centre when near the edge of the form.
        float val = boundary - border;
        if (Position.X < border) dX += border - Position.X;
        if (Position.Y < border) dY += border - Position.Y;
        if (Position.X > val) dX += val - Position.X;
        if (Position.Y > val) dY += val - Position.Y;
    }
    private void CheckSpeed()
    {
        // Zombies move at a quarter of regular speed.
        float s;
        if (!Zombie) s = speed;
        else s = speed / 4f;
        float val = Distance(new PointF(0f, 0f), new PointF(dX, dY));
        if (val > s)
        {
            dX = dX * s / val;
            dY = dY * s / val;
        }
    }
}

I hope you enjoy Boids as much as I do – I find them strangely addictive to watch.

 


Self Organising Map C#

In this post, I shall be discussing Self Organising Maps (SOM), which are also known as Kohonen maps or topographical maps. Until now, all of my posts have focussed on supervised learning, i.e. we present a set of patterns to a neural network along with the desired outputs. The network then adjusts its weights in order to recreate the correct outputs. However, suppose we do not know in advance what the outputs should be? How can we get the network to do something useful? This is called unsupervised learning. Some reasons we may choose to do this are:

  • Finding clusters
  • Dimensionality reduction
  • Finding hidden correlations
  • Data compression
  • Classification

A SOM is a two-layer neural network that accepts N-dimensional input patterns and maps them to a lattice of output neurons, which represents feature space. The output map, which is typically two-dimensional, characterises the relative position of neurons with regard to their neighbours, i.e. their topological properties rather than exact geometric locations. All input neurons are fully connected to all output neurons.

The training steps of a SOM are simple.

  1. Initialise the interconnecting weights with random values.
  2. Present an input pattern to the network.
  3. Choose the output neuron whose weights most closely match the input pattern (the “winner”).
  4. Update the weights of neurons that are within the “neighbourhood” of the winner, using a relative learning factor.
  5. Reduce the learning factor monotonically.
  6. Reduce the size of the “neighbourhood” monotonically.
  7. Repeat from step two until only small updates are observed.

The spread of the neighbourhood function will initially include all neurons on the grid, gradually reducing to include only the winning neuron.

To provide an example of what to expect from a SOM, I have prepared a simple example that will attempt to group twenty-five foods into regions of similarity, based on three parameters, which are protein, carbohydrate and fat.

Therefore, the challenge for this SOM is to reduce data containing three dimensions down to two, whilst retaining meaning. It does this by automatically identifying the differentiating features that will have the greatest effect.

The input data is as follows:

Item,protein,carb,fat
Apples,0.4,11.8,0.1
Avocado,1.9,1.9,19.5
Bananas,1.2,23.2,0.3
Beef Steak,20.9,0.0,7.9
Big Mac,13.0,19.0,11.0
Brazil Nuts,15.5,2.9,68.3
Bread,10.5,37.0,3.2
Butter,1.0,0.0,81.0
Cheese,25.0,0.1,34.4
Cheesecake,6.4,28.2,22.7
Cookies,5.7,58.7,29.3
Cornflakes,7.0,84.0,0.9
Eggs,12.5,0.0,10.8
Fried Chicken,17.0,7.0,20.0
Fries,3.0,36.0,13.0
Hot Chocolate,3.8,19.4,10.2
Pepperoni,20.9,5.1,38.3
Pizza,12.5,30.0,11.0
Pork Pie,10.1,27.3,24.2
Potatoes,1.7,16.1,0.3
Rice,6.9,74.0,2.8
Roast Chicken,26.1,0.3,5.8
Sugar,0.0,95.1,0.0
Tuna Steak,25.6,0.0,0.5
Water,0.0,0.0,0.0

After running this data through the SOM, the foods were placed on a 10x10 grid representing their relative similarities. A graphical representation is shown below.

How has the feature map grouped items together, whilst crushing three dimensions into two? Well, a number of zones have formed. Water, which contains no protein, carbs or fat, has been pushed to the bottom right. Directly above in the top right hand corner, sugar, which is made almost entirely of carbs, has taken hold. In the top left corner, butter reigns supreme, being almost entirely fat. Finally, the bottom left is occupied by tuna, which has the highest protein content of the foods in my sample. The remaining foods live between these extremes, with a junk food zone occupying the centre ground.

Below is the C# code, which will allow you to recreate and modify this example. Try adding more input dimensions and see what results. It is not always obvious what is happening, especially with a high number of dimensions. However, eventually you will normally be able to spot the (sometimes unexpected) correlation.

using System;
using System.Collections.Generic;
using System.IO;
public class Map
{
    private Neuron[,] outputs;  // Collection of weights.
    private int iteration;      // Current iteration.
    private int length;         // Side length of output grid.
    private int dimensions;     // Number of input dimensions.
    private Random rnd = new Random();
    private List<string> labels = new List<string>();
    private List<double[]> patterns = new List<double[]>();
    static void Main(string[] args)
    {
        new Map(3, 10, "Food.csv");
        Console.ReadLine();
    }
    public Map(int dimensions, int length, string file)
    {
        this.length = length;
        this.dimensions = dimensions;
        Initialise();
        LoadData(file);
        NormalisePatterns();
        Train(0.0000001);
        DumpCoordinates();
    }
    private void Initialise()
    {
        outputs = new Neuron[length, length];
        for (int i = 0; i < length; i++)
        {
            for (int j = 0; j < length; j++)
            {
                outputs[i, j] = new Neuron(i, j, length);
                outputs[i, j].Weights = new double[dimensions];
                for (int k = 0; k < dimensions; k++)
                {
                    outputs[i, j].Weights[k] = rnd.NextDouble();
                }
            }
        }
    }
    private void LoadData(string file)
    {
        StreamReader reader = File.OpenText(file);
        reader.ReadLine(); // Ignore first line.
        while (!reader.EndOfStream)
        {
            string[] line = reader.ReadLine().Split(',');
            labels.Add(line[0]);
            double[] inputs = new double[dimensions];
            for (int i = 0; i < dimensions; i++)
            {
                inputs[i] = double.Parse(line[i + 1]);
            }
            patterns.Add(inputs);
        }
        reader.Close();
    }
    private void NormalisePatterns()
    {
        // Scale each dimension so that its average value is 1.
        for (int j = 0; j < dimensions; j++)
        {
            double sum = 0;
            for (int i = 0; i < patterns.Count; i++)
            {
                sum += patterns[i][j];
            }
            double average = sum / patterns.Count;
            for (int i = 0; i < patterns.Count; i++)
            {
                patterns[i][j] = patterns[i][j] / average;
            }
        }
    }
    private void Train(double maxError)
    {
        double currentError = double.MaxValue;
        while (currentError > maxError)
        {
            currentError = 0;
            // Present the patterns in a random order each epoch.
            List<double[]> TrainingSet = new List<double[]>();
            foreach (double[] pattern in patterns)
            {
                TrainingSet.Add(pattern);
            }
            for (int i = 0; i < patterns.Count; i++)
            {
                double[] pattern = TrainingSet[rnd.Next(patterns.Count - i)];
                currentError += TrainPattern(pattern);
                TrainingSet.Remove(pattern);
            }
            Console.WriteLine(currentError.ToString("0.0000000"));
        }
    }
    private double TrainPattern(double[] pattern)
    {
        double error = 0;
        Neuron winner = Winner(pattern);
        for (int i = 0; i < length; i++)
        {
            for (int j = 0; j < length; j++)
            {
                error += outputs[i, j].UpdateWeights(pattern, winner, iteration);
            }
        }
        iteration++;
        return Math.Abs(error / (length * length));
    }
    private void DumpCoordinates()
    {
        for (int i = 0; i < patterns.Count; i++)
        {
            Neuron n = Winner(patterns[i]);
            Console.WriteLine("{0},{1},{2}", labels[i], n.X, n.Y);
        }
    }
    private Neuron Winner(double[] pattern)
    {
        // The winner is the neuron whose weights are closest to the input pattern.
        Neuron winner = null;
        double min = double.MaxValue;
        for (int i = 0; i < length; i++)
            for (int j = 0; j < length; j++)
            {
                double d = Distance(pattern, outputs[i, j].Weights);
                if (d < min)
                {
                    min = d;
                    winner = outputs[i, j];
                }
            }
        return winner;
    }
    private double Distance(double[] vector1, double[] vector2)
    {
        double value = 0;
        for (int i = 0; i < vector1.Length; i++)
        {
            value += Math.Pow((vector1[i] - vector2[i]), 2);
        }
        return Math.Sqrt(value);
    }
}
public class Neuron
{
    public double[] Weights;
    public int X;
    public int Y;
    private int length;
    private double nf;
    public Neuron(int x, int y, int length)
    {
        X = x;
        Y = y;
        this.length = length;
        nf = 1000 / Math.Log(length);
    }
    private double Gauss(Neuron win, int it)
    {
        // Gaussian neighbourhood function, centred on the winning neuron.
        double distance = Math.Sqrt(Math.Pow(win.X - X, 2) + Math.Pow(win.Y - Y, 2));
        return Math.Exp(-Math.Pow(distance, 2) / (Math.Pow(Strength(it), 2)));
    }
    private double LearningRate(int it)
    {
        // 1000d forces floating point division; an int divisor would truncate.
        return Math.Exp(-it / 1000d) * 0.1;
    }
    private double Strength(int it)
    {
        // The neighbourhood shrinks monotonically as training progresses.
        return Math.Exp(-it / nf) * length;
    }
    public double UpdateWeights(double[] pattern, Neuron winner, int it)
    {
        double sum = 0;
        for (int i = 0; i < Weights.Length; i++)
        {
            double delta = LearningRate(it) * Gauss(winner, it) * (pattern[i] - Weights[i]);
            Weights[i] += delta;
            sum += delta;
        }
        return sum / Weights.Length;
    }
}

As always, let me know what you think.

John


Back Propagation Multi-Output Neural Network C#

In this post, I publish an updated version of my multi-layer Perceptron - new features and improvements include:

  • A graphical representation of network activity
  • The ability to handle multiple network outputs
  • The ability to define any network architecture in a single line of code
  • A cleaner separation of responsibility within the code

In order to put this new network through its paces, I downloaded the classic Iris Plants Database from the UCI Machine Learning Repository. The data set contains three classes of fifty instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. To use this data, save the web page as a text file with a csv file extension. Open with Excel and add three new columns, on the right, filling with 0,0,1 or 0,1,0 or 1,0,0 depending on the species of plant. Delete the species name column and you now have a file which can be fed into your network. The video below shows the graphical output and clearly displays how the network converges on a solution.
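If you would rather not do the Excel step by hand, the same one-hot conversion is a few lines of C#. This is just a sketch - the file names, and the assumption that the species name is the last column of each row, are mine, so adjust them to match your download.

```csharp
using System;
using System.IO;

public class IrisPrep
{
    // Turn one raw UCI line ("5.1,3.5,1.4,0.2,Iris-setosa") into a
    // network-ready line, replacing the species name with a one-hot triple.
    public static string Convert(string line)
    {
        string[] parts = line.Split(',');
        string name = parts[parts.Length - 1];
        string oneHot = name == "Iris-setosa" ? "1,0,0"
                      : name == "Iris-versicolor" ? "0,1,0"
                      : "0,0,1";
        return string.Join(",", parts, 0, parts.Length - 1) + "," + oneHot;
    }

    public static void Main()
    {
        using (StreamWriter writer = new StreamWriter("iris_data.csv"))
        {
            foreach (string line in File.ReadAllLines("iris.data"))
            {
                if (line.Trim().Length == 0) continue; // The UCI file ends with blank lines.
                writer.WriteLine(Convert(line));
            }
        }
    }
}
```

Run it once against your downloaded copy of the data and the resulting iris_data.csv can be fed straight into the network.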

 

There is quite a lot of code so I will cover each class individually.

Unlike my previous back propagation project, this one is a Windows Forms Application, so in Visual Studio begin by creating a new project and delete Form1.cs and Program.cs, which were automatically created for you. Create a new class file for each of the code blocks below.

Graph.cs
This is the container for the project and provides a graphical display of how the network is proceeding with training. In order to use the graphing feature, the penultimate layer of your network must contain two dimensions, which is not as restrictive as it sounds. I have found that being able to visualise network activity helps enormously when tuning your network. Things to look out for are inactivity, meaning the network has become stuck in a local minimum, or erratic activity, meaning your learning rate is too high. If at any time you want to restart training, simply hit the space key and the network will reset. I guarantee that once you have tried this, you will never go back to gazing at streams of numbers…

using System;
using System.Drawing;
using System.Windows.Forms;
public class Graph : Form
{
    private int iteration;
    private Network network;
    private Brush[] brushes = { Brushes.Red, Brushes.Green, Brushes.Blue };
    [STAThread]
    static void Main()
    {
        Application.Run(new Graph());
    }
    public Graph()
    {
        int[] dims = { 4, 4, 2, 3 };    // Replace with your network dimensions.
        string file = "iris_data.csv";  // Replace with your input file location.
        network = new Network(dims, file);
        Initialise();
    }
    private void Initialise()
    {
        ClientSize = new Size(400, 400);
        SetStyle(ControlStyles.AllPaintingInWmPaint | ControlStyles.Opaque, true);
        FormBorderStyle = FormBorderStyle.FixedToolWindow;
        StartPosition = FormStartPosition.CenterScreen;
        KeyDown += new KeyEventHandler(Graph_KeyDown);
    }
    protected override void OnPaint(PaintEventArgs e)
    {
        double error = network.Train();
        UpdatePlotArea(e.Graphics, error);
        iteration++;
        Invalidate();
    }
    private void UpdatePlotArea(Graphics g, double error)
    {
        string banner = "Iteration={0}  Error={1:0.00}";
        Text = string.Format(banner, iteration, error);
        g.FillRectangle(Brushes.White, 0, 0, 400, 400);
        foreach (float[] point in network.Points2D())
        {
            g.FillRectangle(brushes[(int)point[2]], point[0] * 395, point[1] * 395, 5, 5);
        }
        foreach (double[] line in network.HyperPlanes())
        {
            double a = -line[0] / line[1];
            double c = -line[2] / line[1];
            Point left = new Point(0, (int)(c * 400));
            Point right = new Point(400, (int)((a + c) * 400));
            g.DrawLine(new Pen(Color.Gray), left, right);
        }
    }
    private void Graph_KeyDown(object sender, KeyEventArgs e)
    {
        if (e.KeyCode == Keys.Space)
        {
            network.Initialise();
            iteration = 0;
        }
    }
}

Network.cs
As the name suggests, this is where the logic of the network resides. The big difference here between this and my previous project is that it can handle multiple outputs. In addition, the new constructor provides the ability to define any network architecture using an integer array. Thus, { 4, 4, 2, 3 } represents a network with four inputs, a first hidden layer with four neurons, a second hidden layer with two neurons, and a final output layer with three neurons.

  1. using System;
  2. using System.Collections.Generic;
  3. using System.IO;
  4. public class Network : List<Layer>
  5. {
  6.     private int[] dimensions;
  7.     private List<Pattern> patterns;
  8.     public Network(int[] dimensions, string file)
  9.     {
  10.         this.dimensions = dimensions;
  11.         Initialise();
  12.         LoadPatterns(file);
  13.     }
  14.     public void Initialise()
  15.     {
  16.         base.Clear();
  17.         base.Add(new Layer(dimensions[0]));
  18.         for (int i = 1; i < dimensions.Length; i++)
  19.         {
  20.             base.Add(new Layer(dimensions[i], base[i - 1], new Random()));
  21.         }
  22.     }
  23.     private void LoadPatterns(string file)
  24.     {
  25.         patterns = new List<Pattern>();
  26.         StreamReader reader = File.OpenText(file);
  27.         while (!reader.EndOfStream)
  28.         {
  29.             string line = reader.ReadLine();
  30.             patterns.Add(new Pattern(line, Inputs.Count, Outputs.Count));
  31.         }
  32.         reader.Close();
  33.     }
  34.     public double Train()
  35.     {
  36.         double error = 0;
  37.         foreach (Pattern pattern in patterns)
  38.         {
  39.             Activate(pattern);
  40.             for (int i = 0; i < Outputs.Count; i++)
  41.             {
  42.                 double delta = pattern.Outputs[i] - Outputs[i].Output;
  43.                 Outputs[i].CollectError(delta);
  44.                 error += Math.Pow(delta, 2);
  45.             }
  46.             AdjustWeights();
  47.         }
  48.         return error;
  49.     }
  50.     private void Activate(Pattern pattern)
  51.     {
  52.         for (int i = 0; i < Inputs.Count; i++)
  53.         {
  54.             Inputs[i].Output = pattern.Inputs[i];
  55.         }
  56.         for (int i = 1; i < base.Count; i++)
  57.         {
  58.             foreach (Neuron neuron in base[i])
  59.             {
  60.                 neuron.Activate();
  61.             }
  62.         }
  63.     }
  64.     private void AdjustWeights()
  65.     {
  66.         for (int i = base.Count - 1; i > 0; i--)
  67.         {
  68.             foreach (Neuron neuron in base[i])
  69.             {
  70.                 neuron.AdjustWeights();
  71.             }
  72.         }
  73.     }
  74.     public List<double[]> HyperPlanes()
  75.     {
  76.         List<double[]> lines = new List<double[]>();
  77.         foreach (Neuron n in Outputs)
  78.         {
  79.             lines.Add(n.HyperPlane);
  80.         }
  81.         return lines;
  82.     }
  83.     public List<float[]> Points2D()
  84.     {
  85.         int penultimate = base.Count - 2;
  86.         if (base[penultimate].Count != 2)
  87.         {
  88.             throw new Exception("Penultimate layer must be 2D for graphing.");
  89.         }
  90.         List<float[]> points = new List<float[]>();
  91.         for (int i = 0; i < patterns.Count; i++)
  92.         {
  93.             Activate(patterns[i]);
  94.             float[] point = new float[3];
  95.             point[0] = (float)base[penultimate][0].Output;
  96.             point[1] = (float)base[penultimate][1].Output;
  97.             if (Outputs.Count > 1)
  98.             {
  99.                 point[2] = patterns[i].MaxOutput;
  100.             }
  101.             else
  102.             {
  103.                 point[2] = (patterns[i].Outputs[0] >= 0.5) ? 1 : 0;
  104.             }
  105.             points.Add(point);
  106.         }
  107.         return points;
  108.     }
  109.     private Layer Inputs
  110.     {
  111.         get { return base[0]; }
  112.     }
  113.     private Layer Outputs
  114.     {
  115.         get { return base[base.Count - 1]; }
  116.     }
  117. }
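To get a feel for what a dimensions array implies, the helper below (my own illustrative sketch, not part of the listing above) counts the trainable parameters in an architecture such as { 4, 4, 2, 3 }, assuming full connectivity between adjacent layers plus one bias per non-input neuron, which matches how the Neuron class is built.

```csharp
using System;

public static class ArchitectureInfo
{
    // Counts the trainable parameters implied by a dimensions array,
    // e.g. { 4, 4, 2, 3 }. Every neuron after the input layer holds one
    // weight per neuron in the previous layer, plus one bias value.
    public static int ParameterCount(int[] dimensions)
    {
        int count = 0;
        for (int i = 1; i < dimensions.Length; i++)
        {
            count += dimensions[i] * (dimensions[i - 1] + 1); // +1 for the bias
        }
        return count;
    }
}
```

For { 4, 4, 2, 3 } this gives 4x5 + 2x5 + 3x3 = 39 parameters, a useful sanity check when sizing a network.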

Neuron.cs
This class represents the neuron, and contains activation and training logic.

  1. using System;
  2. using System.Collections.Generic;
  3. public class Neuron
  4. {
  5.     private double bias;                       // Bias value.
  6.     private double error;                      // Sum of error.
  7.     private double input;                      // Sum of inputs.
  8.     private double gradient = 5;               // Steepness of sigmoid curve.
  9.     private double learnRate = 0.01;           // Learning rate.
  10.     private double output = double.MinValue;   // Preset value of neuron.
  11.     private List<Weight> weights;              // Collection of weights to inputs.
  12.     public Neuron() { }
  13.     public Neuron(Layer inputs, Random rnd)
  14.     {
  15.         weights = new List<Weight>();
  16.         foreach (Neuron input in inputs)
  17.         {
  18.             Weight w = new Weight();
  19.             w.Input = input;
  20.             w.Value = rnd.NextDouble() * 2 - 1;
  21.             weights.Add(w);
  22.         }
  23.     }
  24.     public void Activate()
  25.     {
  26.         error = 0;
  27.         input = 0;
  28.         foreach (Weight w in weights)
  29.         {
  30.             input += w.Value * w.Input.Output;
  31.         }
  32.     }
  33.     public void CollectError(double delta)
  34.     {
  35.         if (weights != null)
  36.         {
  37.             error += delta;
  38.             foreach (Weight w in weights)
  39.             {
  40.                 w.Input.CollectError(delta * Derivative * w.Value);
  41.             }
  42.         }
  43.     }
  44.     public void AdjustWeights()
  45.     {
  46.         for (int i = 0; i < weights.Count; i++)
  47.         {
  48.             weights[i].Value += error * Derivative * learnRate * weights[i].Input.Output;
  49.         }
  50.         bias += error * Derivative * learnRate;
  51.     }
  52.     private double Derivative
  53.     {
  54.         get
  55.         {
  56.             double activation = Output;
  57.             return activation * (1 - activation);
  58.         }
  59.     }
  60.     public double Output
  61.     {
  62.         get
  63.         {
  64.             if (output != double.MinValue)
  65.             {
  66.                 return output;
  67.             }
  68.             return 1 / (1 + Math.Exp(-gradient * (input + bias)));
  69.         }
  70.         set
  71.         {
  72.             output = value;
  73.         }
  74.     }
  75.     public double[] HyperPlane
  76.     {
  77.         get
  78.         {
  79.             double[] line = new double[3];
  80.             line[0] = weights[0].Value;
  81.             line[1] = weights[1].Value;
  82.             line[2] = bias;
  83.             return line;
  84.         }
  85.     }
  86. }

Layer.cs
This class inherits from List<Neuron> and simply holds a collection of neurons representing a discrete layer within the network.

  1. using System;
  2. using System.Collections.Generic;
  3. public class Layer : List<Neuron>
  4. {
  5.     public Layer(int size)
  6.     {
  7.         for (int i = 0; i < size; i++)
  8.             base.Add(new Neuron());
  9.     }
  10.     public Layer(int size, Layer layer, Random rnd)
  11.     {
  12.         for (int i = 0; i < size; i++)
  13.             base.Add(new Neuron(layer, rnd));
  14.     }
  15. }

Weight.cs
This class holds a pointer to a contributing neuron and the associated weight value.

  1. public class Weight
  2. {
  3.     public Neuron Input;
  4.     public double Value;
  5. }

Pattern.cs
This class holds a single line of data from your input file in a form that can be consumed by the network.

  1. using System;
  2. public class Pattern
  3. {
  4.     private double[] inputs;
  5.     private double[] outputs;
  6.     public Pattern(string value, int inputDims, int outputDims)
  7.     {
  8.         string[] line = value.Split(',');
  9.         if (line.Length != inputDims + outputDims)
  10.             throw new Exception("Input does not match network configuration");
  11.         inputs = new double[inputDims];
  12.         for (int i = 0; i < inputDims; i++)
  13.         {
  14.             inputs[i] = double.Parse(line[i]);
  15.         }
  16.         outputs = new double[outputDims];
  17.         for (int i = 0; i < outputDims; i++)
  18.         {
  19.             outputs[i] = double.Parse(line[i + inputDims]);
  20.         }
  21.     }
  22.     public int MaxOutput
  23.     {
  24.         get
  25.         {
  26.             int item = -1;
  27.             double max = double.MinValue;
  28.             for (int i = 0; i < Outputs.Length; i++)
  29.             {
  30.                 if (Outputs[i] > max)
  31.                 {
  32.                     max = Outputs[i];
  33.                     item = i;
  34.                 }
  35.             }
  36.             return item;
  37.         }
  38.     }
  39.     public double[] Inputs
  40.     {
  41.         get { return inputs; }
  42.     }
  43.     public double[] Outputs
  44.     {
  45.         get { return outputs; }
  46.     }
  47. }

I hope you find this network useful. If you have any questions or suggestions, I would be happy to hear from you.


Train Neural Networks with Back Propagation C#

In my previous post, I showed how multi layer Perceptrons could be used to solve linearly non-separable problems. In that example, I calculated all the network weights by hand. In this post, I shall introduce the back propagation training algorithm, which will do all the hard work for us.

The back propagation algorithm has become the de facto training algorithm for artificial neural networks, and has been studied by the artificial intelligence community since the 1970s. It is used in commercial neural network software packages, such as MATLAB.

The principle of back propagation is actually quite easy to understand, even though the maths behind it can look rather daunting. The basic steps are:

  1. Initialise the network with small random weights.
  2. Present an input pattern to the input layer of the network.
  3. Feed the input pattern forward through the network to calculate its activation value.
  4. Take the difference between the desired output and the activation value to calculate the network’s activation error.
  5. Adjust the weights feeding the output neuron to reduce its activation error for this input pattern.
  6. Propagate an error value back to each hidden neuron that is proportional to its contribution to the network’s activation error.
  7. Adjust the weights feeding each hidden neuron to reduce its contribution to the error for this input pattern.
  8. Repeat steps 2 to 7 for each input pattern in the input collection.
  9. Repeat step 8 until the network is suitably trained.

It is important to note that each pattern is presented in turn, and the network adjusted slightly, before moving on to the next pattern. If we let the network correct the error for one pattern completely before moving on to the next, it would never learn a generalised solution for the entire input collection.

The magic really happens in step 6, which determines how much error to feed back to each hidden neuron. Once the error value has been established, training can continue as described in my post about the single layer perceptron.

To illustrate how the error value is calculated, I will use this network diagram.

Multi Layer Perceptron

Then, if we use these variables:

output_o = Activation value of the output neuron
error_o = Error at the output neuron
error_h = Error at a hidden neuron
weight_ho = A weight connecting a hidden neuron to the output neuron

The error feed back to a hidden neuron is calculated:

error_h = error_o * Derivative(output_o) * weight_ho

For an explanation of how to calculate the derivative value, see my post on the sigmoid function.
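To make the formula concrete, here is a small worked example with made-up numbers: suppose the output neuron produced 0.8 where the target was 1.0 (so error_o = 0.2), and the weight from the hidden neuron to the output neuron is 0.5.

```csharp
using System;

public static class ErrorFeedbackExample
{
    // error_h = error_o * Derivative(output_o) * weight_ho,
    // where Derivative(o) = o * (1 - o) for the logistic sigmoid.
    public static double HiddenError(double errorO, double outputO, double weightHO)
    {
        double derivative = outputO * (1 - outputO);
        return errorO * derivative * weightHO;
    }
}
```

With these numbers the derivative is 0.8 * 0.2 = 0.16, so the error fed back to the hidden neuron is 0.2 * 0.16 * 0.5 = 0.016.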

Unlike the single layer Perceptron, training a multi layer Perceptron with back propagation does not guarantee a solution, even if one is available. This is because training can become stuck in a local error minimum. There are a number of strategies to overcome this, which I shall cover another time. For now, restarting training is normally sufficient for small networks.

I will continue to use the same classification problem from my last post.

Linearly Non-Separable

These are the input patterns. Copy and paste them into a Patterns.csv file to use with the code sample below:

0.10, 0.03, 0
0.11, 0.11, 0
0.11, 0.82, 0
0.13, 0.17, 0
0.20, 0.81, 0
0.21, 0.57, 1
0.25, 0.52, 1
0.26, 0.48, 1
0.28, 0.17, 1
0.28, 0.45, 1
0.37, 0.28, 1
0.41, 0.92, 0
0.43, 0.04, 1
0.44, 0.55, 1
0.47, 0.84, 0
0.50, 0.36, 1
0.51, 0.96, 0
0.56, 0.62, 1
0.65, 0.01, 1
0.67, 0.50, 1
0.73, 0.05, 1
0.73, 0.90, 0
0.73, 0.99, 0
0.78, 0.01, 1
0.83, 0.62, 0
0.86, 0.42, 1
0.86, 0.91, 0
0.89, 0.12, 1
0.95, 0.15, 1
0.98, 0.73, 0

The code below is my implementation of back propagation in C#. In my opinion C# Generics are a beautiful thing, and you will see that I use them extensively. To run this code, create a new C# Console application and paste the code into the Program.cs file. Paste the input patterns into a file named Patterns.csv located in your project directory. Include Patterns.csv in your project, and make sure you set its “Copy to Output Directory” property to “Copy always”.

  1. using System;
  2. using System.Collections.Generic;
  3. using System.IO;
  4. public class Network
  5. {
  6.     private int _hiddenDims = 2;        // Number of hidden neurons.
  7.     private int _inputDims = 2;        // Number of input neurons.
  8.     private int _iteration;            // Current training iteration.
  9.     private int _restartAfter = 2000;   // Restart training if iterations exceed this.
  10.     private Layer _hidden;              // Collection of hidden neurons.
  11.     private Layer _inputs;              // Collection of input neurons.
  12.     private List<Pattern> _patterns;    // Collection of training patterns.
  13.     private Neuron _output;            // Output neuron.
  14.     private Random _rnd = new Random(); // Global random number generator.
  15.     [STAThread]
  16.     static void Main()
  17.     {
  18.         new Network();
  19.     }
  20.     public Network()
  21.     {
  22.         LoadPatterns();
  23.         Initialise();
  24.         Train();
  25.         Test();
  26.     }
  27.     private void Train()
  28.     {
  29.         double error;
  30.         do
  31.         {
  32.             error = 0;
  33.             foreach (Pattern pattern in _patterns)
  34.             {
  35.                 double delta = pattern.Output - Activate(pattern);
  36.                 AdjustWeights(delta);
  37.                 error += Math.Pow(delta, 2);
  38.             }
  39.             Console.WriteLine("Iteration {0}\tError {1:0.000}", _iteration, error);
  40.             _iteration++;
  41.             if (_iteration > _restartAfter) Initialise();
  42.         } while (error > 0.1);
  43.     }
  44.     private void Test()
  45.     {
  46.         Console.WriteLine("\nBegin network testing\nPress Ctrl C to exit\n");
  47.         while (true)
  48.         {
  49.             try
  50.             {
  51.                 Console.Write("Input x, y: ");
  52.                 string values = Console.ReadLine() + ",0";
  53.                 Console.WriteLine("{0:0}\n", Activate(new Pattern(values, _inputDims)));
  54.             }
  55.             catch (Exception e)
  56.             {
  57.                 Console.WriteLine(e.Message);
  58.             }
  59.         }
  60.     }
  61.     private double Activate(Pattern pattern)
  62.     {
  63.         for (int i = 0; i < pattern.Inputs.Length; i++)
  64.         {
  65.             _inputs[i].Output = pattern.Inputs[i];
  66.         }
  67.         foreach (Neuron neuron in _hidden)
  68.         {
  69.             neuron.Activate();
  70.         }
  71.         _output.Activate();
  72.         return _output.Output;
  73.     }
  74.     private void AdjustWeights(double delta)
  75.     {
  76.         _output.AdjustWeights(delta);
  77.         foreach (Neuron neuron in _hidden)
  78.         {
  79.             neuron.AdjustWeights(_output.ErrorFeedback(neuron));
  80.         }
  81.     }
  82.     private void Initialise()
  83.     {
  84.         _inputs = new Layer(_inputDims);
  85.         _hidden = new Layer(_hiddenDims, _inputs, _rnd);
  86.         _output = new Neuron(_hidden, _rnd);
  87.         _iteration = 0;
  88.         Console.WriteLine("Network Initialised");
  89.     }
  90.     private void LoadPatterns()
  91.     {
  92.         _patterns = new List<Pattern>();
  93.         StreamReader file = File.OpenText("Patterns.csv");
  94.         while (!file.EndOfStream)
  95.         {
  96.             string line = file.ReadLine();
  97.             _patterns.Add(new Pattern(line, _inputDims));
  98.         }
  99.         file.Close();
  100.     }
  101. }
  102. public class Layer : List<Neuron>
  103. {
  104.     public Layer(int size)
  105.     {
  106.         for (int i = 0; i < size; i++)
  107.             base.Add(new Neuron());
  108.     }
  109.     public Layer(int size, Layer layer, Random rnd)
  110.     {
  111.         for (int i = 0; i < size; i++)
  112.             base.Add(new Neuron(layer, rnd));
  113.     }
  114. }
  115. public class Neuron
  116. {
  117.     private double _bias;                       // Bias value.
  118.     private double _error;                      // Sum of error.
  119.     private double _input;                      // Sum of inputs.
  120.     private double _lambda = 6;                // Steepness of sigmoid curve.
  121.     private double _learnRate = 0.5;            // Learning rate.
  122.     private double _output = double.MinValue;   // Preset value of neuron.
  123.     private List<Weight> _weights;              // Collection of weights to inputs.
  124.     public Neuron() { }
  125.     public Neuron(Layer inputs, Random rnd)
  126.     {
  127.         _weights = new List<Weight>();
  128.         foreach (Neuron input in inputs)
  129.         {
  130.             Weight w = new Weight();
  131.             w.Input = input;
  132.             w.Value = rnd.NextDouble() * 2 - 1;
  133.             _weights.Add(w);
  134.         }
  135.     }
  136.     public void Activate()
  137.     {
  138.         _input = 0;
  139.         foreach (Weight w in _weights)
  140.         {
  141.             _input += w.Value * w.Input.Output;
  142.         }
  143.     }
  144.     public double ErrorFeedback(Neuron input)
  145.     {
  146.         Weight w = _weights.Find(delegate(Weight t) { return t.Input == input; });
  147.         return _error * Derivative * w.Value;
  148.     }
  149.     public void AdjustWeights(double value)
  150.     {
  151.         _error = value;
  152.         for (int i = 0; i < _weights.Count; i++)
  153.         {
  154.             _weights[i].Value += _error * Derivative * _learnRate * _weights[i].Input.Output;
  155.         }
  156.         _bias += _error * Derivative * _learnRate;
  157.     }
  158.     private double Derivative
  159.     {
  160.         get
  161.         {
  162.             double activation = Output;
  163.             return activation * (1 - activation);
  164.         }
  165.     }
  166.     public double Output
  167.     {
  168.         get
  169.         {
  170.             if (_output != double.MinValue)
  171.             {
  172.                 return _output;
  173.             }
  174.             return 1 / (1 + Math.Exp(-_lambda * (_input + _bias)));
  175.         }
  176.         set
  177.         {
  178.             _output = value;
  179.         }
  180.     }
  181. }
  182. public class Pattern
  183. {
  184.     private double[] _inputs;
  185.     private double _output;
  186.     public Pattern(string value, int inputSize)
  187.     {
  188.         string[] line = value.Split(',');
  189.         if (line.Length - 1 != inputSize)
  190.             throw new Exception("Input does not match network configuration");
  191.         _inputs = new double[inputSize];
  192.         for (int i = 0; i < inputSize; i++)
  193.         {
  194.             _inputs[i] = double.Parse(line[i]);
  195.         }
  196.         _output = double.Parse(line[inputSize]);
  197.     }
  198.     public double[] Inputs
  199.     {
  200.         get { return _inputs; }
  201.     }
  202.     public double Output
  203.     {
  204.         get { return _output; }
  205.     }
  206. }
  207. public class Weight
  208. {
  209.     public Neuron Input;
  210.     public double Value;
  211. }

When the application runs, it will use the back propagation algorithm to train the network, restarting if it gets trapped in a local error minimum. Once fully trained, you will be able to test the network against untrained points on the graph and observe that it generalises correctly.

It is simple to modify the code to accept more input dimensions and use more hidden neurons, by changing the variables _inputDims and _hiddenDims. If you try this, make sure to add extra columns to the Patterns.csv file. With these changes, the network can be used to solve far more complex problems.
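For instance, with _inputDims set to 3, each line of Patterns.csv needs three input columns followed by the output column. The rows below are purely illustrative values of my own, not real training data:

```
0.10, 0.03, 0.55, 0
0.28, 0.45, 0.12, 1
0.86, 0.42, 0.70, 1
```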

In future posts, I will be using networks like this to solve real world problems - I would welcome suggestions.

John


Hidden Neurons and Feature Space

In this post, I continue on the journey to solve linearly non-separable problems using the Multi Layer Perceptron. I will be using the same problem that I introduced last week.

Linearly Non-Separable

The network I will be using to solve this problem is shown below. If you read my previous post, you will notice that I have introduced a bias to the network. Without a bias, any hyperplane must always pass through the origin, i.e. (0, 0), which is an unnecessary restriction.

Multi Layer Perceptron

The role of the two hidden neurons is to divide input space into partitions. To achieve this, I have chosen two arbitrary hyperplanes and drawn them on the graph below.

Input Space

The values in the equations map directly to the weights in the network. However, first the equations must be rearranged with all terms on the same side of the equals sign, i.e. 0 = y + 0.36x - 0.85, etc. Therefore, the weights in the network are as follows:

xh1 = 0.36
yh1 = 1
bh1 = -0.85

xh2 = 1.92
yh2 = 1
bh2 = -0.70
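A quick sanity check of this mapping (my own sketch, not from the original post): any point lying exactly on a hyperplane produces a weighted sum of zero, and the sigmoid of zero is exactly 0.5, regardless of the steepness. The snippet below uses the first hidden neuron's weights from above.

```csharp
using System;

public static class HyperplaneCheck
{
    // Activation of hidden neuron h1 with weights xh1 = 0.36, yh1 = 1
    // and bias bh1 = -0.85, using a sigmoid with steepness lambda.
    public static double Activation(double x, double y, double lambda)
    {
        double sum = 0.36 * x + 1.0 * y - 0.85;
        return 1 / (1 + Math.Exp(-lambda * sum));
    }
}
```

The point (0, 0.85) lies on the line y = -0.36x + 0.85, so its activation is exactly 0.5; points above the line give values greater than 0.5.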

Using the sigmoid function introduced in my last post, this creates a feature space that looks like this, with h1 along the horizontal axis and h2 along the vertical axis.

Feature Space

As you can see, feature space is a distorted view of input space. The points for each class clearly fall within a distinct quadrant of the graph, which is evidence that the weights have done their job.

However, this feature space is still linearly non-separable, so the output neuron will still fail in its task of correctly classifying the points. To correct this, we need to modify the sigmoid function, and introduce another variable known as lambda (λ).

Sigmoid Function

  1. public double Function(double x)
  2. {
  3.     return 1 / (1 + Math.Exp(-lambda * x));
  4. }

Lambda has the effect of steepening the gradient of the sigmoid function, which in turn further distorts input space. By using a lambda value of six, feature space now looks like this.

Feature Space with Lambda

As you can see, this now results in a linearly separable pattern of points, which the output neuron can correctly classify. Again, the values in the equation map directly to the weights in the network. Therefore, the weights in the final layer of the network are as follows:

h1o = -0.73
h2o = 1
bo = 0.21

Therefore, we now have a network that can correctly classify the points of a linearly non-separable problem.

However, so far, we have done all the work ourselves. How do we get the network to establish the required weight values automatically? To do that, we need a learning algorithm. In my next post, I will introduce the Back Propagation algorithm and provide a working example written in C#.

In the meantime, if you have any questions on this post, drop me a comment.

John


The Sigmoid Function C#

In my last post, I discussed the Single Layer Perceptron and its limitations regarding solving linearly non-separable problems. If we are to solve this type of problem, then we need a more sophisticated network, and that means adding layers. However, if we do that, how do we train the network?

Until now, we have used the error observed at the output neuron to adjust the weights between it and the input layer. But, what do we do if we add another layer in between, a hidden layer? Well, we still need to feed this error back through the network in order to adjust the weights, and this is a problem.

Multi Layer Perceptron

Currently, a neuron is either “on” or “off”, which is defined by the threshold activation function we are using.

Threshold Function

  1. public int Function(double x)
  2. {
  3.     return (x >= 0) ? 1 : -1;
  4. }

When graphed, it looks like this, with the blue line representing the function and the red line its derivative, which at zero is infinity (not very useful).

Threshold Graph

Any adjustment we make to the weights feeding the hidden layer is going to be rather coarse, so we need to smooth this out. We can do that by replacing our simple threshold function with something better. How about this? It is known as a sigmoid function.

Sigmoid Function

  1. public double Sigmoid(double x)
  2. {
  3.     return 2 / (1 + Math.Exp(-2 * x)) - 1;
  4. }

And its derivative.

f'(x) = 1 - f(x)²

  1. public double Derivative(double x)
  2. {
  3.     double s = Sigmoid(x);
  4.     return 1 - (Math.Pow(s, 2));
  5. }

When graphed, it looks like this.

Sigmoid Graph

Now, this looks much more useful. There is a smooth and continuous transition between -1 and 1, and the derivative goes nowhere near infinity. Whilst this function is very popular with many neural net developers, I would like to make one further refinement. I like my neurons to be working in the range of 0 to 1, and to do that I need to modify the function slightly.
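As an aside of mine, the -1 to 1 sigmoid above is mathematically identical to the hyperbolic tangent, which .NET exposes as Math.Tanh; the sketch below confirms the identity numerically.

```csharp
using System;

public static class TanhIdentity
{
    // The -1 to 1 sigmoid: 2 / (1 + e^(-2x)) - 1, which equals tanh(x).
    public static double Sigmoid(double x)
    {
        return 2 / (1 + Math.Exp(-2 * x)) - 1;
    }
}
```

So Math.Tanh(x) can be substituted wherever this form of the sigmoid is needed.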

Sigmoid Function

  1. public double Sigmoid(double x)
  2. {
  3.     return 1 / (1 + Math.Exp(-x));
  4. }

And its derivative.

f'(x) = f(x)(1 - f(x))

  1. public double Derivative(double x)
  2. {
  3.     double s = Sigmoid(x);
  4.     return s * (1 - s);
  5. }

When graphed, it looks like this.

Sigmoid Graph

Now, that is perfect, and it has the added benefit of being computationally less demanding, which will be important for larger networks.
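Because this derivative only depends on the function's own output, it is easy to verify against a central finite difference, a habit worth keeping when implementing any activation function. The sketch below is my own check, not part of the original post.

```csharp
using System;

public static class DerivativeCheck
{
    // The 0 to 1 logistic sigmoid.
    public static double Sigmoid(double x)
    {
        return 1 / (1 + Math.Exp(-x));
    }

    // Analytic derivative: f'(x) = f(x) * (1 - f(x)).
    public static double Derivative(double x)
    {
        double s = Sigmoid(x);
        return s * (1 - s);
    }

    // Central finite difference approximation of the slope at x.
    public static double Numerical(double x, double h)
    {
        return (Sigmoid(x + h) - Sigmoid(x - h)) / (2 * h);
    }
}
```

At x = 0 the derivative peaks at exactly 0.25, and elsewhere the analytic and numerical values agree to well within the finite-difference error.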

In my next post, I will show you how to use the sigmoid activation function to build a Multi Layer Perceptron, trained with the Back Propagation Algorithm.

I welcome comments on this post, and suggestions for future posts.

John


The Single Layer Perceptron C#

This is the first in a series of posts in which I will write about the evolution of neural networks. I will start with the most simple network of all, the Single Layer Perceptron, and work through different architectures and learning methods which will eventually lead to networks that can predict stock market price movements, perform face recognition, handle natural language processing and much more.

I will provide full code samples, written in C#, which can be compiled and run by downloading Visual Studio.

The Perceptron is the granddaddy of all neural networks. It was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. The Perceptron is the simplest type of feed-forward neural network, known as a linear classifier. This means that the type of problem the network can solve must be linearly separable. In the graphs shown below, it is easy to visualise what this means in two dimensions.

linearly separable linearly non-separable
This is linearly separable. This is linearly non-separable.

The green line represents the separation between the two classes of data the network is attempting to classify. In three dimensions this would be represented by a plane, and in four dimensions or more by a hyperplane. There are other types of network that can solve linearly non-separable problems, which I will cover in later posts.

To begin to solve this problem we need a network as represented below.

single layer perceptron

As you can see, each input node is connected directly to the output node. By adjusting the value of the connecting weights, the network is able to learn.

In the demonstration below, each of the points plotted in the first graph above will be used as training data. The training data will be repeatedly applied to the input nodes of the network and the weights adjusted until the network performs as desired.

Once the network has been trained the application will step through the range of input data to prove that the network has learnt correctly and is able to perform good generalisation. The following code shows how this is done.

 

  1. using System;
  2. namespace Perceptron
  3. {
  4.     public class Perceptron
  5.     {
  6.         [STAThread]
  7.         static void Main()
  8.         {
  9.             // Load sample input patterns.
  10.             double[,] inputs = new double[,] {
  11.                 { 0.72, 0.82 }, { 0.91, -0.69 }, { 0.46, 0.80 },
  12.                 { 0.03, 0.93 }, { 0.12, 0.25 }, { 0.96, 0.47 },
  13.                 { 0.79, -0.75 }, { 0.46, 0.98 }, { 0.66, 0.24 },
  14.                 { 0.72, -0.15 }, { 0.35, 0.01 }, { -0.16, 0.84 },
  15.                 { -0.04, 0.68 }, { -0.11, 0.10 }, { 0.31, -0.96 },
  16.                 { 0.00, -0.26 }, { -0.43, -0.65 }, { 0.57, -0.97 },
  17.                 { -0.47, -0.03 }, { -0.72, -0.64 }, { -0.57, 0.15 },
  18.                 { -0.25, -0.43 }, { 0.47, -0.88 }, { -0.12, -0.90 },
  19.                 { -0.58, 0.62 }, { -0.48, 0.05 }, { -0.79, -0.92 },
  20.                 { -0.42, -0.09 }, { -0.76, 0.65 }, { -0.77, -0.76 } };
  21.             // Load sample output patterns.
  22.             int[] outputs = new int[] {
  23.                 -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  24.                 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
  25.             int patternCount = inputs.GetUpperBound(0) + 1;
  26.             // Randomise weights.
  27.             Random r = new Random();
  28.             double[] weights = { r.NextDouble(), r.NextDouble() };
  29.             // Set learning rate.
  30.             double learningRate = 0.1;
  31.             int iteration = 0;
  32.             double globalError;
  33.             do
  34.             {
  35.                 globalError = 0;
  36.                 for (int p = 0; p < patternCount; p++)
  37.                 {
  38.                     // Calculate output.
  39.                     int output = Output(weights, inputs[p, 0], inputs[p, 1]);
  40.                     // Calculate error.
  41.                     double localError = outputs[p] - output;
  42.                     if (localError != 0)
  43.                     {
  44.                         // Update weights.
  45.                         for (int i = 0; i < 2; i++)
  46.                         {
  47.                             weights[i] += learningRate * localError * inputs[p, i];
  48.                         }
  49.                     }
  50.                     // Accumulate the absolute error.
  51.                     globalError += Math.Abs(localError);
  52.                 }
  53.                 Console.WriteLine("Iteration {0}\tError {1}", iteration, globalError);
  54.                 iteration++;
  55.             } while (globalError != 0);
  56.             // Display network generalisation.
  57.             Console.WriteLine();
  58.             Console.WriteLine("X, Y, Output");
  59.             for (double x = -1; x <= 1; x += .5)
  60.             {
  61.                 for (double y = -1; y <= 1; y += .5)
  62.                 {
  63.                     // Calculate output.
  64.                     int output = Output(weights, x, y);
  65.                     Console.WriteLine("{0}, {1}, {2}", x, y, (output == 1) ? "Blue" : "Red");
  66.                 }
  67.             }
  68.             Console.ReadKey();
  69.         }
  70.         private static int Output(double[] weights, double x, double y)
  71.         {
  72.             double sum = x * weights[0] + y * weights[1];
  73.             return (sum >= 0) ? 1 : -1;
  74.         }
  75.     }
  76. }

The output from the network after training looks like this, which proves generalisation has been correctly achieved. Each pair of x and y coordinates when presented to the network returns the classification expected, i.e. blue or red.

 

X, Y, Output
-1.0, -1.0, Blue
-1.0, -0.5, Blue
-1.0, 0.0, Blue
-1.0, 0.5, Blue
-1.0, 1.0, Blue
-0.5, -1.0, Blue
-0.5, -0.5, Blue
-0.5, 0.0, Blue
-0.5, 0.5, Blue
-0.5, 1.0, Red
0.0, -1.0, Blue
0.0, -0.5, Blue
0.0, 0.0, Blue
0.0, 0.5, Red
0.0, 1.0, Red
0.5, -1.0, Blue
0.5, -0.5, Red
0.5, 0.0, Red
0.5, 0.5, Red
0.5, 1.0, Red
1.0, -1.0, Red
1.0, -0.5, Red
1.0, 0.0, Red
1.0, 0.5, Red
1.0, 1.0, Red

Whilst this may seem like a trivial problem to solve in two dimensions, the Perceptron can perform very powerful analysis in higher dimensions. The same general principles shown here will apply.

In my next post, I will start to write about the multi-layer perceptron, which can be used to solve linearly non-separable problems, like the one presented in the second graph above.

I welcome feedback on this post, and would be interested in receiving suggestions for future posts.