Saturday, 16 May 2009

Linear Regression in C#

When looking at time series data, such as a stream of prices, it is often useful to establish the general trend and represent it with a single number. A linear regression calculation gives you that number: the slope of the trend line.

Take this series of prices:
4.8, 4.8, 4.5, 3.9, 4.4, 3.6, 3.6, 2.9, 3.5, 3.0, 2.5, 2.2, 2.6, 2.1, 2.2

If you plot these prices on an Excel chart and add a linear trend line, you should see the line slope steadily downwards.

We can do the same thing in code:

using System;

class Regression
{
    static void Main(string[] args)
    {
        double[] values = { 4.8, 4.8, 4.5, 3.9, 4.4, 3.6, 3.6, 2.9, 3.5, 3.0, 2.5, 2.2, 2.6, 2.1, 2.2 };

        // Use the array index as x and the price as y; start by summing both.
        double xAvg = 0;
        double yAvg = 0;

        for (int x = 0; x < values.Length; x++)
        {
            xAvg += x;
            yAvg += values[x];
        }

        // Turn the sums into means.
        xAvg = xAvg / values.Length;
        yAvg = yAvg / values.Length;

        // v1: sum of (x - xAvg) * (y - yAvg); v2: sum of (x - xAvg) squared.
        double v1 = 0;
        double v2 = 0;

        for (int x = 0; x < values.Length; x++)
        {
            v1 += (x - xAvg) * (values[x] - yAvg);
            v2 += Math.Pow(x - xAvg, 2);
        }

        // Least-squares slope and intercept of the trend line.
        double a = v1 / v2;
        double b = yAvg - a * xAvg;

        Console.WriteLine("y = ax + b");
        Console.WriteLine("a = {0}, the slope of the trend line.", Math.Round(a, 2));
        Console.WriteLine("b = {0}, the intercept of the trend line.", Math.Round(b, 2));

        // Keep the console window open until Enter is pressed.
        Console.ReadLine();
    }
}
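
For reference, the two loops in the program compute the usual least-squares estimates. The notation here is my own shorthand rather than anything in the code: x_i is the array index i, y_i is values[i], n is values.Length, and the barred symbols are the two averages (xAvg and yAvg). The numerator is v1 and the denominator is v2.

```latex
\[
a = \frac{\sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=0}^{n-1} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
\]

% Worked through for the 15 prices above:
\[
\bar{x} = 7, \quad \bar{y} = \frac{50.6}{15} \approx 3.373, \quad
a = \frac{-57.7}{280} \approx -0.206, \quad
b \approx 3.373 + 0.206 \times 7 \approx 4.816
\]
```

Rounded to two decimal places, that gives a = -0.21 and b = 4.82 for this series.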

Now that you have the slope of the trend line, you can use it as an input for neural networks analysing time series data. I use something similar in NNATS…
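
To feed a slope to another component, it helps to have the calculation in a reusable method. The following is only a minimal sketch: the Trend class and its Slope and RollingSlopes methods are hypothetical names invented for this example, and the idea of taking one slope per sliding window is an assumption about how the number might be used, not a description of what NNATS does.

```csharp
using System;

static class Trend
{
    // Hypothetical helper: least-squares slope of the window
    // values[start] .. values[start + length - 1], with x = 0, 1, 2, ...
    // counted from the start of the window.
    public static double Slope(double[] values, int start, int length)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (length < 2 || start < 0 || start + length > values.Length)
            throw new ArgumentOutOfRangeException("length");

        double xAvg = (length - 1) / 2.0; // mean of 0 .. length - 1
        double yAvg = 0;
        for (int i = 0; i < length; i++)
            yAvg += values[start + i];
        yAvg /= length;

        double v1 = 0;
        double v2 = 0;
        for (int i = 0; i < length; i++)
        {
            double dx = i - xAvg;
            v1 += dx * (values[start + i] - yAvg);
            v2 += dx * dx;
        }
        return v1 / v2;
    }

    // Hypothetical helper: one slope per window position, oldest first.
    public static double[] RollingSlopes(double[] values, int window)
    {
        if (values == null) throw new ArgumentNullException("values");
        if (window < 2 || window > values.Length)
            throw new ArgumentOutOfRangeException("window");

        double[] slopes = new double[values.Length - window + 1];
        for (int s = 0; s < slopes.Length; s++)
            slopes[s] = Slope(values, s, window);
        return slopes;
    }
}
```

Calling Trend.Slope(values, 0, values.Length) reproduces the slope from the program above, while Trend.RollingSlopes(values, 5) would give one slope for each run of five consecutive prices. The window size of 5 is arbitrary and only there for illustration.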

For a complete explanation of linear regression see Wikipedia.

John