Note: The supporting code for this blog post can be found on GitHub.

I must admit, I’m a fan of Aesop’s Fables – a collection of tales credited to Aesop, a slave and storyteller believed to have lived in ancient Greece.

They are a fun read and form the basis of many modern phrases and sayings. Unfortunately, Aesop won’t be writing more anytime soon…

I thought, wouldn’t it be fun to write a generative language model that could automatically create new fables in the style of Aesop?

Obviously, this is a tall order, but character-level generators do exist – see this awesome post by Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks.

In fact, Andrej’s post inspired me to see what I could do using C#.

Andrej shows how Recurrent Neural Networks (RNNs) can be used to generate complex sequences with long-range structure, simply by predicting one step at a time. Long Short-Term Memory (LSTM) cells are a special kind of RNN, capable of learning long-term dependencies and specifically designed to avoid the vanishing gradient problem. Scouring the internet, I could find virtually no attempts to build an LSTM using C# – challenge accepted!
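For a flavour of what a single LSTM cell actually computes at each time step, here is a rough sketch in C#. Note the additive cell-state update (c[j] = f * cPrev[j] + i * g) – that is what lets gradients flow across long ranges. The class layout and helper names here are my own illustration, not the exact code from the repository:

```csharp
using System;

static class LstmCell
{
    // One LSTM time step for a cell layer of size n.
    // W* are [n, n + inputSize] weight matrices over the concatenated [hPrev, x];
    // b* are length-n bias vectors.
    public static (double[] h, double[] c) Step(
        double[] x, double[] hPrev, double[] cPrev,
        double[,] Wf, double[,] Wi, double[,] Wo, double[,] Wg,
        double[] bf, double[] bi, double[] bo, double[] bg)
    {
        int n = hPrev.Length;
        double[] z = new double[n + x.Length];   // concatenate previous hidden state and input
        hPrev.CopyTo(z, 0);
        x.CopyTo(z, n);

        double[] h = new double[n];
        double[] c = new double[n];
        for (int j = 0; j < n; j++)
        {
            double f = Sigmoid(Dot(Wf, z, j) + bf[j]);   // forget gate: how much old memory to keep
            double i = Sigmoid(Dot(Wi, z, j) + bi[j]);   // input gate: how much new memory to write
            double o = Sigmoid(Dot(Wo, z, j) + bo[j]);   // output gate: how much memory to expose
            double g = Math.Tanh(Dot(Wg, z, j) + bg[j]); // candidate memory content

            c[j] = f * cPrev[j] + i * g;                 // new cell state
            h[j] = o * Math.Tanh(c[j]);                  // new hidden state (the cell's output)
        }
        return (h, c);
    }

    static double Sigmoid(double v) => 1.0 / (1.0 + Math.Exp(-v));

    static double Dot(double[,] W, double[] z, int row)
    {
        double sum = 0.0;
        for (int k = 0; k < z.Length; k++) sum += W[row, k] * z[k];
        return sum;
    }
}
```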

I won’t go into depth about how LSTMs work, but if you want to learn more, check out this excellent article by Christopher Olah: Understanding LSTM Networks. Or, for the really technical stuff, check out Alex Graves’ homepage.

Alex states in his paper Generating Sequences With Recurrent Neural Networks: “RNNs can be trained for sequence generation by processing real data sequences one step at a time and predicting what comes next. Assuming the predictions are probabilistic, novel sequences can be generated from a trained network by iteratively sampling from the network’s output distribution, then feeding in the sample as input at the next step. In other words by making the network treat its inventions as if they were real, much like a person dreaming.”
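In code, that sampling loop is surprisingly small. Here is a sketch of the idea – the ILanguageModel interface and its methods are just hypothetical stand-ins for the trained network, not the repository’s actual API:

```csharp
using System;
using System.Text;

// Hypothetical interface standing in for the trained network.
interface ILanguageModel
{
    double[] Forward(char input);   // probability distribution over the next character
    char IndexToChar(int index);    // map a vocabulary index back to a character
}

static class Sampler
{
    // Generate 'length' characters by repeatedly sampling the network's output
    // and feeding the sampled character back in as the next input.
    public static string Generate(ILanguageModel network, char seed, int length, Random rng)
    {
        var sb = new StringBuilder();
        char current = seed;

        for (int t = 0; t < length; t++)
        {
            double[] probs = network.Forward(current);

            // Sample from the distribution (rather than always taking the most
            // likely character), so the output stays varied.
            double r = rng.NextDouble(), cumulative = 0.0;
            int sampled = probs.Length - 1;
            for (int k = 0; k < probs.Length; k++)
            {
                cumulative += probs[k];
                if (r <= cumulative) { sampled = k; break; }
            }

            current = network.IndexToChar(sampled);
            sb.Append(current);
        }
        return sb.ToString();
    }
}
```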

Now that we have selected LSTMs as our tool of choice, let’s generate some new fables…!

Concatenating all of Aesop’s Fables I could find on the internet, I accumulated 185 KB of plain text to use for training the network – absolutely tiny by machine learning standards, but big enough to prove the point.
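Preparing that text for a character-level model is simple: read the file, collect the distinct characters into a vocabulary, and map each character to an index (and to a one-hot vector when feeding the network). Roughly like this – the file name is illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Load the training corpus and build the character vocabulary.
string text = File.ReadAllText("aesop.txt");

char[] vocab = text.Distinct().OrderBy(c => c).ToArray();
Dictionary<char, int> charToIndex = vocab
    .Select((c, i) => (c, i))
    .ToDictionary(p => p.c, p => p.i);

Console.WriteLine($"{text.Length} characters, {vocab.Length} distinct");

// Example: the one-hot input vector for the first character of the corpus.
double[] firstInput = OneHot(text[0]);
Console.WriteLine($"Input vector length: {firstInput.Length}");

// One-hot encode a single character for input to the network.
double[] OneHot(char c)
{
    var v = new double[vocab.Length];
    v[charToIndex[c]] = 1.0;
    return v;
}
```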

Given my background, most of my subscribers have an affinity for C#, so I thought it would be a good idea to write an LSTM text generator from scratch in C# and share the code on GitHub. There’s also a vanilla RNN bundled in, if you want to compare the difference.

My network uses two LSTM layers, each with 256 memory cells, and a single SoftMax output layer – approximately 867k parameters in total. I use a cross-entropy loss function to calculate the gradients, which are backpropagated through 24 time steps. I found that Root Mean Square Propagation (RMSProp) dramatically improves training times.
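For reference, the RMSProp update itself is only a few lines: keep a running average of each parameter’s squared gradient and divide the step by its square root, so parameters with consistently large gradients take smaller steps. A minimal sketch – the hyperparameter values shown are common defaults, not necessarily what my code uses:

```csharp
using System;

static class RmsProp
{
    // One RMSProp update. 'cache' holds a running average of squared gradients,
    // one entry per weight, and persists between calls.
    public static void Update(double[] weights, double[] grads, double[] cache,
                              double learningRate = 0.001, double decay = 0.9, double eps = 1e-8)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            // Exponential moving average of the squared gradient.
            cache[i] = decay * cache[i] + (1.0 - decay) * grads[i] * grads[i];

            // Adaptive step: divide by the root of the average squared gradient.
            weights[i] -= learningRate * grads[i] / (Math.Sqrt(cache[i]) + eps);
        }
    }
}
```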

The code uses Parallel.For loops to make efficient use of all available CPU cores. If you are running on fewer than four cores, this will be painfully slow!
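For example, the matrix–vector products that dominate the forward and backward passes parallelise naturally across output rows – something along these lines (a sketch, not the repository’s exact code):

```csharp
using System.Threading.Tasks;

static class ParallelMaths
{
    // Matrix-vector multiply with the rows split across CPU cores.
    // Each iteration writes to a distinct element of 'result', so no locking is needed.
    public static double[] MatVec(double[,] W, double[] x)
    {
        int rows = W.GetLength(0), cols = W.GetLength(1);
        var result = new double[rows];

        Parallel.For(0, rows, r =>
        {
            double sum = 0.0;
            for (int c = 0; c < cols; c++)
                sum += W[r, c] * x[c];
            result[r] = sum;
        });

        return result;
    }
}
```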

Whilst this is a fully functional example, I recommend using it for educational purposes only. If you want to build a deep network for production, I strongly suggest using an optimised deep learning framework such as Torch or Caffe.

After each training iteration, the generator produces some sample text. Let’s see how the samples evolve as the model learns. At the end of the first iteration, it’s basically generating gibberish with the odd short word interspersed:

A “famen the by mous, Rhe I liog sI contoplepis al ane neon eo, betk, on uut the giind, mhemaed in lo dinghis fiting the oRod dedingof. ApocumFib ther is awthe Rhe mion him is mis bropn gA the Laoc the camok mand cimis po; baticg thile ip Cous ha cro et, in I thim and fous to is sifk ind bimke lothe ipit milre hat thed toouf ffifir ifiribwthel thed macoun. Qo curhey riff.

After 20 iterations, it’s beginning to look a lot more like English, but is still littered with spelling mistakes.

A Wolf quickly of time in his fight of abmain. The dother pleaget, and hake a smmeroutaridained that well said. He had heppled their procairing of a shepherd in her beeped his spig, and shuaking ann desfiess precespence of his resent for the Ast, but is upwain, the dreasuress an Kight a Streich repleding from this careing in abriectens, as the beast tome a happent.

After 50 iterations, the spelling mistakes have gone and it’s even learnt to open and close quotation marks – but it still doesn’t make a huge amount of sense.

A Wolf and his state, saw to her for me, but clothing and to sell, and asked forth those harder, bent in these lift imitation as to hear at mush. “After you most believed you?” I reached the stream but often interrupted! Why was easily were in their fate. Of the plain desire and not wish heat matter The Snake, therefore shards. His Mother in reply benefit.

So the moral of the story – you can generate some pretty interesting results simply by sampling one character at a time. More data would have helped, but running this without a GPU would have taken forever. As I said before, this was an academic exercise to explore what could be coded by hand. The recent Deep Learning revolution has been fuelled by huge data sets and GPUs.

Check out my code on GitHub, but if you really want to do anything serious, get Torch!

In summary, it’s been fun…