Baidu, often described as the Chinese Google, has just released its Warp-CTC library on GitHub under an open-source Apache licence.

CTC (Connectionist Temporal Classification) is an objective function that can be used for the supervised training of sequence prediction models. The original CTC approach was developed in 2006 at the Swiss AI lab IDSIA by Alex Graves, Santiago Fernández, Faustino Gomez and Jürgen Schmidhuber.
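
To make the idea of a CTC objective more concrete, here is a minimal sketch in plain Python. It is not Warp-CTC's code or API; the function name ctc_neg_log_likelihood and its parameters are made up for illustration. It assumes the network has already produced a probability distribution over symbols (including a special "blank" symbol) for every time step, and it sums the probability of every frame-by-frame alignment that collapses to the target labels. It works directly with probabilities for readability, whereas practical implementations typically work in log space to avoid numerical underflow.

```python
import math


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Illustrative CTC loss for one sequence (hypothetical helper, not Warp-CTC).

    probs  -- list of T rows; probs[t][k] is the probability of symbol k at step t
    labels -- target label sequence without blanks, e.g. [1, 2, 1]
    blank  -- index of the blank symbol (assumed to be 0 here)
    """
    # Interleave blanks around the labels: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # alpha[s] = total probability of all partial alignments ending in state s
    alpha = [0.0] * num_states
    alpha[0] = probs[0][blank]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return float("inf") if likelihood == 0.0 else -math.log(likelihood)


if __name__ == "__main__":
    # Two time steps, two symbols (0 = blank, 1 = "a"), uniform predictions
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(ctc_neg_log_likelihood(uniform, [1]))
```

In the small example at the bottom, three of the four possible two-step paths ("a a", "a blank" and "blank a") collapse to the single label "a", so the likelihood is 0.75 and the loss is its negative logarithm. Training would adjust the network so that this loss falls.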

Baidu decided to develop something new because their engineers found that existing implementations of CTC generally required significant amounts of memory and were very slow. Warp-CTC can be used to solve supervised problems that map an input sequence to an output sequence, such as speech recognition.

Warp-CTC was developed to improve the scalability of CTC models trained for Baidu’s Deep Speech system. The chief scientist at Baidu responsible for Warp-CTC is Andrew Ng, who is noted for his research on neural networks running on GPUs.

Ng has taught classes on machine learning, robotics, and other topics at Stanford University. He also co-founded Coursera, the massive open online course (MOOC) start-up.

Specifically, Warp-CTC is an open-source implementation of the CTC algorithm for CPUs and NVIDIA GPUs. Baidu are releasing Warp-CTC as a C library along with integration for Torch, a scientific computing framework.

Baidu have said they want to make end-to-end deep learning easier and faster so researchers can make more rapid progress. A lot of open source software for deep learning exists, but previous code for training end-to-end networks for sequences has been too slow.

The CTC approach builds on recurrent neural networks, an increasingly common approach used in deep learning solutions. Recurrent neural networks are powerful sequence learners that are well suited to applications such as speech recognition.

One of the major challenges Deep Speech faces is handling variable-length audio snippets. In computer vision tasks, images may be rescaled to a fixed size without changing the overall content, but speech data can’t be normalised in the same way, so the underlying data model needs to shrink or grow automatically. To do this, Deep Speech uses a recurrent neural network, where an audio sample is sliced into time steps of equal size and a neural network is applied to each time step. Thus, the network learns not only from the input of the current time step but also from the output of the previous time step.
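
The recurrence described above can be sketched in a few lines of plain Python. This is a toy illustration and not Deep Speech's architecture: the helper names slice_frames and run_rnn, the single scalar hidden state, and the hand-picked weights are all assumptions made for the example. What it shows is that the same small computation is applied to every equal-sized frame, with the previous step's output fed back in, so audio of any length can be processed without rescaling.

```python
import math


def slice_frames(samples, frame_size):
    """Split a list of audio samples into consecutive frames of equal size.

    Any trailing samples that do not fill a whole frame are dropped
    (a simplification made for this sketch).
    """
    num_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(num_frames)]


def run_rnn(frames, w_in, w_rec, bias):
    """Apply one tiny recurrent unit to each frame in turn.

    h_t = tanh(w_in . x_t + w_rec * h_{t-1} + bias), with a scalar hidden state.
    Returns the list of hidden states, one per frame.
    """
    hidden = 0.0                      # no previous output before the first frame
    outputs = []
    for frame in frames:
        pre_activation = sum(w * x for w, x in zip(w_in, frame))
        pre_activation += w_rec * hidden + bias
        hidden = math.tanh(pre_activation)
        outputs.append(hidden)
    return outputs


if __name__ == "__main__":
    short_clip = [0.1, 0.2, 0.3, 0.4]                # 2 frames of 2 samples
    long_clip = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # 3 frames, last sample dropped
    weights = [0.5, 0.5]
    for clip in (short_clip, long_clip):
        states = run_rnn(slice_frames(clip, 2), weights, w_rec=1.0, bias=0.0)
        print(len(states), states)
```

The number of outputs simply follows the number of frames, which is how a recurrent model "shrinks or grows" with its input. A real system would use vector-valued hidden states and learned weights.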

Baidu’s approach to open source is not new – Facebook, Google and Microsoft have also open-sourced their AI software. In November 2015, Google released TensorFlow on GitHub, also under an open-source Apache licence.

Baidu was founded in 2000 by Internet pioneer Robin Li, with the mission of providing the best way for people to find what they’re looking for online: in other words, a search engine. Financial analysts who follow the company expect its earnings to grow at an average annual rate of more than 23% over the next five years. Baidu currently operate almost exclusively in China, but clearly have wider aspirations – Google, watch out!