Brute Force Speech Recognition

Brough Turner suggests modeling speech recognition after Google’s machine translation service, which translates from one language to another by finding the closest match in a huge corpus of translated documents. Google’s translation corpus reportedly includes an enormous volume of UN documents, which are published in parallel in the UN’s six official languages. If we can find a similar Rosetta stone for speech, consisting of a huge amount of audio transcribed into text, the same techniques could be applied.
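To make the "closest match" idea concrete, here is a toy sketch in Python. It is not Google’s method or anything Brough describes: the corpus, the three-number feature vectors, and the function name `closest_transcript` are all made up for illustration, and a real system would use proper acoustic features and a vastly larger corpus.

```python
import math

# Toy "Rosetta stone": (feature vector, transcript) pairs.
# The vectors are invented stand-ins for real acoustic features.
CORPUS = [
    ((0.9, 0.1, 0.3), "hello"),
    ((0.2, 0.8, 0.5), "world"),
    ((0.4, 0.4, 0.9), "speech"),
]

def closest_transcript(features, corpus=CORPUS):
    """Return the transcript whose stored features are nearest to `features`."""
    # Brute force: measure the distance to every entry and keep the smallest.
    best = min(corpus, key=lambda entry: math.dist(entry[0], features))
    return best[1]

print(closest_transcript((0.85, 0.15, 0.25)))
```

The lookup checks every entry in the corpus, which is the "brute force" part: the bet is that with enough transcribed audio, something close to any new utterance is already in there.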

Brough doesn’t name any such libraries of transcribed audio, but hints that he has some ideas. I have some, too: legal depositions, say, or closed captioning on TV.
