Microsoft turns spoken English into spoken Mandarin – in the same voice

Microsoft today posted a video, along with an accompanying explanation, of language translation that goes far beyond what we thought was currently possible.

In the video, the speaker explains and demonstrates improvements made to the machine understanding of his English words, which are automatically transcribed as he speaks. He then demonstrates having those words translated directly into Mandarin text (if it’s actually Cantonese, I’ll punish myself).

This is when the fun begins. Microsoft, he says, has taken in oodles of data, and can thus have that translated Mandarin text spoken aloud. And the final kicker: he has fed the system an hour’s worth of his own speech, and thus the software will speak the Mandarin in his own voice.

It’s mindbending. What is the core technology that powers the tool? According to Rick Rashid, head of Microsoft Research, the man who gave the presentation [Bold: TNW]:

“Just over two years ago, researchers at Microsoft Research and the University of Toronto made another breakthrough. By using a technique called Deep Neural Networks, which is patterned after human brain behavior, researchers were able to train more discriminative and better speech recognizers than previous methods.

[…] We have been able to reduce the word error rate for speech by over 30% compared to previous methods. This means that rather than having one word in 4 or 5 incorrect, now the error rate is one word in 7 or 8.”
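To see how the "over 30%" figure lines up with the "one word in N" figures in the quote, here is a rough back-of-the-envelope check. It assumes "one word in N incorrect" means an error rate of 1/N, and the helper name `relative_reduction` is made up for illustration; none of this comes from Microsoft.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in error rate when going from `before` to `after`."""
    return (before - after) / before


# Assumption: "one word in N incorrect" is read as an error rate of 1/N.
before_rates = {n: 1 / n for n in (4, 5)}  # "one word in 4 or 5"
after_rates = {n: 1 / n for n in (7, 8)}   # "one word in 7 or 8"

# Pair every "before" figure with every "after" figure.
reductions = {
    (b, a): relative_reduction(before_rates[b], after_rates[a])
    for b in before_rates
    for a in after_rates
}

for (b, a), r in sorted(reductions.items()):
    print(f"1 in {b} -> 1 in {a}: {r:.1%} fewer errors")
```

Depending on which endpoints are paired, the relative drop comes out between roughly 29% and 50%, so the rounded "one in N" figures are broadly in line with the quoted reduction of over 30%.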
