Data in ML is critical, and this release from Mozilla is absolute gold for voice...

indogooner · on July 1, 2020

Curious to know why researchers don't use audiobooks/videos with transcripts when data is not available. Is it because these don't capture different dialects/accents?

bluGill · on July 1, 2020

Those tend to be read in artificial voices. They are useful for some things, but it isn't real speech, so it isn't as generally useful.

petargyurov · on July 1, 2020

I suppose there might be some copyright issues with such content (?)

mjepronk · on July 1, 2020

Well, there is LibriVox...

est31 · on July 1, 2020

Indeed, that's what LibriSpeech, one of the famous established datasets, is based on.

cptwunderlich · on July 1, 2020

Oh man, I'm really interested in TTS (for rarer languages). Do you have any pointers or good resources to share?