Data in ML is critical, and this release from Mozilla is absolute gold for voice research.
This dataset will help the many independent deep learning practitioners such as myself who aren't working at FAANG and have only had access to datasets such as LJS [1] or self-constructed datasets that have been cobbled together and manually transcribed.
Despite the limited materials available, there's already some truly amazing stuff being created. We've seen a lot of visually creative work being produced in the past few years, but the artistic community is only getting started with voice and sound.
https://www.youtube.com/watch?v=3qR8I5zlMHs
https://www.youtube.com/watch?v=L69gMxdvpUM
Another really cool thing popping up is TTS systems trained on non-English speakers reading English corpuses. I've heard Angela Merkel reciting copypastas, and it's quite amazing.
I've personally been dabbling in TTS as one of my "pandemic side projects" and found it to be quite fun and rewarding:
https://trumped.com
https://vo.codes
Besides TTS, one of the areas I think this dataset will really help with is Voice Conversion (VC). It'll be awesome to join Discord or TeamSpeak and talk in the voice of Gollum or Rick Sanchez. The VC field needs more data to perfect non-aligned training (where the source and target speakers aren't reciting the same, temporally aligned training text), and this will be extremely helpful.
I think the future possibilities for ML techniques in art and media are nearly limitless. It's truly an exciting frontier to watch rapidly evolve and to participate in.
[1] https://keithito.com/LJ-Speech-Dataset/
Curious to know: why don't researchers use audiobooks or videos with their transcripts when data is not available? Is it because these don't capture different dialects and accents?