Plain speech-to-text seems like it's largely solved on pretty much every compute platform. However, I have found a huge gap between getting independent words transcribed and getting formatted text ready for an editor or further processing.
If you look at how authors dictate their works (which they have done for millennia), just getting the words written down is only the first step, and it's by far the easiest. I have been helping build a tool, https://bookscribe.ai, that not only does the transcription but can then post-process it to make it actually usable for longer-form content.