So now that we’ve slain the dragon of high-quality free transcription with Whisper, I’m trying to find the best free AI TTS. We have made do with the LingQ/Google/Amazon TTS voices until now, but I’m coming across more and more AI voices that sound much more natural and would be much more pleasant to use. I’m struggling to find free AI TTS voices where I can upload text and then download the audio to add to LingQ lessons. So far I’m finding only sites that offer a 10,000-character limit per month (about 2,000 words), or just a one-time sample, or no ability to download a file of the TTS. I know ChatGPT premium offers high-quality voices, but I don’t think you can download an mp3 of those. Any suggestions I might not have found yet? The key is VERY HIGH limits per day/week, or no limits at all.
Finnish - Selma
English - Steffan
Seems to be a large selection and free
@roosterburton
5000 character limit per file on that apparently and I think a max of 30 minutes for a free account from what I can see. Unless I’m missing something?
The 5000 character limit is standard because the request is sent in the URL, and a URL can only be 5k characters max.
That is a preview link, it should just keep working again and again. If IP gets blocked maybe change VPN?
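If a service caps each request at around 5,000 characters, one workaround is to cut a long text into pieces below the cap and generate one audio file per piece. Here is a minimal sketch of that splitting step. The function name `split_for_tts` and the default `limit` are my own assumptions for illustration; check the actual cap of whatever service you use.

```python
import re


def split_for_tts(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking at sentence ends."""
    # Each piece ends at sentence punctuation (Latin or CJK) plus any trailing
    # whitespace, so joining the pieces gives back the original text unchanged.
    pieces = re.findall(r".+?(?:[.!?。！？]+\s*|\Z)", text, flags=re.S)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Last resort: a single sentence longer than the limit is cut at the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(piece) > limit:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, and the resulting audio files joined afterwards in Audacity or a similar tool.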
The state of TTS in LingQ is regrettable, at least in the languages that I have studied here. For example, I don’t understand why LingQ insists on using what sounds like a speech synthesizer from the 90s as its TTS in Chinese.
Anyway, I’ve been a fan of the Microsoft Azure voices for a couple of years now: tons of voice options, including dialects and speaking styles, and quite natural pronunciation, though the prosody is obviously not human-like.
You can even access many of the voices for free by using the “read aloud” function in the Microsoft Edge browser, or simply via Python (edge-tts · PyPI).
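For anyone who wants to try the Python route, here is a minimal sketch using the `edge-tts` package to write an mp3 straight to disk, with no recording step. The function name `synthesize`, the file name and the default voice are placeholders I picked for illustration; it needs `pip install edge-tts` and an internet connection, since the audio is generated by Microsoft's online service.

```python
import asyncio


async def synthesize(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> None:
    """Generate speech for `text` with an Edge voice and save it as an mp3."""
    # Imported inside the function so this file loads even before the
    # package is installed (pip install edge-tts).
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)


# Example call (makes a network request, so it is left commented out):
# asyncio.run(synthesize("Hello from an Edge voice.", "hello.mp3"))
```

The package also installs a command-line tool: `edge-tts --list-voices` shows the available voices, and `edge-tts --voice en-US-AriaNeural --text "Hello" --write-media hello.mp3` does the same thing as the sketch without writing any Python.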
As for other services, many people consider https://elevenlabs.io/ to be the current leader in this space. But I haven’t really evaluated it. They do have a free tier to try.
If you’re looking for truly free and open source solutions, I fear those aren’t quite there yet. I remember trying GitHub - suno-ai/bark: 🔊 Text-Prompted Generative Audio Model when it was first released but not only were the hardware requirements uncomfortably high (12GB+ GPU etc) but the results were also totally unpredictable, sometimes surprisingly good, other times AM radio.
Recently, XTTS: Open Model Release Announcement / Blog / Coqui was released, and it sounds pretty decent as well. Maybe check that out too. Demo: XTTS - a Hugging Face Space by coqui
Yeah, I heard about ElevenLabs and tried it out yesterday. It is great, but it has a small free-tier cap worth about 1 LingQ lesson per month. As for Edge, I can’t download a file from it unless I record the playback manually every time, which is too time-consuming to be sustainable, so that’s a no-go. I find it odd that open source AI voices aren’t a thing yet, considering TTS has existed for a long time and ChatGPT does what I assume are far more resource-intensive things on its free tier. That XTTS link has really nice sound samples. Hopefully it’s more consistent than the other one you tried.
Yeah, my favorite so far is using read aloud from Microsoft Edge browser. When I truly want something with a much better voice then I use that and record with Audacity.
Stewart, how are you doing it? I don’t find it terribly time-consuming anymore. I open the lesson and go to the 3-dot menu to “print lesson” (all this in Microsoft Edge, of course). Turn on Audacity and set it to record. Play the lesson. End the recording. I guess the main annoyance is having to wait while the recording happens…but you get a free listen =). Then you upload the file to LingQ. Timing-wise it seems nearly as good as some online service…except you can go do something else while it records and come back to it later.
Ideally LingQ would just use the Azure voice directly, or some other voice system.
I mostly use this AI text to speech for my projects and it works really well.
It is natural sounding too if you are wondering about it.
Give it a shot and let me know
I used GitHub - aedocw/epub2tts: Turn an epub or text file into an audiobook initially with edge-tts but moved to piper GitHub - rhasspy/piper: A fast, local neural text to speech system
I use this to transcribe a YouTube video with Whisper, run it through Google Translate, generate an mp3, and then upload it via the LingQ API as a lesson in my target language.
Moved to piper due to speed of the readers. Edge voices sound like news anchors with correct diction and pacing. Piper voices sound more like actual speech, so you need to think fast to understand it.
Piper voices samples Piper Voice Samples
I also forgot to say: with both solutions you get unlimited characters. I have TTS-ed entire books with them without issues.
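For the Piper side, here is a rough sketch of driving the `piper` command-line tool from Python: the text goes in on standard input and a WAV file comes out. It assumes `piper` is installed and on your PATH and that you have downloaded a voice model; the model file name in the example is only an example, and the helper names `piper_command` and `speak_file` are my own.

```python
import subprocess
from pathlib import Path


def piper_command(model: Path, out_wav: Path) -> list[str]:
    """Build the piper invocation: read text from stdin, write a WAV file."""
    return ["piper", "--model", str(model), "--output_file", str(out_wav)]


def speak_file(text_file: Path, model: Path, out_wav: Path) -> None:
    """Read a UTF-8 text file and have piper speak it into `out_wav`."""
    text = text_file.read_text(encoding="utf-8")
    # check=True raises CalledProcessError if piper exits with an error.
    subprocess.run(piper_command(model, out_wav), input=text.encode("utf-8"), check=True)


# Example call (requires piper and a downloaded voice model):
# speak_file(Path("chapter01.txt"), Path("en_US-lessac-medium.onnx"), Path("chapter01.wav"))
```

Since everything runs locally, there is no per-request or monthly character cap to work around. The output is WAV, so you may want to convert it to mp3 before uploading.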
This is my current approach. I copy a Chinese text into an online notepad on Edge. Then I let Edge read it out loud and I record it with Audacity. I am not sure if there is an upper limit, but it is much better than Lingq’s TTS. Sadly I then often create a Lingq lesson by uploading the audio to Lingq. [If Lingq’s TTS was better, I would have saved a lot of time]
Abair.ie for Irish that is slowly coming to Lingq.
The information in my previous post is not up to date anymore. LingQ has upgraded the voices a long time ago already.
In Mandarin Chinese (simplified), the Zhiyu voice is now the ‘neural’ variant of the previously low quality one from AWS Polly.
Yunyang is from Microsoft and I believe one of the voices available through Edge. For some reason it stutters for me however in sentence mode or when generating lesson audio.
Unfortunately, the two other voices are in a different language altogether (Cantonese). I don’t know why that is, and I’m not sure what use they are.
In Chinese traditional, there is, in addition to Zhiyu and Yunyang, HuiHui, which is by Microsoft as well, but it exhibits the same stutter. All of these voices are of the Mainland variant and not Taiwanese.
In Cantonese, both available voices are by Microsoft and work fine.
If you want something truly unlimited, then you could try one of the Python projects on GitHub. There are a lot of TTS projects using the free Edge voices. It takes a bit of effort to learn how to use them.
I use Calibre to break down the epubs I have into individual chapters. Then I pass each chapter to this project:
I have set up a Docker container for it.
I’m learning Czech, and the Edge voice Vlasta is perfect for the ebook I’m listening to. It is a very realistic, natural voice, and I’m happy with the results.
Then I upload the epub chapter to LingQ and upload the lesson audio in the LingQ web browser version, and I automatically get that audio in the mobile app too. Just know that if you have already generated audio for any epubs, it is difficult to switch to the new audio. I’ve had to delete entire lessons with old audio and then reload the new epub lesson to get the new audio working.
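The chapter-by-chapter workflow above can be sketched roughly like this. To be clear, this is not the project mentioned in the post, just an illustration using the `edge-tts` package directly. It assumes the chapters have already been exported as plain `.txt` files into one folder; the folder names and the helper names `plan_outputs` and `narrate_chapters` are made up for the example.

```python
import asyncio
from pathlib import Path


def plan_outputs(chapter_dir: Path, audio_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each chapter .txt file with an .mp3 path of the same name, in sorted order."""
    return [(src, audio_dir / f"{src.stem}.mp3") for src in sorted(chapter_dir.glob("*.txt"))]


async def narrate_chapters(
    chapter_dir: Path, audio_dir: Path, voice: str = "cs-CZ-VlastaNeural"
) -> None:
    """Create one mp3 per chapter file, skipping chapters that already have audio."""
    import edge_tts  # pip install edge-tts; needs an internet connection

    audio_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in plan_outputs(chapter_dir, audio_dir):
        if dst.exists():
            continue  # already narrated on an earlier run
        text = src.read_text(encoding="utf-8")
        await edge_tts.Communicate(text, voice).save(str(dst))


# Example call (makes network requests, so it is left commented out):
# asyncio.run(narrate_chapters(Path("chapters"), Path("audio")))
```

Keeping one mp3 per chapter with matching file names makes it easier to attach the right audio to the right lesson when uploading.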
I tried the French tts on Piper and it was quite good. Thanks!