Free High Quality AI Text To Speech?

So now that we’ve slayed the dragon of high quality free transcriptions with Whisper technology, I’m trying to find the best free AI TTS. We have made do with the LingQ/Google/Amazon TTS voices until now, but I’m coming across more and more AI voices that sound so much more natural and would be much more pleasant for me to use. I’m struggling to find free AI TTS voices where I can upload text and then download the audio to add to LingQ lessons. So far I’m finding only sites that offer 10,000 character limits per month (about 2000 words), or just a one-time sample, or no ability to download a file of the TTS. I know ChatGPT premium offers high quality voices, but I don’t think you have the ability to download an mp3 of those. Any suggestions I might not have found yet? The key being VERY HIGH limits per day/week or no limits at all.

4 Likes

Finnish - Selma

https://www.veed.io/api/v1/subtitles/synthesize/preview?text=Mobiiliajokortin%20testaus%20älypuhelimessa%20saattaisi%20&voice=fi-FI-SelmaNeural&rate=1&locale=fi-FI

English - Steffan

https://www.veed.io/api/v1/subtitles/synthesize/preview?text=Hello%20World&voice=en-US-SteffanNeural&rate=1&locale=en-US


Seems to be a large selection and free
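
For anyone who wants to script this, the links above can be assembled with ordinary URL encoding. This is only a sketch: the endpoint and its `text`, `voice`, `rate`, and `locale` parameters are copied from the two links in this post, not from any documented API, and the name `build_preview_url` is made up for illustration.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the preview links above. It is undocumented,
# so treat it as an assumption that may change or stop working.
PREVIEW_ENDPOINT = "https://www.veed.io/api/v1/subtitles/synthesize/preview"


def build_preview_url(text: str, voice: str, locale: str, rate: int = 1) -> str:
    """Return a preview URL with every parameter percent-encoded."""
    params = {"text": text, "voice": voice, "rate": rate, "locale": locale}
    # quote_via=quote encodes spaces as %20 (as in the links above), not "+".
    return f"{PREVIEW_ENDPOINT}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_preview_url("Hello World", "en-US-SteffanNeural", "en-US"))
```

Encoding the text this way also takes care of non-ASCII letters such as the "ä" in the Finnish example.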

2 Likes

@roosterburton
Apparently there’s a 5000-character limit per file on that, and I think a max of 30 minutes for a free account, from what I can see. Unless I’m missing something?

@StewartLikesLingQ

A 5000-character limit is standard because the request is sent in the URL, and the URL can only be 5k characters max.

That is a preview link, so it should just keep working again and again. If your IP gets blocked, maybe switch to a different VPN server?
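
If a lesson is longer than the limit, one workaround is to cut the text into pieces that each fit and request them one at a time. Here is a rough sketch, assuming the 5000-character figure mentioned above applies to the raw text; `split_into_chunks` is a hypothetical helper, not part of any service. Percent-encoding makes the final URL longer than the raw text, so a lower limit may be safer in practice.

```python
def split_into_chunks(text: str, limit: int = 5000) -> list[str]:
    """Greedily pack whole words into chunks of at most `limit` characters.

    Runs of whitespace (including newlines) collapse to single spaces.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError(f"a single word exceeds the {limit}-character limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes its own request, and the resulting audio files can be joined afterwards in an editor such as Audacity.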

The state of TTS in LingQ is regrettable, at least in the languages that I have studied here. For example, I don’t understand why LingQ insists on using what sounds like a speech synthesizer from the 90s as their TTS in Chinese.
Anyway, I’ve been a fan of the Microsoft Azure voices for a couple of years now: tons of voice options, including dialects and speaking styles, and quite natural pronunciation, though the prosody is obviously not human-like.
You can even access many of the voices for free by using the “read aloud” function in the Microsoft Edge browser, or simply via Python (the edge-tts package on PyPI).
As for other services, many people consider https://elevenlabs.io/ to be the current leader in this space. But I haven’t really evaluated it. They do have a free tier to try.
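
To make the Python route concrete, here is a minimal sketch using the edge-tts package, which writes the audio straight to an mp3 file. The voice names are the ones from the links earlier in the thread, and whether edge-tts offers each of them is an assumption (`edge-tts --list-voices` shows what is available). `VOICES`, `voice_for`, `text_to_mp3`, and `save_lesson_audio` are illustrative names, not part of the package.

```python
import asyncio

# Voice names taken from earlier in the thread. Availability in edge-tts
# is an assumption: check with `edge-tts --list-voices`.
VOICES = {"en-US": "en-US-SteffanNeural", "fi-FI": "fi-FI-SelmaNeural"}


def voice_for(locale: str) -> str:
    """Look up the voice for a locale, failing loudly if none is configured."""
    try:
        return VOICES[locale]
    except KeyError:
        raise ValueError(f"no voice configured for {locale!r}") from None


async def text_to_mp3(text: str, locale: str, out_path: str) -> None:
    """Synthesize `text` with edge-tts and write the audio to `out_path`."""
    import edge_tts  # third-party: pip install edge-tts

    communicate = edge_tts.Communicate(text, voice_for(locale))
    await communicate.save(out_path)


def save_lesson_audio(text: str, locale: str, out_path: str) -> None:
    """Blocking wrapper, convenient to call from a plain script."""
    asyncio.run(text_to_mp3(text, locale, out_path))
```

Calling `save_lesson_audio("Hello World", "en-US", "lesson.mp3")` should leave a file that can be uploaded to a LingQ lesson. It needs a network connection, and since the service behind it is not an official public API, it may change without notice.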

If you’re looking for truly free and open source solutions, I fear those aren’t quite there yet. I remember trying Bark (suno-ai/bark on GitHub, a text-prompted generative audio model) when it was first released, but not only were the hardware requirements uncomfortably high (12GB+ GPU etc.), the results were also totally unpredictable: sometimes surprisingly good, other times AM radio.
Recently, Coqui released XTTS (see “XTTS: Open Model Release Announcement” on the Coqui blog), which sounds pretty decent as well. Maybe check that out too. Demo: the XTTS Hugging Face Space by coqui.

1 Like

Yeah, I heard about ElevenLabs and tried it out yesterday. It is great, but it has a small free-tier cap worth about 1 LingQ lesson per month. As for Edge, I can’t download a file from it unless I record the playback manually every time, which is too time-consuming to be sustainable, so that’s a no-go. I find it odd that open source AI voices aren’t a thing yet, considering TTS has existed for a long time and ChatGPT does what I assume are way more resource-intensive things on its free tier. That XTTS link has really nice sound samples. Hopefully it’s more consistent than the other one you tried.

1 Like

Yeah, my favorite so far is using read aloud from the Microsoft Edge browser. When I truly want something with a much better voice, I use that and record with Audacity.

Stewart, how are you doing it? I don’t find it terribly time-consuming anymore. I open the lesson and go to the three-dot menu to “print lesson” (all this in Microsoft Edge, of course). Turn on Audacity and set it to record. Play the lesson. End the recording. I guess the main time cost is having to wait while the recording happens…but you get a free listen =). Then you upload the file to LingQ. Timing-wise it seems nearly as good as some online service…except you can probably go do something else while it records and come back to it later.

Ideally LingQ would just use the Azure voice directly, or some other voice system.

3 Likes

I mostly use this AI text to speech for my projects and it works really well.
It is natural sounding too if you are wondering about it.
Give it a shot and let me know :wink: