Improved Text-to-Speech?

Dominic_Olofsson-Tuisku · July 20, 2023, 9:34pm

Seeing the new Whisper function auto-generating transcripts for audio files has been revolutionary for enabling one to listen to more content of interest, not having to rely on already available transcriptions.

Unfortunately, as of now the text-to-speech on LingQ is quite terrible. It’s overly robotic and annoying to listen to and, even worse, distracting. I’ve been hearing more and more about a new generation of text-to-speech that sounds more natural and authentic. I’m curious whether LingQ could integrate any of these. I even saw on PlayHT that multiple accents are available for different languages. This could be really useful for LingQ.

bstein · July 21, 2023, 10:14am

The reason you don’t see better text-to-speech is cost: higher-quality TTS with more accents is a more expensive service, whereas what LingQ is using costs almost nothing.

I am quite happy using Narakeet instead of LingQ for my audio files.

bamboozled · July 21, 2023, 12:19pm

Which language are you referring to? LingQ uses different providers depending on language and platform, for example on iOS you get access to Apple’s built-in TTS, in addition to the ‘web voices’.
I believe you are studying Romanian and possibly Chinese (interesting combination). Both of these are served by AWS Polly (Amazon Polly, the text-to-speech service from Amazon Web Services). Those voices are absolutely atrocious and sound like a sound synthesizer from the 90s. Basically unusable, especially in Chinese, where you can barely hear the tones and tone sandhi is not represented at all.
Technically AWS offer a premium voice called ‘neural’ in Chinese, but that is about 4x as expensive and therefore not used by LingQ.

I personally use the Microsoft Edge browser to get access to Microsoft Azure’s high quality TTS services (https://azure.microsoft.com/en-us/products/ai-services/text-to-speech). They offer a large selection of voices, including dialects (Taiwan, Mainland), male and female voices.
In ‘normal’ languages, you can select the ‘print page’ option under the three dots and have the TTS read the lesson in another tab.
In Chinese (and Japanese), however, this will not work, because LingQ inserts spaces between the characters. This trips the system up and causes it to stutter. You would have to copy the text and remove the spaces to make this work.
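
If your lesson mixes Chinese or Japanese with Latin-script words, removing every space would also glue the Latin words together. Here is a rough sketch of a more selective cleanup that only drops spaces sitting between two CJK characters. The function name `strip_cjk_spaces` and the character ranges are my own assumptions (the ranges are not exhaustive), so treat it as a starting point rather than anything LingQ-specific.

```python
import re

# Assumed, non-exhaustive set: CJK punctuation, kana, CJK ideographs,
# and fullwidth forms. Extend the ranges if your text needs more.
_CJK = r"\u3001-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"

# A run of spaces/tabs with a CJK character on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])[ \t]+(?=[{_CJK}])")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces between CJK characters; keep newlines and Latin spacing."""
    return _SPACE_BETWEEN_CJK.sub("", text)


if __name__ == "__main__":
    print(strip_cjk_spaces("你 好 ， 世 界"))
    print(strip_cjk_spaces("我 用 LingQ 学 中 文"))
```

Spaces next to a Latin word are left alone, so something like "LingQ" in the middle of a Chinese sentence stays separated from the characters around it.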

If you are looking for a more technical solution, I can recommend this handy Python package: edge-tts (on PyPI). This should allow you to create an MP3 file you can upload alongside your lesson on LingQ.
The example in the repo works fine, just remember to remove spaces from LingQ content before using it with Chinese.
Here is an example snippet you can use to process text files:

import asyncio
import edge_tts

input_file = "/path/to/input.txt"
output_file = "/path/to/output.mp3"
voice = "zh-CN-YunxiNeural"

async def amain() -> None:
    with open(input_file, "r", encoding="utf-8") as file:
        text = file.read()

    # Remove the spaces LingQ inserts between characters (newlines are kept)
    text = text.replace(" ", "")

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_file)

if __name__ == "__main__":
    # asyncio.run creates the event loop, runs amain(), and closes the loop
    asyncio.run(amain())

Of course you can also automatically download lessons, create TTS and then upload the audio, without ever visiting this website. Make sure to use PATCH when adding audio to an existing lesson.
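
For the upload step, here is a minimal sketch of what building such a PATCH request could look like with only the standard library. Everything specific in it is an assumption on my part: the URL is passed in by the caller, and the form field name `audio` and the `Token` authorization scheme are placeholders that you should check against the actual API before relying on them. The function only builds the request; it does not send it.

```python
import urllib.request
import uuid


def build_audio_patch(url: str, token: str, audio: bytes,
                      field: str = "audio") -> urllib.request.Request:
    """Build (but do not send) a multipart PATCH request carrying an MP3.

    `url`, `field`, and the "Token" auth scheme are hypothetical placeholders.
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field}"; '
        'filename="lesson.mp3"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: audio/mpeg\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Token {token}",  # assumed scheme
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    # PATCH updates the existing lesson instead of creating a new one.
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="PATCH")
```

You would then pass the returned request to `urllib.request.urlopen` once the URL and field name are confirmed.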

jt23 · July 21, 2023, 9:45pm

Interesting comments.

I’m learning French (six months) and I find I usually prefer the TTS. The podcasts often include extraneous noise, as well as being more varied than I like for my level. Plus LingQ can chop up the audio sentences unpleasantly.

The French TTS is OK. I know it’s not human and it is somewhat robotic. I question some of its liaison choices. But overall for my level of learning how to hear and pronounce French it seems more than adequate.

Am I missing something?

Dominic_Olofsson-Tuisku · July 21, 2023, 9:50pm

At least with French, there’s nothing prohibitive about the TTS. I use it for basically every word in French I wanna hear. However, if you use TTS for just a single word in Romanian or Chinese, it literally leaves you clueless or, at best, with an approximation.

To answer your question, though: it’s adequate, yes, but seeing how big a game changer the auto-transcription has been just leaves one wishing for an equally good counterpart for, let’s say, listening to a book you’re reading. (In my experience it’s impossible to even buy an audiobook version of a French book to import to LingQ.)