Text to Speech Performance Issues

julsoft · August 7, 2023, 1:47pm

Via the website, text to speech sometimes takes a very long time to play, and at other times it does not play at all. Sometimes, after a lengthy delay, the system also falls back to one of the other text to speech voices. Both of these issues make stepping through a new text very jarring.

I am at the point where I have to copy and paste every word or sentence into Naver Papago to hear the pronunciation.

Is this apparent resource utilization issue something that is being worked on?

Thanks

davideroccato · August 7, 2023, 1:56pm

I suppose that if the lessons are too long, it might take more time to generate the audio. I don’t have any new long lessons converted to TTS yet, as my current books were uploaded with the previous method.

Sometimes the audio stops and gets stuck. When that happens, I edit the lesson, delete the audio and run the process again.

If you are talking about sentence by sentence, I’m not sure, as I don’t use it anymore.

roosterburton · August 7, 2023, 2:06pm

To find the answer, let’s look at how TTS is generated in the LingQ reader.

There are probably a few endpoints that need to be hit. First, your request goes to the LingQ v2 API, then LingQ connects to whichever TTS API they are using, and after that they save the audio in their AWS bucket and serve it to you. From the output, it looks like MS voices in this case, but it is worth experimenting to see what options are available.

If everything is scaling well, the TTS should happen pretty fast, but stress on any of those points will slow it down. You are better off going to the source if you want consistency.
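As a rough illustration of why stress on any one hop slows the whole thing down, here is a minimal Python sketch that times each stage of a three-step chain. The stage functions (`reader_api`, `tts_provider`, `object_store`), the voice name and the URL are all hypothetical stand-ins, not LingQ's actual code or endpoints; the artificial delay just simulates a slow provider.

```python
import time
from typing import Callable


def timed_pipeline(stages: list[tuple[str, Callable]], payload):
    """Run the stages in order and record how long each one takes."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


# Hypothetical stand-ins for the three hops guessed at above.
def reader_api(text):
    return {"text": text, "voice": "example-voice"}


def tts_provider(request):
    time.sleep(0.05)  # simulate a slow or throttled third-party TTS call
    return b"audio:" + request["text"].encode("utf-8")


def object_store(audio):
    return "https://example.invalid/audio/" + str(len(audio))


stages = [
    ("reader_api", reader_api),
    ("tts_provider", tts_provider),
    ("object_store", object_store),
]
url, timings = timed_pipeline(stages, "안녕하세요")
slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}, total: {sum(timings.values()):.3f}s")
```

The total is the sum of the stages, so one slow hop is enough to make the whole request feel slow even when the other two are instant.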

julsoft · August 7, 2023, 2:20pm

Yeah, it’s the sentence by sentence, word by word thing that is failing.

julsoft · August 7, 2023, 2:26pm

As for the TTS implementation, @roosterburton: sometimes the turnaround is very quick. In fact, most of the time it is. Then at other times, even for the same word at a different point in the same sentence, TTS takes a very long time. Maybe they (LingQ) are getting throttled by the third-party API, or maybe their requests are timing out and being resent or something. But it seems like they (LingQ) should at least cache the API response on their side for when we hit the same word again.

However they fix it, it would be much easier to use if it was consistent.
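To put rough numbers on the timeout-and-resend guess, here is a small Python sketch of how much delay a couple of failed attempts could add. The timeout, backoff and response values are made-up assumptions for illustration only; nothing here is known about how LingQ actually retries.

```python
def latency_with_retries(failed_attempts, timeout_s, backoff_base_s, success_s):
    """Total wait when the first `failed_attempts` calls time out and the next one succeeds."""
    waiting = failed_attempts * timeout_s
    # Exponential backoff between attempts: base, 2*base, 4*base, ...
    backoff = sum(backoff_base_s * 2**i for i in range(failed_attempts))
    return waiting + backoff + success_s


# Assumed values: 5 s timeout, 1 s initial backoff, 0.3 s for a healthy response.
fast = latency_with_retries(0, timeout_s=5.0, backoff_base_s=1.0, success_s=0.3)
slow = latency_with_retries(2, timeout_s=5.0, backoff_base_s=1.0, success_s=0.3)
print(f"no retries: {fast:.1f}s, two timeouts first: {slow:.1f}s")
```

Under those assumptions the same word goes from a fraction of a second to over ten seconds, which would look a lot like the "usually quick, occasionally very slow" pattern.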

roosterburton · August 7, 2023, 2:46pm

@julsoft I think you’re right about how TTS is handled. The TTS responses should be saved and reused. It would be easy to just point to an existing TTS generation if someone has created it at any point in time. Assuming the current TTS saves are ephemeral, this would reduce the cost of using the API in exchange for a small storage increase, and reduce the recurrence of the nuisances you face here.
@nsprung
This is worth a look
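A minimal sketch of the save-and-reuse idea, assuming audio can be keyed by language, voice and text. The `TTSCache` class, the `fake_synthesize` function and the in-memory dictionary are hypothetical; a real setup would presumably sit in front of the actual TTS provider and use persistent storage instead.

```python
import hashlib


class TTSCache:
    """Reuse generated audio for identical (language, voice, text) requests."""

    def __init__(self, synthesize):
        self._synthesize = synthesize       # the expensive provider call
        self._store: dict[str, bytes] = {}  # stand-in for persistent storage
        self.provider_calls = 0

    @staticmethod
    def key(language, voice, text):
        # Join with a separator that will not appear in normal text.
        normalized = "\x1f".join((language, voice, text.strip()))
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_audio(self, language, voice, text):
        k = self.key(language, voice, text)
        if k not in self._store:
            # Cache miss: pay for the provider call once, then keep the result.
            self.provider_calls += 1
            self._store[k] = self._synthesize(language, voice, text)
        return self._store[k]


def fake_synthesize(language, voice, text):
    # Hypothetical provider: returns placeholder bytes instead of real audio.
    return f"{language}|{voice}|{text}".encode("utf-8")


cache = TTSCache(fake_synthesize)
first = cache.get_audio("ko", "voice-a", "학교")
second = cache.get_audio("ko", "voice-a", "학교")  # served from the cache
other = cache.get_audio("ko", "voice-b", "학교")   # different voice, new call
print(cache.provider_calls)
```

Including the voice in the key means a fallback voice would never be served in place of the one that was requested, and a repeated word costs one lookup instead of another round trip to the provider.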