Sentence mode splits sentences in half after whisper auto gen

Apologies if this is a known issue. When I have LingQ automatically generate subtitles for a podcast mp3, it works perfectly in all but one respect: it inserts line breaks in seemingly random places. As a result, sentence mode shows phrases and fragments rather than whole sentences, which is quite frustrating when trying to translate a full sentence at once. Is there a way around this that doesn’t involve manually editing each lesson to recombine the sentence fragments?

Thanks in advance!

Yes, I noticed the same thing.

I imported the “Reseña” for Un Mundo Feliz and it broke nearly every sentence for some reason. It was short and easily corrected manually. The rest of the book, I was able to paste in the text from the ebook itself.

But as a test, I imported a smaller 10-minute chapter via Whisper. It made every sentence a paragraph and split a few sentences at a comma, even though the transcription correctly identified the pause as a comma and even began the next “paragraph” with a lowercase letter.

Strange. I think Whisper is doing its thing, but something is going wrong with importing Whisper’s output. I’m not sure they can get around making each sentence a paragraph, but they should be able to import two segments with a comma as a single sentence/paragraph.

What you might try is to “Regenerate” the lesson, quickly make the changes, and see if that fixes it. You may need to Generate timestamps again, but test first, because the underlying timestamps from Whisper may still work after the adjustments.

Thanks Mycroft. Okay, so it isn’t just me then.

I’ll do some tests but it would be really great if this could be fixed in the code somehow.

Thanks for reporting, we will look into it and have it fixed.

Amazing, thanks Zoran.

There’s a great demonstration of this effect in the Spanish podcast that nsprung pointed out in his History post today:

The sentence fragments are all over the place, making reading in sentence mode very frustrating.

You can use ChatGPT to correct unnecessary line breaks if you want to avoid manual editing. I got that hint from @Constantine and it worked for me. :blush:
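If you'd rather not paste lessons into ChatGPT, a small script can do a rough version of the same cleanup. This is only a sketch, and it rests on a few assumptions: that you copy the lesson text out of the editor and paste the result back in yourself (it does not talk to LingQ at all), and that a line ending in `.`, `!`, `?` or `…` really is the end of a sentence. The names `merge_fragments`, `ends_sentence`, `TERMINATORS` and `CLOSERS` are made up for this example.

```python
# Punctuation assumed to end a sentence, and closing marks that may follow it.
TERMINATORS = (".", "!", "?", "…")
CLOSERS = "\"'»”’)]"


def ends_sentence(line: str) -> bool:
    """True if the line ends with terminal punctuation, ignoring closing quotes/brackets."""
    return line.rstrip(CLOSERS).endswith(TERMINATORS)


def merge_fragments(text: str) -> str:
    """Join lines that were split mid-sentence, leaving one paragraph per sentence-ending line."""
    paragraphs = []
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between fragments are dropped
        # Glue this fragment onto whatever has been collected so far.
        current = f"{current} {line}" if current else line
        if ends_sentence(current):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)  # trailing fragment with no final punctuation
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Cuando llegué a casa,\n\nmi hermano ya estaba allí.\n\n¿Qué hora es?"
    print(merge_fragments(sample))
```

Abbreviations that end in a period (for example "Sr.") at the end of a line will be treated as sentence ends, so it is worth skimming the result. As mentioned above, you may also need to generate timestamps again after pasting the cleaned text back in.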

I have the same issue with Bulgarian.