Apologies if this is a known issue, but when I have LingQ automatically generate subtitles for a podcast mp3, it works perfectly in all but one respect: it inserts line breaks in the text in what seem like random places. This makes sentence mode show phrases and fragments rather than the whole sentence, which can be quite frustrating when trying to translate the whole sentence at once. Is there a way around this that doesn’t involve manually editing each lesson to recombine sentence fragments?
I imported the “Reseña” for Un Mundo Feliz and it broke nearly every sentence for some reason. It was short and easily corrected manually. For the rest of the book, I was able to paste in the text from the ebook itself.
But as a test, I imported a smaller 10-minute chapter via Whisper, and it made every sentence a paragraph. It also split a few sentences at a comma, even though the transcription correctly identified the pause as a comma and even started the next “paragraph” with a lowercase letter.
Strange. I think Whisper is doing its thing, but something is going wrong with importing Whisper’s output. I’m not sure they can get around making each sentence a paragraph, but they should be able to import two segments with a comma as a single sentence/paragraph.
What you might try is to “Regenerate” the lesson, quickly make the changes and see if that fixes it. You may need to Generate timestamps again, but test and see first because the underlying timestamps from Whisper may still work after the adjustments.
Zoran, any update? The sentence breaks are still happening with every Whisper text generation. It seems to cut the sentence after a set number of characters. So every sentence or partial sentence ends up being a paragraph.
You can use Notepad++ to edit the text into full sentences.
Go to “Edit lesson” and click “Regenerate Lesson” (this will give you the text as a whole). Copy that into Notepad++.
Using regular expression mode:
Search on ([^.!?])\r\n\r\n
replace with \1
Then “replace all”.
This should give you full sentences with an empty line between each.
If you are on Unix or Mac, your line endings may be different, and you might have to search for ([^.!?])\n\n instead.
Copy the formatted text into the edit lesson window (if you left it as is after the “Regenerate Lesson” step).
Then click “save” or “view lesson” at the top (can’t remember the wording). It should “save” it. If it doesn’t bring you out of the edit section, click “View Lesson” again. Then click “Edit lesson” again and click the “Generate Timestamps” button. Yes, you could stay in the lesson, but I think sometimes it gets a little hosed up, so going back out and going back to edit lesson seems to work most times.
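If you would rather not do the search and replace by hand, the same idea can be sketched in Python. This is only a rough sketch, not anything LingQ provides: the function name join_fragments is made up for illustration, and it assumes the regenerated text puts a blank line between fragments, as described above. Unlike the Notepad++ replacement, it puts a single space at each join and handles both \r\n and \n line endings, which is my own addition.

```python
import re


def join_fragments(text: str) -> str:
    """Rejoin sentence fragments that were split across blank lines.

    Hypothetical helper: a paragraph break is removed only when the
    text before it does NOT end in '.', '!' or '?'.
    """
    # Normalize Windows line endings so one pattern covers both cases.
    text = text.replace("\r\n", "\n")
    # Group 1: last non-space character of a fragment that is not
    # sentence-ending punctuation. Any trailing/leading spaces around
    # the blank line are dropped and replaced by a single space.
    return re.sub(r"([^.!?\s])[ \t]*\n\n[ \t]*", r"\1 ", text)


if __name__ == "__main__":
    sample = "Hola, esto es\r\n\r\nuna prueba.\r\n\r\nOtra frase."
    print(join_fragments(sample))
```

Paste the regenerated lesson text in as the input and copy the result back into the edit lesson window. Sentences ending in anything other than . ! or ? (a closing quote, for example) would still get joined to the next one, so check the result before saving.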
Regular expressions could work. Certainly. For a quick fix, I have pasted to ChatGPT:
The following text has line break issues. Please reformat with exactly one sentence per line:
text text text…
The OCR software that came with my book scanner is terrible with sans-serif fonts in Dutch. ChatGPT is pretty good at restoring that too.
Thanks for the tips. This could be world-class software if it “just worked,” but I just re-subscribed for another year, so I am getting value from it as it stands.
Now… …if only the green dot told me what sentence number I am on…