Solved: Synchronize Audiobook and Ebook / problem: define paragraphs

djxudkkqnwd · June 4, 2024, 12:51am

Hello,

thank you so much for answering!

I think a simple way to do that is to let the user define a “paragraph defining string”, or whatever you want to call it.

I have written a long Python script that uses OpenAI’s Whisper to add sentence-level timestamps to the matching ebook, so it combines the audio (Audible) with the text (Kindle).

It finds the exact timestamp of each sentence in the book with about 99% accuracy, so the whole ebook and audiobook are essentially synchronized.
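
To give a rough idea of how such an alignment can work, here is a minimal sketch. It is not my actual script: it assumes the speech recognizer has already produced a list of recognized words with start and end times, and it uses Python's standard `difflib` to match them against the ebook text. The names `align_sentences` and `normalize` are made up for this illustration.

```python
import difflib
import re
from typing import List, Optional, Tuple

# Assumed input shape: (recognized word, start in seconds, end in seconds).
Word = Tuple[str, float, float]


def normalize(word: str) -> str:
    """Lowercase a word and drop punctuation so book and audio words compare equal."""
    return re.sub(r"[^\w]", "", word.lower())


def align_sentences(
    sentences: List[str], recognized: List[Word]
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (sentence, start, end) for each ebook sentence.

    start/end stay None when no word of the sentence was matched.
    """
    # Flatten the book into words and remember which sentence owns each word.
    book_words: List[str] = []
    owner: List[int] = []
    for idx, sentence in enumerate(sentences):
        for raw in sentence.split():
            token = normalize(raw)
            if token:
                book_words.append(token)
                owner.append(idx)

    heard = [normalize(text) for text, _, _ in recognized]
    matcher = difflib.SequenceMatcher(a=book_words, b=heard, autojunk=False)

    starts: List[Optional[float]] = [None] * len(sentences)
    ends: List[Optional[float]] = [None] * len(sentences)
    # Matching blocks come back in increasing order, so the first matched word
    # of a sentence gives its start and the last matched word gives its end.
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            sentence_idx = owner[block.a + offset]
            _, word_start, word_end = recognized[block.b + offset]
            if starts[sentence_idx] is None:
                starts[sentence_idx] = word_start
            ends[sentence_idx] = word_end

    return [(sentences[i], starts[i], ends[i]) for i in range(len(sentences))]
```

Because the ebook text is used as the reference, recognition mistakes in the audio only affect the timing of the words around them, not the text that ends up in the lesson.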

I will upload the scripts soon, or maybe even create a website for the program so that other users can use it too.

The result is a set of SRT files for the ebook, each sliced to the length LingQ accepts.

After importing to LingQ:

I have added another functionality to this program:

It automatically adds subtitles that consist only of the string “-----” to mark every line break in the ebook.
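
As an illustration only, here is a small sketch of how such marker subtitles could be written into an SRT file. It assumes each sentence already has a start time, an end time, and a flag saying whether it starts a new paragraph in the ebook. The timing of the marker cue (it fills the gap between the previous sentence and the next one) is my assumption for this example, and `build_srt` and `format_timestamp` are hypothetical names.

```python
from typing import List, Tuple

# Assumed input shape: (start seconds, end seconds, text, starts_new_paragraph).
Sentence = Tuple[float, float, str, bool]


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(sentences: List[Sentence], marker: str = "-----") -> str:
    """Build SRT text, inserting a marker-only cue before each new paragraph."""
    cues: List[Tuple[float, float, str]] = []
    previous_end = None
    for start, end, text, starts_new_paragraph in sentences:
        # No marker before the very first sentence: there is nothing to separate.
        if starts_new_paragraph and previous_end is not None:
            cues.append((previous_end, start, marker))
        cues.append((start, end, text))
        previous_end = end

    blocks = [
        f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        for number, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([
        (0.0, 1.5, "First sentence.", True),
        (1.5, 3.0, "Second sentence.", False),
        (3.5, 5.0, "A new paragraph starts here.", True),
    ]))
```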

The lesson in editing mode looks like this:

Some of the segments are merely this “paragraph defining string”.

It would be great if LingQ gave the user the ability to create new paragraphs by simply defining a string that marks the beginning of a new paragraph.

The process simply:

1. Gets this string as defined by the user (in my case “-----”).
2. Searches for the segments that contain nothing but this string.
3. Adds space before and after each such segment and removes the space from all the other segments. (In the German LingQ this is called “Absatzabstand”; I don’t know what it is called in the English LingQ, so I will call it space.)
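
The steps above can be sketched in a few lines of Python. I don't know how LingQ stores segments internally, so the `Segment` class and its `paragraph_spacing` flag (standing in for “Absatzabstand”) are purely hypothetical; the point is only to show how little logic is needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    # Hypothetical flag: True means the segment is rendered with paragraph
    # spacing before and after it. The real LingQ data model may differ.
    paragraph_spacing: bool = True


def apply_paragraph_marker(segments: List[Segment], marker: str = "-----") -> List[Segment]:
    """Keep paragraph spacing only around segments that contain nothing but the marker."""
    return [
        Segment(text=seg.text, paragraph_spacing=(seg.text.strip() == marker))
        for seg in segments
    ]


if __name__ == "__main__":
    lesson = [
        Segment("First sentence."),
        Segment("Second sentence."),
        Segment("-----"),
        Segment("A new paragraph starts here."),
    ]
    for seg in apply_paragraph_marker(lesson):
        print(seg.paragraph_spacing, seg.text)
```

A segment that merely contains the marker inside a longer text is left alone; only segments that consist of the marker and nothing else get the spacing.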

Here are two examples of the printed lesson:

1: before the desired process

and in the LingQ reader

2: after the desired process (for this demonstration I did it manually, which of course would not be worth the effort in the future, given how simple the task is)

and in the LingQ reader

You can see that the processed version is much more pleasant to read than the unprocessed one, and you keep the timestamps from the original audiobook.

Given that this process would be fairly simple to add to the website, I hope I have convinced the LingQ team to give it a try. I am also more than willing to provide the program that I created for the synchronization, since I have seen that LingQ currently has an “audio transcription” option, which doesn’t use the matching ebook to correct mistakes made by the AI, and a “generate timestamps” option, which doesn’t seem to work, at least in my case.

zoran · June 4, 2024, 9:30pm

Thanks, we’ll check this.