I tried a specific audio-only lesson for which I had already created a transcript through a website that uses whisper (large model). First I created the lesson with Lingq's automatic transcription. Then I did the same with the text file of the external transcription. The results were very different. The internal transcript conflates the import tool and the transcription mechanism too much, with no opportunity to intervene in the result. I did this test about 3 weeks ago, so the tool may have been adapted since, although I doubt it helps with the issues I describe here.
This is what I found:
1. The sentences in Lingq are really short and mostly broken off in unnatural places, due to the tool's interpretation of what a sentence is.
2. IIRC each sentence was also a paragraph, making the pages unnecessarily “spacious” in terms of empty lines. One wonders how a transcription-import tool determines paragraphs, if not helped by a human.
3. Also, some of the transcriptions were not as good as the external source.
4. With the external tool, I get the chance to download both .srt and .txt in a file. The .srt file is time-stamped; the .txt file contains full sentences that are not time-stamped. This offers the opportunity to edit before importing, so the result looks the way I want it to look. I usually don’t have to change the text before importing, but I do make corrections in sentence mode afterwards, because the tool just doesn’t interpret everything right.
Both sites use whisper, so I suspect that something else is disturbing the result. Either the versions of whisper or the models are different, or the way Lingq handles input is getting in the way. I guess it might well be both. So for now, I choose the external source, even though there is an internal transcriber.
As an aside, I edit the transcript as follows. First I determine where I want paragraphs, then I concatenate all the sentences that should go into one paragraph into a one-liner. This is due to the way Lingq’s importer works. With podcasts I often do not know how the subjects are split, so I normally have only two paragraphs: the intro and the contents. The lesson after import looks fantastic because all lines follow straight after one another. This gives a full page and a good overview of sentences.
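For anyone who would rather script the concatenation step than do it by hand, here is a minimal sketch in Python. It assumes you have marked your paragraph breaks in the .txt file with blank lines; the function name and the separator parameter are my own invention, not anything from Lingq or the transcription site. Whether the importer wants one or two newlines between paragraphs is something you would have to try out.

```python
import re

def paragraphs_to_oneliners(text: str, separator: str = "\n\n") -> str:
    """Collapse each blank-line-separated paragraph into a single line.

    Assumption: paragraph breaks in the edited transcript are marked
    with one or more blank lines. `separator` is what gets written
    between the resulting one-line paragraphs.
    """
    # Split wherever there is at least one blank (whitespace-only) line.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside a block, replace every run of whitespace (including line
    # breaks) with a single space, so the paragraph becomes one line.
    oneliners = [" ".join(block.split()) for block in blocks]
    return separator.join(line for line in oneliners if line)


if __name__ == "__main__":
    sample = "Intro one.\nIntro two.\n\nContent one.\nContent two.\n"
    print(paragraphs_to_oneliners(sample))
```

With the two-paragraph podcast layout described above, the output is simply two long lines: one for the intro and one for the contents.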
The disadvantage is that I have to make time-stamps, which is where Lingq interferes with the proud announcement “successfully generated time-stamps!”, irrespective of whether I want it to or not. That might be an argument to use the .srt file, which unfortunately breaks up sentences in illogical places. So, mostly I endure Lingq’s stubborn overwriting and hope it does not recur.
The site I do the transcription from is “freesubtitles.ai” and I use the paid version, because that one guarantees to use the large whisper model. I pay in small amounts (€ 10), for which I get about 10 hours of transcription. The reason is that the owner sometimes just disappears and does not notice what is going wrong on the site. So far, he has always returned, hence the small amounts.
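If you do want to try the .srt route, the illogical breaks could in principle be repaired by gluing consecutive cues together until one ends in sentence-closing punctuation. The sketch below is only that, a sketch: it assumes the usual .srt layout (index line, a "start --> end" time line, then text, with blank lines between cues), the function names are hypothetical, and it only merges whole cues, so it cannot split a cue that has a sentence boundary in the middle. I make no claim about how Lingq treats the result.

```python
import re

# Matches the time line of a cue, e.g. "00:00:01,000 --> 00:00:03,500".
TIME_LINE = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)
# Closing quotes/brackets that may follow the final punctuation mark.
CLOSERS = "\"')”’"

def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIME_LINE.match(line)
            if match:
                # Everything after the time line is the cue text.
                text = " ".join(lines[i + 1:])
                if text:
                    cues.append((match.group(1), match.group(2), text))
                break
    return cues

def merge_into_sentences(cues, enders=".!?"):
    """Merge consecutive cues until the text ends a sentence."""
    merged = []
    start, parts = None, []
    for cue_start, cue_end, text in cues:
        if start is None:
            start = cue_start  # a merged cue keeps the first start time
        parts.append(text)
        if text.rstrip(CLOSERS).endswith(tuple(enders)):
            merged.append((start, cue_end, " ".join(parts)))
            start, parts = None, []
    if parts:
        # Leftover text without final punctuation: keep it as a last cue.
        merged.append((start, cues[-1][1], " ".join(parts)))
    return merged
```

Each merged entry keeps the start time of its first cue and the end time of its last one, so the time-stamps stay usable while the text reads as whole sentences.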
If you want to try it out using the free service, beware:
- you most likely will get into a waiting queue
- you might be assigned the small model (depends on how busy the site is)
- if you leave the page, you lose your transcriptions or the transcription process is ended. You have to download before closing the tab.
Using the paid service, you get an email with a link that points to your personal page where you can see your history. Also, there are no waiting queues, and if you indicate that it should continue when you close the tab, it will. And of course you always get the large whisper model.
Remember to bookmark the page with its identifier included (i.e. you can’t use ctrl-d).
Well, that’s it. I hope it helps someone, or maybe you just want to experiment.