YouTube import with AI transcript not working?

ScottTyler · February 13, 2025, 10:20pm

That’s not true. I import a lot of YouTube videos, and this is the way it is:

If, during playback on YouTube with subtitles turned on, the words appear in nicely formed lines, all at once instead of one word at a time, then whoever uploaded the video has also provided subtitles. In this case, use the LingQ browser extension to import the video.

However, if you see one word appear on the screen at a time when you play the video on YouTube, then those are YouTube’s auto-generated subtitles. Don’t use the LingQ browser extension to import this video, because the captions will be broken into small clusters of words per line, with sentences starting in the middle of these clusters. Absolutely horrible, and shame on YouTube for not doing a better job here. The caption lines exported by YouTube are in exactly the same clusters that are displayed on YouTube (where they are revealed one word at a time during playback).

Instead, use a tool to download an audio file for the YouTube video, use Import->Audio on the LingQ website to import the lesson, and then, when it’s done, add the YouTube link for the video on the Edit Lesson page.
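
For anyone who wants to script that decision, here is a rough Python sketch. It assumes you already have a metadata dict shaped like the one the yt-dlp tool returns from YoutubeDL.extract_info(), where uploader-provided tracks sit under "subtitles" and YouTube's generated ones under "automatic_captions". The function names and the placeholder URL are my own, and the command is only built, not run. Language keys can also be variants like "fr-FR", which this sketch does not handle.

```python
def caption_kind(info: dict, lang: str) -> str:
    """Classify the captions for `lang` as 'manual', 'auto', or 'none'.

    Assumption: `info` looks like yt-dlp's extract_info() result, with
    uploader-provided tracks under 'subtitles' and YouTube's generated
    tracks under 'automatic_captions'.
    """
    if lang in (info.get("subtitles") or {}):
        return "manual"
    if lang in (info.get("automatic_captions") or {}):
        return "auto"
    return "none"


def audio_download_command(url: str, audio_format: str = "m4a") -> list:
    """Build (but do not run) a yt-dlp command that extracts audio only."""
    return ["yt-dlp", "--extract-audio", "--audio-format", audio_format, url]


def import_plan(info: dict, lang: str, url: str) -> str:
    """Suggest an import route following the rule of thumb in this post."""
    if caption_kind(info, lang) == "manual":
        return "Use the LingQ browser extension."
    command = " ".join(audio_download_command(url))
    return "Download the audio, then use Import->Audio: " + command


if __name__ == "__main__":
    # Hypothetical metadata: French has proper subtitles, English only auto ones.
    sample = {"subtitles": {"fr": [{}]}, "automatic_captions": {"en": [{}]}}
    url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder, not a real video
    print(import_plan(sample, "fr", url))
    print(import_plan(sample, "en", url))
```

A video with no captions at all falls through to the audio route as well, which matches what I would do by hand.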

Notes to LingQ:

Please modify your LingQ browser extension to explicitly state which subtitle track it will be importing. For example “French (auto-generated)” as opposed to “French” (which has proper subtitles).
Also modify it so that when the subtitle track that will be imported is one of these “… (auto-generated)” ones, it warns the user with a message like: “This video has YouTube’s auto-generated subtitles. You may want to consider a different method to import this lesson.”
Please add a “Retranscribe Audio” button to the “Edit Lesson” page. This button would be grayed out if the lesson does not have an audio file. When pressed, it would ask “This action will permanently replace all lesson text. Are you sure?”
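
To make the request concrete, the logic I have in mind is roughly the following Python sketch. This is not LingQ's code. It assumes the extension can see the track label as YouTube displays it (for example “French (auto-generated)”), and the function names are made up for illustration.

```python
from typing import Optional

AUTO_SUFFIX = "(auto-generated)"  # assumed label suffix, as in the examples above

WARNING = (
    "This video has YouTube's auto-generated subtitles. "
    "You may want to consider a different method to import this lesson."
)


def import_warning(track_label: str) -> Optional[str]:
    """Return a warning for auto-generated tracks, or None for proper ones."""
    if track_label.strip().lower().endswith(AUTO_SUFFIX):
        return WARNING
    return None


def retranscribe_enabled(has_audio_file: bool) -> bool:
    """The proposed Retranscribe Audio button is grayed out without audio."""
    return has_audio_file


if __name__ == "__main__":
    for label in ("French", "French (auto-generated)"):
        print(label, "->", import_warning(label) or "no warning")
```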

I suspect there is a reason LingQ doesn’t offer anything that would involve it downloading the audio from YouTube. Otherwise, it would be nice to have a checkbox to have LingQ import the audio from YouTube and transcribe it, and also a way to retranscribe a lesson that has already been imported with bad subtitles (but this would require LingQ to fetch the audio from YouTube, which I am guessing is problematic).

Additional notes regarding Audio Import (which uses Whisper AI):

You may get better results if you import the audio as a Beginner level lesson. You can then change it to Intermediate or Advanced after it is transcribed. I need to test this more, but so far, in repeated tests, I get slightly better line splitting and fewer crazy long lines when I do this. Not specifying a level seems to result in the audio being transcribed as if it were Intermediate 2, so choose a Beginner level. Beginner 1 and 2 appear to transcribe identically.
If the audio is too long, the import will essentially time out and fail. I’ve had 50-minute lessons with audio files around 35 MB fail, but was eventually able to do it partially by importing a smaller audio file (96 kbps instead of 128 kbps .m4a). I haven’t needed to try this yet, but if I get stuck again, I will try a mono file instead of a stereo one, and .mp3 if needed. The file size should be a lot smaller, and perhaps the processing time a lot shorter.
Also, when importing long audio appears to time out: after many minutes of processing, LingQ often reports a failure to import the audio file and offers something along the lines of “Click here to delete the lesson”. When you see this, wait instead. I have seen the lesson suddenly show up as OK, meaning no longer failed. It seems that whatever times out the import and reports it as failed doesn’t actually stop Whisper AI from continuing, and if you just wait longer, the lesson will eventually appear as ready.
On some videos with background noise (like crowds cheering), you will get wildly different transcriptions each time you do an audio import. I discovered this yesterday. (The first import, with a 96 kbps file, had no spoken words transcribed, just some descriptions of sounds! The second attempt, with a 128 kbps file, was pretty good, but I regretted deleting 2 paragraphs with cheering that were transcribed as “Sounds of war”, so I deleted the lesson and tried again. I got results that were nowhere close to as good, over and over again, with wildly different transcriptions, as if each one had been done by an AI with a completely different personality, a different native language, and perhaps different hearing problems, before finally getting one that was decent again. Luckily this was a 1-minute video, so the imports were fairly quick.)
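
On the file size point above, the arithmetic is easy to check before you upload anything. A small Python sketch, assuming a constant bitrate and ignoring container overhead, so treat the numbers as estimates. The ffmpeg command is only built here, not run, and the 64 kbps mono setting and the file names are my own guesses, not something I have tested with LingQ.

```python
def estimated_size_mb(minutes: float, kbps: int) -> float:
    """Approximate audio size in MB (10^6 bytes) at a constant bitrate."""
    total_bits = minutes * 60 * kbps * 1000
    return total_bits / 8 / 1_000_000


def mono_reencode_command(src: str, dst: str, kbps: int = 64) -> list:
    """Build (but do not run) an ffmpeg command for a smaller mono file.

    -ac 1 downmixes to one channel and -b:a sets the audio bitrate.
    The output format follows the extension of `dst`.
    """
    return ["ffmpeg", "-i", src, "-ac", "1", "-b:a", f"{kbps}k", dst]


if __name__ == "__main__":
    print(estimated_size_mb(50, 128))  # 48.0
    print(estimated_size_mb(50, 96))   # 36.0
    # Hypothetical file names, for illustration only.
    print(" ".join(mono_reencode_command("lesson.m4a", "lesson_mono.mp3")))
```

So a 50-minute lesson drops from about 48 MB to about 36 MB going from 128 kbps to 96 kbps, and a 64 kbps mono file would be about 24 MB by the same formula.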

At some point, when I have more time, I may write up a better guide to importing from YouTube.