YouTube import with AI transcript not working?

shipshadow · February 9, 2025, 5:42am

The AI transcript doesn’t seem to be working. I have imported some videos, but all of them were imported with YouTube subtitles. I first noticed this 2 days ago, and it still persists.
Conditions: Japanese native, target language English. Mac Safari extension and iPad app; both show the same problem. Is anyone else seeing this?

zoran · February 9, 2025, 3:44pm

Yes, if you are importing a YT video, our importer grabs and imports the subtitle file that is available. That is how it has always worked.
If you prefer AI transcription, you can extract audio from the video, then upload the audio and generate AI transcription.

shipshadow · February 10, 2025, 1:58am

Sorry, my explanation was bad.
Several days ago I imported a YouTube video, and today I imported the same video again. I have attached screenshots of both. What is the difference between these transcripts?
It is not only this video, but almost all of them.

zoran · February 10, 2025, 7:06pm

I assume one lesson got regular English subtitles while the other got auto-generated subtitles. I’ll look into it.

DaisyGwynne · February 10, 2025, 8:14pm

I get this too, occasionally. Just deleting the lesson and re-importing seems to fix it.

shipshadow · February 11, 2025, 1:59am

Yes, that’s understandable. Actually, I have experienced the same situation in the past: I have noticed that I sometimes get different subtitles even when I import the same video.
But this time it seems to have stopped completely. I have tried over 20 times, including with other videos, and all lessons were generated with plain subtitles.
In fact, I have noticed the same tendency several times in the past, and every time it got better within several days without any notice.
Today the same situation is occurring, at least in my environment, so I wanted to ask whether you have run into the same thing.

roosterburton · February 11, 2025, 2:54am

I have noticed this too and have a couple thoughts on it.

YouTube AutoGen imports prior to 4th February were reformatted with AI to modify the sentences and include punctuation. The early implementations of this had significant problems with the timestamps, and there were multiple complaints about how the words were modified in Chinese/Japanese using this method.

For users who were just reading the script, the AI sentence construction was undoubtedly extremely helpful. The majority of users, who wanted to use sentence mode, listening mode or any of the word tracking features, were well… shit out of luck. The implementation was almost good; it’s a shame they seem to have completely reverted the feature instead of offering a toggle and disabling it for problematic languages.

The best way to work around this is to just import the YouTube audio directly as a new lesson, without using the LingQ import extension/app to do it at all.

shipshadow · February 11, 2025, 12:55pm

Thanks for your great explanation. I appreciate it.

huurohakusan · February 12, 2025, 3:34pm

I also imported a YouTube video with English subtitles, but it got an audio-based transcription instead of the original English subtitles. I am very troubled. There are a lot of bullshit words being mass-produced.

glossboss · February 13, 2025, 11:32am

I’ve just imported a couple of Greek YouTube videos using the LingQ extension (both about 35 minutes long) and the resulting lessons simply contain the YouTube auto-generated subtitles without the AI-generated punctuation or sentence segmentation. Are there specific conditions or circumstances in which the AI processing doesn’t happen, or is this just a temporary glitch, or a withdrawal of the feature? It would be helpful to know what’s going on here.

shipshadow · February 13, 2025, 11:49am

Same here. I learn English mainly from YouTube, and it is really a shame that we cannot import videos with AI transcription the way we used to.
Actually, I have experienced this twice in the past year. Both times it got better within several days.
But I’m sorry, I don’t know what was happening in the background, then or this time.
I’m just hoping this will be fixed soon, as it was before.
If anybody knows or has noticed anything about this problem, I would like to hear about your experience.

ScottTyler · February 13, 2025, 10:20pm

That’s not true. I import a lot of YouTube videos, and this is the way it is:

If during playback on YouTube with subtitles turned on, words appear in nicely formed lines, all at once instead of one word at a time, then the provider of that video to YouTube has also provided subtitles. In this case, use the LingQ browser extension to import the video.

However, if when you play back the video on YouTube, you see one word appear on the screen at a time, then those are YouTube’s auto-generated subtitles. Don’t use the LingQ browser extension to import this video because the captions will be horribly clustered in small groups per line, with sentences starting in the middle of these word clusters. Absolutely horrible, and shame on YouTube for not doing a better job here. The caption lines that are exported by YouTube will be exactly in the same horrible clusters as they appear on YouTube (but they appear one word at a time during playback on YouTube).

Instead use a tool to download an audio file for the YouTube video, and use Import->Audio from a LingQ web page to import the lesson, and then when it’s done, on the Edit Lesson page, add the YouTube link for the video.

Notes to LingQ:

Please modify your LingQ browser extension to explicitly state which subtitle track it will be importing. For example “French (auto-generated)” as opposed to “French” (which has proper subtitles).
Also modify it so that when the subtitle track that will be imported is one of these “… (auto-generated)” ones, that you warn the user with a message like: “This video has YouTube’s auto-generated subtitles. You may want to consider a different method to import this lesson.”
Please add a “Retranscribe Audio” button to the “Edit Lesson” page. This button would be grayed out if the lesson does not have an audio file. When pressed it would ask “This action will permanently replace all lesson text. Are you sure?”
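
The check described above (look at the subtitle track label before choosing an import method) can be sketched in a few lines of Python. This is only an illustration: `recommend_import` and the list of track labels are hypothetical, and nothing here reflects how the LingQ extension actually selects a track. The only thing taken from the post is the labelling convention, “French (auto-generated)” versus plain “French”.

```python
AUTO_MARKER = "(auto-generated)"  # label suffix as quoted in the post above


def is_auto_generated(track_label: str) -> bool:
    """True when the label looks like one of YouTube's auto-generated tracks."""
    return track_label.strip().lower().endswith(AUTO_MARKER)


def recommend_import(track_labels: list[str], language: str) -> str:
    """Suggest 'extension' if a creator-supplied track exists for the language.

    Falls back to 'audio' (download the audio and use Import->Audio) when the
    only tracks for that language are auto-generated, or when there are none.
    """
    lang = language.lower()
    matching = [t for t in track_labels if t.strip().lower().startswith(lang)]
    if any(not is_auto_generated(t) for t in matching):
        return "extension"
    return "audio"


print(recommend_import(["French (auto-generated)"], "French"))
print(recommend_import(["French", "English (auto-generated)"], "French"))
```

The first call suggests the audio route and the second suggests the extension, which matches the manual rule of thumb given earlier in this post.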

I’m suspecting that LingQ doesn’t offer anything that would involve it downloading the audio from YouTube for a reason. Otherwise it would be nice to have a checkbox to import the audio from YouTube for LingQ to transcribe. And also a way to retranscribe the lesson after it has already been imported but with bad subtitles (but this would require LingQ to fetch the audio from YouTube, which I am guessing is problematic).

Additional notes regarding Audio Import (which uses Whisper AI):

You may get better results if you import the audio as a Beginner level lesson. You can then change it to Intermediate or Advanced after it is transcribed. I need to test this more, but so far, with repeated testing, I get slightly better line splitting and fewer crazy long lines when I do this. Not specifying a level seems to result in the transcribing being treated as Intermediate 2. Choose a Beginner level. Beginner 1 and 2 appear to transcribe identically.
If the audio is too long, it will essentially time out and fail. I’ve had 50 minute long lessons with audio files around 35 MB fail, but was eventually able to do it partially by importing a smaller size audio file (96kbps instead of 128kbps .m4a). I haven’t needed to try this yet, but if I get stuck again, I will try a mono file instead of a stereo one, .mp3 if needed. The file size should be a lot smaller, and perhaps the processing time a lot less.
Also with importing of long audio that appears to time out: After many minutes of processing, LingQ seems to often report a failure to import the audio file, and offer something along the lines of “Click here to delete the lesson”. When you see this, wait instead. I have seen the lesson all of a sudden appearing as being ok, meaning no longer failed. It seems that whatever times out the importing of the lesson, and reports it as a failed import, doesn’t actually stop Whisper AI from continuing. And if you just wait longer, the lesson will eventually appear as ready.
On some videos with background noise (like crowds cheering) you will get wildly different transcriptions each time you try to do an audio import. I discovered this yesterday. (First import with a 96kbps file had no spoken words transcribed, just some descriptions of sounds! 2nd attempt importing with a 128kbps file was pretty good, but I regretted deleting 2 paragraphs with cheering that were transcribed as “Sounds of war”, so I deleted the lesson and tried again, but got different results that were nowhere close to as good, over and over again, with wildly different transcriptions, like as if an AI with a completely different personality, whose native language was different, and perhaps with different hearing problems transcribed it each time, before finally getting one that was decent again. Luckily this was a 1 minute video, so the imports were fairly quick.)
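
For the file size point above, a rough back-of-the-envelope calculation shows how much a lower bitrate saves. This is a sketch under stated assumptions: it treats the file as constant bitrate and ignores container overhead, so real files will differ somewhat. The 64 kbps default in `mono_reencode_command` is an arbitrary illustrative choice, not a LingQ recommendation; the command uses the standard ffmpeg options `-ac` (channel count) and `-b:a` (audio bitrate) and is only built here, not run.

```python
def estimated_size_mb(minutes: float, kbps: int) -> float:
    """Nominal size of a constant-bitrate audio file in megabytes (10^6 bytes)."""
    bits = kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


def mono_reencode_command(src: str, dst: str, kbps: int = 64) -> list[str]:
    """Build (but do not run) an ffmpeg command that downmixes to mono.

    -ac 1 sets one audio channel; -b:a sets the target audio bitrate.
    """
    return ["ffmpeg", "-i", src, "-ac", "1", "-b:a", f"{kbps}k", dst]


for rate in (128, 96, 64):
    print(rate, "kbps, 50 min:", estimated_size_mb(50, rate), "MB")

print(" ".join(mono_reencode_command("lesson.m4a", "lesson_mono.mp3")))
```

Passing the returned list to `subprocess.run` would perform the conversion, assuming ffmpeg is installed. Whether a smaller file actually avoids the timeout is untested here.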

At some point, when I have more time, I may write up a better guide to importing from YouTube.

roosterburton · February 13, 2025, 11:17pm

Some really cool ideas in your post, thanks for sharing.

I hadn’t considered if the whisper settings changed depending on the difficulty level of the imported lesson.

LingQ doesn’t say which model or alignment technique they are using for the transcription. I’ve tried basically every provider on the market and it seems like some models work better than others for different content.

Specifying the language as auto detect (instead of manual set) can cause wildly different results with providers that allow it. Very few providers handle multiple languages in a single audio well, Gladia was very promising but has a substantial price tag compared to a local or standard payg provider.

It is true, actually.

All AutoGen imports were reformatted with AI to correct the grammar and move words between sentences. The feature seems to have been turned off since 4th February. It forced me to completely override changes done after lesson creation because of the poor timestamp redistribution.

Second this,

I have noticed this several times too when polling the API: the lesson comes back as a failed import and then moments later polls as a ready lesson. I figured it was a random failsafe to break bots.
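
Given that behaviour, a polling loop probably should not give up on the first "failed" status. Below is a minimal sketch of that idea. Everything in it is an assumption: `fetch_status` is a hypothetical callable standing in for whatever request returns the lesson status, the status strings "ready" and "failed" are placeholders, and the 10 minute grace period is arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    grace_seconds: float = 600.0,
    interval_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the lesson reports 'ready' or the grace period runs out.

    A 'failed' status is treated as tentative rather than final, because a
    lesson has been seen to flip from failed to ready a little later.
    """
    deadline = clock() + grace_seconds
    while True:
        if fetch_status() == "ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Simulated status sequence: the import "fails" twice before turning ready.
statuses = iter(["processing", "failed", "failed", "ready"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda s: None))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting, as in the simulated run above.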

Feature was disabled last year for various reasons. This is their recommendation.

ScottTyler · February 14, 2025, 1:10am

Sorry for contradicting you. For Ukrainian, I am pretty certain that was not the case. Because in my case, long before February, I became so frustrated with importing lessons with the extension, hating the YouTube auto-generated word clusters for each paragraph, deleting the lesson, and then importing again as Audio so that I can get the Whisper AI paragraphs, that I realized that I have to pay attention before trying to import, and first check to see if the captions are auto-generated garbage, or if they are proper content-creator supplied subtitles. This was way, way, before February 4th. And it has nothing to do with caption timestamps, just the really poor clusters for each paragraph.

I did however have problems with the timestamps generated by Audio Import. I had to do a lot of playback testing and hand correcting of start and end times. And I still do, but it seems to be better now. Maybe starting on Feb 4th? It got better when I started importing everything as Beginner 1 instead of mostly Intermediate 2, so I assumed it may have been that. But maybe that was around Feb 4th, so perhaps it happened with that change? I still get bad timestamps, it just doesn’t seem as bad. And I can’t wait until they fix that completely. It’s way too much work fixing all those timestamps.

And in regards to that, I communicated to them how they can make simple changes to make sentence splitting and start and end timestamp editing a lot faster, easier, and more efficient. I need to make a post here on that too. Here is a teaser:

If you press either the “+” or the “+ Add paragraph spacing” button at that point in the above image, then you would automatically get 2 sentences/paragraphs with the text split as shown, and the start and end times automatically inserted using the paused playback position as the end time for the first sentence, and as the start time of the second sentence. (The “Add description” text box near the top of the page already grows if you press the Enter key on your keyboard. And so does the “Add paragraph” text entry area.)

The graphic waveform is just fantasy that I would love to see, but I will gladly accept it without that. But if we did have that though, it would make editing so much easier.

The “-” and “+” buttons now do 0.1s adjustments. To get 1s adjustments, press “-1” or “+1” instead.

“Replace” copies the current paused playback timestamp into either Start Time or End Time.

Clicking anywhere on the playback timeline instantly moves to that position. Currently, unlike in lesson playback, here it does not do that.

Those green buttons would only be green when you have a perfect setup for splitting paragraphs: 2 lines in paragraph entry area, 2 lines in translation area, and playback paused not at the start or end, but in between.
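
To make the proposed split behaviour concrete, here is a small sketch of the logic. It is purely illustrative of the suggestion in this post, not of anything LingQ has implemented; the `Paragraph` class and both function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str          # paragraph text, possibly containing a line break
    translation: str   # translation, possibly containing a line break
    start: float       # start time in seconds
    end: float         # end time in seconds


def can_split(p: Paragraph, paused: float) -> bool:
    """The 'green button' condition: 2 text lines, 2 translation lines,
    and playback paused strictly between start and end."""
    return (
        len(p.text.splitlines()) == 2
        and len(p.translation.splitlines()) == 2
        and p.start < paused < p.end
    )


def split_at_pause(p: Paragraph, paused: float) -> tuple[Paragraph, Paragraph]:
    """Split into two paragraphs, using the paused position as the boundary."""
    if not can_split(p, paused):
        raise ValueError("paragraph is not set up for splitting")
    text_a, text_b = p.text.splitlines()
    trans_a, trans_b = p.translation.splitlines()
    return (
        Paragraph(text_a, trans_a, p.start, paused),
        Paragraph(text_b, trans_b, paused, p.end),
    )


p = Paragraph("Hello there.\nHow are you?", "Hallo.\nWie geht es dir?", 10.0, 16.0)
first, second = split_at_pause(p, 12.5)
print(first)
print(second)
```

The paused position becomes the end time of the first paragraph and the start time of the second, so no timestamp has to be typed by hand.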

More fantasy:

Pressing “-” or “+” not only adjusts the timestamp by 0.1s, but also plays back 0.1s of audio so that you can hear if you are at the right spot or not.

Pressing the words “Start Time” or “End Time” plays back 1s of audio at start or end so that you can evaluate if you are at the right spot.

LarryZTurtle · February 14, 2025, 12:46pm

After the changes to YouTube imports on Feb 4, I can no longer create a multi-word LingQ if there is a line break in the middle. It just doesn’t give me the option. There is no problem creating a multi-word LingQ if it is all on the same line. I have e-mailed support, and am awaiting their reply.

zoran · February 14, 2025, 10:15pm

@LarryZTurtle I replied to your email.

glossboss · February 15, 2025, 10:46pm

I just noticed @roosterburton 's earlier comment about how the AI-generated reformatting of YouTube auto-subtitles was ‘prior to 4th February’, which indicates that the feature has since been removed. That would explain its absence from my recent imports! It wasn’t perfect (occasional bad sentence splitting and frequent bad timestamps, probably because it didn’t try to recalculate the timestamps based on the sentence boundaries) but it was somewhat helpful. I will revert to uploading mp3 and getting Whisper-generated transcripts that way.
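
To illustrate what recalculating timestamps from sentence boundaries could look like, here is a minimal sketch. It assumes word-level timestamps are available, which may not be true of the data LingQ works with, and it is a guess at the general idea rather than a description of their pipeline. The `Word` class and `sentences_with_times` are hypothetical names. Words are joined with spaces, so it would need adjusting for languages written without spaces such as Japanese or Chinese.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


SENTENCE_END = (".", "!", "?")


def _close(words: list[Word]) -> tuple[str, float, float]:
    """A sentence spans from its first word's start to its last word's end."""
    return (" ".join(w.text for w in words), words[0].start, words[-1].end)


def sentences_with_times(words: list[Word]) -> list[tuple[str, float, float]]:
    """Group words into sentences at terminal punctuation and derive
    each sentence's start and end from the words it contains."""
    sentences = []
    current: list[Word] = []
    for w in words:
        current.append(w)
        if w.text.endswith(SENTENCE_END):
            sentences.append(_close(current))
            current = []
    if current:  # trailing words without closing punctuation
        sentences.append(_close(current))
    return sentences


words = [
    Word("It", 0.0, 0.2), Word("works.", 0.2, 0.7),
    Word("Mostly", 1.0, 1.4), Word("fine", 1.4, 1.8),
]
for sentence in sentences_with_times(words):
    print(sentence)
```

If words are moved between caption lines without this kind of recomputation, the line's original start and end times no longer match its text, which would be consistent with the bad timestamps described above.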

shipshadow · February 16, 2025, 6:40am

Totally agree. We can still use the MP3 upload function one by one, but I could learn more languages in that time if the importer function were fixed. I hope it will be fixed soon.