Documenting the process of using (YouTube) songs as material - (Polish) Sanah - Nic Dwa Razy


I found this song by Sanah quite haunting and wanted to use it as listening material by importing it into LingQ. I wanted to document how I did this in case it is helpful for others.

I have shared the lesson I created and it can be found here. The lesson description includes a link to the original poem by the award-winning Polish poet who wrote it.

So here is how I eventually managed to get the song lyrics and timestamps into LingQ.

Unfortunately there are no subtitles on the original video, so I couldn’t simply grab existing subs from it.

First I tried grabbing the audio of the song using the command-line tool yt-dlp and then uploading it to LingQ. Unfortunately the Whisper AI built into LingQ didn’t do a good job of transcribing the audio. Not sure why, but the timestamps were off and, particularly towards the end of the video, the text of the song itself was garbled.
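For anyone wanting to repeat the audio step, here is a rough sketch of a yt-dlp call wrapped in Python. The URL is a placeholder, and the output name `song` and the choice of mp3 are assumptions made for illustration, not necessarily what was used for this lesson. yt-dlp also needs ffmpeg installed to convert the audio.

```python
import subprocess

# Placeholder: swap in the real video URL.
VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"


def build_audio_command(url, basename="song"):
    """Build a yt-dlp command line that keeps only the audio track as mp3."""
    return [
        "yt-dlp",
        "--extract-audio",          # keep the audio, drop the video stream
        "--audio-format", "mp3",    # assumed format; conversion needs ffmpeg
        "--output", f"{basename}.%(ext)s",
        url,
    ]


def download_audio(url, basename="song"):
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(build_audio_command(url, basename), check=True)


# Usage (needs yt-dlp and ffmpeg on the PATH):
# download_audio(VIDEO_URL)
```

The resulting audio file is what then gets uploaded to LingQ.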

Next I tried using the song lyrics from the YouTube description as the text for the LingQ lesson, to see if LingQ could just generate the timestamps automatically. That didn’t work either. I now see that the lyrics in the description don’t repeat the choruses the way the video does. Perhaps if I had manually copied and pasted the choruses into the appropriate places, LingQ’s timestamp generation would have worked.

Instead, what finally worked very well was the Whisper implementation on Google Colab here. I just gave it the YouTube URL and set the language to pl (Polish), and it successfully generated the subs with good timestamps.
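The exact code behind the Colab notebook isn't shown here, so treat the following as a rough sketch of the same idea using the open-source openai-whisper package (an assumption). The model size `small`, the file names, and the helper names are all made up for illustration. The only setting taken from the steps above is the language, `pl`.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, srt_path, language="pl", model_name="small"):
    """Transcribe an audio file and write the result as an .srt file."""
    # Imported here so the helpers above work without the package installed.
    import whisper  # pip install openai-whisper (assumed package)

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language=language)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_srt(result["segments"]))


# Usage (hypothetical file names):
# transcribe_to_srt("song.mp3", "song.srt", language="pl")
```

Setting the language explicitly skips Whisper's automatic language detection, which seems like the safer choice for a song.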

I checked this against the original lyrics and found that all the punctuation had gone, so I used a diff tool to compare the subs to the original lyrics and put the punctuation back in. I also noticed that one or two words had been wrongly transcribed and 3 or 4 words omitted altogether, so I put these back in too.
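Any diff tool will do for this. As a rough sketch of the idea, Python's built-in difflib can compare the two texts word by word while ignoring case and punctuation, so that only genuinely wrong or missing words show up. The function names and the sample strings are made up for illustration.

```python
import difflib


def normalize(word):
    """Lowercase a word and strip punctuation so only the letters are compared."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def word_differences(sub_text, lyric_text):
    """List the places where the subtitle words differ from the lyric words.

    Each entry is (tag, subtitle_words, lyric_words), where tag is one of
    difflib's opcodes: "replace", "delete" or "insert".
    """
    sub_words = sub_text.split()
    lyric_words = lyric_text.split()
    matcher = difflib.SequenceMatcher(
        None,
        [normalize(w) for w in sub_words],
        [normalize(w) for w in lyric_words],
        autojunk=False,
    )
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            report.append((tag, sub_words[i1:i2], lyric_words[j1:j2]))
    return report


# Illustrative strings: the subtitle text lacks punctuation and one word.
for difference in word_differences("nic dwa razy zdarza", "Nic dwa razy się zdarza,"):
    print(difference)
```

Differences in punctuation and capitalisation alone are not reported, so the list stays short enough to fix by hand in the subtitle file.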

Then I created a LingQ lesson with the subtitle file I got from that process and linked to the original YouTube video in the lesson. And voilà. Finally, something that will be great to learn from.