This is why I would personally still opt for the actual book as a “transcript” if it is available.
Synchronized subtitles aren’t really intended to replace ebooks for R+L. They are just that: auto-created subtitles.
Can you pull the audio from YouTube if it doesn’t have subtitles? The extension just said no text found…
Agreed, but you have to admit, it’s a godsend for podcasts in the target language!
Oh I don’t disagree, podcasts, radioplays, TV shows, movies, even YouTube videos, it’s great. I’d just argue something is being lost with audiobooks depending on the adaptation.
You can extract the audio using any YouTube downloader, for example 4K YouTube Downloader, and upload the audio to LingQ. Then you name the file and generate the transcript. I tried this with Chinese and didn’t like it much. Besides the many mistakes, it produced traditional characters. Does anybody know if it is possible to select simplified hanzi for the output?
Yeah, that’s a known problem. Whisper doesn’t differentiate between simplified and traditional (or even Cantonese); it’s all lumped together under Chinese. The model will just choose randomly, although I have definitely observed a preference for traditional characters.
To work around this, one normally includes an ‘initial prompt’, like: --initial_prompt 以下是普通话的句子。 (“The following are sentences in Mandarin.”)
This will steer the model in the right direction, and it is also helpful when one wants to indicate that punctuation is desired.
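As a sketch, the invocation could be built like this (assuming the openai-whisper command-line tool; the audio filename and prompt text are just examples — this constructs the command without running it, since running it requires Whisper installed and a real audio file):

```python
# Sketch: build a Whisper CLI call that passes an initial prompt in
# simplified characters to steer the model toward simplified output.
import shlex

def build_whisper_cmd(audio_file: str, prompt: str) -> list[str]:
    # --initial_prompt seeds the decoder; --language skips auto-detection.
    return [
        "whisper", audio_file,
        "--language", "Chinese",
        "--initial_prompt", prompt,
    ]

cmd = build_whisper_cmd("lesson.mp3", "以下是普通话的句子。")
print(shlex.join(cmd))
```

Because the prompt also ends with a full-width period, it doubles as a hint that punctuated output is wanted.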
Unfortunately LingQ doesn’t expose this option and I fear they are oblivious to the problem. I can try to get their attention, but it’s always really difficult…
For now you could transfer the lesson to Chinese Traditional on LingQ and study it there → 3 dots top right when editing → change course.
Or use a traditional → simplified converter. The best one is probably https://opencc.byvoid.com/ but any other will probably do as well. You have to click ‘regenerate lesson’ first to get access to the full text.
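To give a rough idea of what such a converter does, here is a minimal sketch using a tiny hand-picked character map (the five mappings are illustrative only; a real tool like OpenCC ships full mapping tables and also handles phrase-level conversions):

```python
# Minimal traditional -> simplified sketch with a hand-picked map.
T2S = {"漢": "汉", "語": "语", "學": "学", "習": "习", "書": "书"}

def to_simplified(text: str) -> str:
    # Characters without an entry (punctuation, already-simplified
    # characters) pass through unchanged.
    return "".join(T2S.get(ch, ch) for ch in text)

print(to_simplified("漢語學習"))  # -> 汉语学习
```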
Thank you very much! [“Danke schön!”]
@bamboozled - If Whisper randomly chooses Chinese characters, could it not be directed to use a certain set of characters? We know that users importing into Chinese are looking for Simplified Characters. Can that not be specified somehow?
Absolutely, by using the Whisper option ‘initial_prompt’. Providing a prompt that uses the desired character variant helps steer the model in the right direction. This could either be exposed to users via another text field, or done quietly by LingQ on the backend, e.g. by including a prompt containing simplified characters whenever the user is importing into Simplified (for example: 以下是普通话的句子。). See the answer by ‘jongwook’ (OpenAI) for more info: Simplified Chinese rather than traditional? · openai/whisper · Discussion #277 · GitHub
This normally works for me, although there is always a chance that Whisper may ignore or forget this prompt after x hours or so. Also note that this prompt can influence the output, so choose carefully.
The initial_prompt option has other uses: for example, if difficult-to-spell names or technical terms appear, you can try to give the model a hint as to what is expected. Here is an example: “The following is a conversation during a Dungeons and Dragons game, which includes NPC names like Zerthimon, Vlaakith, and Mordenkainen as well as place names like Agni’hotri, Tu’narath, and Niam’d’regal.”
Further, if the initial_prompt includes punctuation, it increases the chances that the output will contain punctuation as well.
Another way to solve this would be to convert the Whisper output to the desired character variant using conversion software, like OpenCC. I think the ability to convert between them has been requested previously. This will definitely work in all cases, though there may be slight inaccuracies going from simplified to traditional.
The ideal solution for me would look like this:
@bamboozled - Ok, we’ll see what we can do there.
Hey @andrey_vasenin1 - super cool find with Whisper! I’m the guy behind Audioscrape.com, where we transcribe podcasts for pros, so I’ve been down this rabbit hole too. Your German SPIEGEL test sounds solid - 1.5 hours for 30 minutes on a basic laptop is pretty decent, and punctuation’s a game-changer over other free tools. Loving the excerpt you shared, it’s wild how well it catches the flow.
This thread’s gold - tons of great tips like whisper.cpp for speed (nice one @bamboozled) and freesubtitles.ai for simplicity (props ericb100 and meddit_app). I’ve messed with Whisper a bunch building Audioscrape, and it’s solid for DIY stuff, though like JanFinster said, setup can be a hassle if you’re not into coding.

jmuehlhans, your uni’s use case with noisy field recordings is exactly why we went for a hosted solution - Whisper’s good, but it struggles with tricky audio unless you tweak it hard. For us, we’re at 92% accuracy on clean podcast audio, and we’ve got it running fast enough to monitor the top 100 U.S. business/tech podcasts in real time.

rwsandstrom, your 80-85% on Chinese with the base model tracks with what I’ve seen - specialized terms (like JanFinster’s medical stuff) trip it up unless you go big with the large model or fine-tune it. bamboozled’s right - Whisper’s not smart enough to proofread itself; it’s just guessing chunk by chunk.

If anyone’s looking to skip the DIY grind, Audioscrape’s built for this - transcripts, keyword alerts, all hosted so you don’t need a beastly GPU or terminal skills. TTom, it’s web-based, so macOS works fine (no native app yet though). GMelillo, that Google Colab trick is clutch for free power - we lean on similar cloud muscle to keep things humming.
Thanks for kicking this off, Andrey - it’s awesome seeing how everyone’s tackling transcripts! Has any of you tried Whisper on multi-language podcast episodes yet?