I have a question. When I import a video from YouTube, the transcription is much worse than when I import just the MP3 audio of the same YouTube video.
If I import directly from YouTube, the phrases are cut off and the pauses are horrible. When I import the MP3 instead, the phrases are perfect and easy to read because the text matches the audio.
When you import from YouTube, the text is taken from YouTube's subtitles. When you import an MP3, the text is generated by Whisper AI. That’s why there is such a big difference in quality.
I’m having a lot of success getting ChatGPT to clean up the subtitles. If you edit the lesson and use ‘regenerate lesson’, you can copy the plain text in and out.
Here is the simple prompt I’ve been using:
The following is the auto-generated subtitles from a YouTube video. Can you reformat the transcript to remove line breaks and tidy up the sentences? As this is a transcript that will primarily be used as a language learning resource, please don’t change or simplify the text, just clean up the formatting.
I added the qualifiers because sometimes it was tidying up parts where the speaker was repeating themself or meandering.
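If you do this often, here is a rough Python sketch of the same copy-and-paste step. It joins the short subtitle fragments into one line and puts the prompt above in front, so you can paste the result into ChatGPT. The names `collapse_fragments` and `build_prompt` are made up for this example and are not part of any app or API. It only changes whitespace, never the words.

```python
import re

# The prompt from this thread, kept as one string.
PROMPT = (
    "The following is the auto-generated subtitles from a YouTube video. "
    "Can you reformat the transcript to remove line breaks and tidy up the "
    "sentences? As this is a transcript that will primarily be used as a "
    "language learning resource, please don't change or simplify the text, "
    "just clean up the formatting."
)


def collapse_fragments(raw: str) -> str:
    """Join subtitle fragments into a single line (hypothetical helper).

    Every run of whitespace, including line breaks and blank lines,
    becomes one space. The words themselves are left untouched.
    """
    return re.sub(r"\s+", " ", raw).strip()


def build_prompt(raw: str) -> str:
    """Return the prompt followed by the collapsed transcript."""
    return f"{PROMPT}\n\n{collapse_fragments(raw)}"


if __name__ == "__main__":
    sample = "so today we\nare going to\n\ntalk about verbs"
    print(build_prompt(sample))
```

Removing the line breaks yourself first means the model only has to fix punctuation and sentence boundaries, which may make it less likely to rewrite the speaker's words.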