[Tip] Getting tidy text for YouTube-imported lessons

The LingQ importer often distorts the original transcript. It doesn’t properly join the multiple lines that make up a single sentence, and it sometimes replaces the end of a sentence with “…”. Other times it doesn’t combine the lines at all and returns the original transcription, messily cut up.

This is the sample

So, I often regenerate lessons with properly completed sentences.
This is the process:

  1. Extract the transcript from the video with a transcript tool (I recommend https://www.youtube-transcript.io/).

  2. Join the broken lines and tidy up the script with an LLM. (I use Gemini 2.0 Flash through the API with the prompt below. I tested GPT-4o mini and Claude Haiku, but they didn’t work perfectly.) Conveniently, it also uses context to correct some errors in auto-generated subtitles.
You are an expert in restoring punctuation. Your task is to transform the following text into complete sentences while preserving every original word and phrase.

Please follow these steps to complete the task:

1. Convert the input text into sentences according to these guidelines:
   a. Preserve every original word: Do not add, remove, or change any words from the original text.
   b. Create proper sentences with appropriate punctuation.
   c. Form logical paragraphs by grouping related sentences. Start a new paragraph when there's a shift in topic or speaker.
   d. Apply formatting: Use no line breaks within paragraphs and separate paragraphs with a blank line.

2. After completing the conversion, review your work to ensure:
   - Every original word is preserved.

Remember, the key is to improve readability and structure while faithfully preserving every original word and phrase. Pay extra attention to ensure no words or phrases are skipped in the process. What you only can add to the original text are punctuation and new lines.

  3. Regenerate the lesson from the edit lesson page.
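
For anyone who wants to script the join-and-clean part, here is a rough Python sketch. It is not the exact script I run. The function names are made up for illustration, and `call_llm` is a hypothetical callable that you would write yourself around whatever API client you use (it takes the full prompt text and returns the model's reply). The sketch joins the broken caption lines, sends them with the prompt, and then lists any words that differ between the input and the output, so you can review what the model changed.

```python
import difflib
import re
from typing import Callable, Iterable, List, Tuple


def join_caption_lines(lines: Iterable[str]) -> str:
    """Collapse caption fragments into one whitespace-normalized string."""
    return " ".join(" ".join(lines).split())


def word_sequence(text: str) -> List[str]:
    """Lowercased words without punctuation, used only for comparison."""
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())


def word_changes(original: str, restored: str) -> List[Tuple[str, List[str], List[str]]]:
    """Return (tag, original_words, restored_words) for every non-matching span."""
    a, b = word_sequence(original), word_sequence(restored)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def restore(lines: Iterable[str], prompt: str, call_llm: Callable[[str], str]):
    """Join the lines, ask the LLM to punctuate them, and report word changes.

    `call_llm` is a placeholder for your own API wrapper (an assumption,
    not a real library function).
    """
    raw = join_caption_lines(lines)
    restored = call_llm(f"{prompt}\n\n{raw}")
    return restored, word_changes(raw, restored)
```

An empty change list means only punctuation, capitalization, and line breaks were touched. A non-empty list is not necessarily bad, since the model sometimes fixes subtitle errors, but it shows you exactly which words to check before regenerating the lesson.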


Since a regenerated lesson does not provide the synced “playing sentence underlining”, it is hard to follow the playback position in the standard LingQ UI. So, I recommend using the one-page scroll layout.


Thank you so much for taking the time to create this. But it is too bad that we have to work so hard to get usable transcripts in LingQ.

Many (including myself) requested the improvement, and the LingQ team answered positively. I’m waiting for the update.


Update:

Just use MasterLingQ - Playlist Importer

It builds complete sentences while preserving the “playing sentence underlining” audio-text sync, with a single click.


It is NOT free.
I have no stake in this project.
@roosterburton manages the project; if you have any questions about the program, visit his Discord channel.