Chinese Word Segmentation Errors When Importing from YouTube — Breaks the Core Philosophy of LingQ

One of the best features of LingQ is how it highlights new words in blue. As you get familiar with them, they gradually level up and eventually turn white, meaning you’ve learned them. It’s a smart, simple way to focus on what you still need to study.

But there’s a serious issue when importing Chinese (Simplified) lessons from YouTube. When you open a lesson for the first time, you see a huge number of words in blue—even though you already know many of them. The problem is that LingQ often fails to segment the Chinese text correctly. Sometimes it combines characters incorrectly, and in some cases, entire phrases are grouped together and marked blue as if they were a single unknown word.
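To make the effect concrete, here is a minimal Python sketch. It is not LingQ's actual segmenter, which I have no visibility into; it is just a toy greedy longest-match splitter, and the names `segment`, `unknown_tokens`, `VOCAB`, and `KNOWN` are made up for illustration. It shows why one wrongly merged phrase shows up as a single unknown "word" even when every word inside it is already known.

```python
# Toy illustration only: a greedy forward longest-match segmenter.
# VOCAB and KNOWN are hypothetical stand-ins for a dictionary and a
# learner's list of known words.

VOCAB = {"但是", "不管", "如何"}
KNOWN = {"但是", "不管", "如何"}


def segment(text, vocab, max_len=4):
    """Split text by taking the longest vocabulary entry at each position.

    Falls back to a single character when nothing in vocab matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def unknown_tokens(tokens, known):
    """Return the tokens a learner would see highlighted as new."""
    return [t for t in tokens if t not in known]


correct = segment("但是不管如何", VOCAB)   # three separate words
merged = ["但是不管如何"]                  # the whole phrase as one token

print(unknown_tokens(correct, KNOWN))  # nothing is new
print(unknown_tokens(merged, KNOWN))   # the merged phrase counts as new
```

With the correct split, none of the three words is flagged. With the merged token, the whole phrase is flagged as new, which is the behavior described above.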

This completely breaks the core idea behind LingQ. The color system is supposed to guide you smoothly through what’s new and what isn’t—but now, you can’t trust it. In one 20–25 minute lesson, I found more than 200 segmentation errors. That’s massive.

On top of that, it forces you to manually edit sentence by sentence just to correct the segmentation and save phrases properly. It turns into a painful, time-consuming task that totally kills the flow of studying. What’s worse is that this issue has been around for a long time, and honestly, I’m really frustrated that it still hasn’t been fixed.
For instance: 但是 不管 如何

This last one has 5 words in a row. This is not what LingQ should be. I feel very frustrated every day with this issue because it SHOULD NOT BE HAPPENING.

Sorry to hear that! I asked our team to look into this. We’ll see what can be done.

I made this suggestion a while ago: "Manual Splitting for Japanese and other languages without spaces."
The suggestion Mark made there would obviously be very impractical and tiresome. But if more people vote for this, they may reconsider their position.
As you also point out, even a language learner does a better job than the AI at splitting the words. I would even argue that splitting is part of learning such languages, so let the user do it manually.
