Word count issues

Word splitting is very inconsistent in Chinese as well.

For example, every combination of [number][classifier][word] shows up as a separate “word” in LingQ, even within the same lesson.

This means that this simple “one person” phrase generates three LingQs:

  • 一个人
  • 一个
  • 个人

Then, changing from “one person” to “two people” gets you another 3 LingQs. Changing that to “three people” gets you yet another 3 LingQs. And so on. The same thing happens with “some people”, “more people”, “fewer people”, etc.

This leads to a massive explosion of unknown words (and known words) in LingQ that correspond to relatively few real words in the language.

I’ve accepted that this is just something I have to deal with in a language that does not use spaces between words.

That said, if LingQ could recognize even that apparently simple pattern above, it would do a lot of good.
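
As a rough sketch of what recognizing that pattern could look like (this is not how LingQ actually works; the character lists and the split_phrase function are made up for illustration and are far too small for real use):

```python
import re

# Assumed, deliberately tiny lists; a real tool would need much larger ones.
NUMBERS = "一二两三四五六七八九十几"   # numerals plus 两 (two) and 几 (a few)
CLASSIFIERS = "个位名"                 # a few classifiers used for people
NOUNS = ["人", "学生"]                 # person, student

# One or more number characters, then one classifier, then one known noun.
PATTERN = re.compile(
    rf"([{NUMBERS}]+)([{CLASSIFIERS}])({'|'.join(NOUNS)})"
)

def split_phrase(phrase):
    """Split a [number][classifier][noun] phrase into its three parts.

    Returns None when the phrase does not fit the pattern.
    """
    match = PATTERN.fullmatch(phrase)
    return list(match.groups()) if match else None

print(split_phrase("一个人"))   # one person
print(split_phrase("三个人"))   # three people
print(split_phrase("个人"))     # no number in front, so no match
```

With something like this, “one person”, “two people”, and “three people” would all break down into a number, the classifier 个, and 人, instead of each phrase creating its own set of entries. It would still need care, since 个人 is also a real word on its own.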