Improvements to Chinese and Japanese

Hi all,

Last week we released some updates, and over the next short while we’ll be putting the finishing touches on a few more. Both rounds should come in handy for LingQ members who are learning Chinese or Japanese.

The first round of updates specifically targets the splitting of text, and these changes are now live on the site.

  1. We updated the Chinese parsing algorithm, so any Chinese lessons which were previously uploaded without spaces will now use the new algorithm. Based on our tests, this new algorithm seems to be a lot more effective at detecting context and combining individual characters into multi-character words.

  2. There is now a “manual override” option for both Chinese and Japanese. When you upload a lesson, it will automatically be split on the lesson page into individual words. However, the splitter is only triggered when you first upload a lesson.
    After uploading the initial version, if you notice the text is split incorrectly, you can go back to the Edit Lesson screen and manually add spaces where words need to be split. This will override the splitter for that section of the text, and the lesson text will be updated to show the new spacing.
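For anyone curious how a manual override like this can work in principle, here is a rough Python sketch. It is not LingQ's actual code: `TOY_DICTIONARY`, `auto_split`, and `split_lesson` are made-up names, the greedy longest-match splitter is only a stand-in for the real algorithm, and treating each line as a "section" is an assumption for the sake of the example.

```python
# Hypothetical sketch only: NOT LingQ's real splitter.
# A tiny made-up word list stands in for a real dictionary or model.
TOY_DICTIONARY = {"我", "喜欢", "学习", "中文", "学", "习"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICTIONARY)


def auto_split(text: str) -> list[str]:
    """Greedy longest-match segmentation (a stand-in for the real algorithm)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in TOY_DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


def split_lesson(text: str) -> list[list[str]]:
    """Split each line into words.

    Assumption: a "section" is one line. A line that already contains
    spaces is treated as manually split, so the automatic splitter is skipped.
    """
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if any(ch.isspace() for ch in line):
            result.append(line.split())        # manual override
        else:
            result.append(auto_split(line))    # automatic splitting
    return result


if __name__ == "__main__":
    lesson = "我喜欢学习中文\n我 喜欢 学 习中文"
    for words in split_lesson(lesson):
        print(" | ".join(words))
```

In this sketch the first line has no spaces, so it goes through the automatic splitter, while the second line keeps exactly the word boundaries that were typed in by hand.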

*Note that some Chinese lessons which were previously completed will now have blue words again, as the new algorithm splits the text differently. However, we hope that this new algorithm works better to identify natural words, and gives our Chinese learners a better experience.

Some additional updates are coming soon which will make Chinese and Japanese much more accessible for new learners, and will also enhance the learning experience for existing learners.


I was already surprised by the number of new words I noticed in lessons I had previously gone through.

The algorithm was in dire need of improvement, so I really hope this does the job. I will share my experiences in a few days.

Thanks for being on top of it, as you promised a few weeks back.

Fantastic stuff!

Sounds great, looking forward to hearing your feedback on the new parsing algorithm!

It does indeed seem to parse better. Thanks again.

I had some free time so I checked this out.

  1. Yes, it appears to work better than the old parser. Nice work!
  2. The splitting works as you say, but I don’t find it useful because I have to go into the editor to use it. I can’t re-split lessons that aren’t mine, because I don’t have editing rights. There is no handy fly-over dictionary to check my work in the editor, and it takes time to get in and out of it. Ideally, if I see something wrong on the lesson page, I’d like to be able to fix it right there on the lesson page. This isn’t a high priority item for me, but I thought I’d let you know why I’m not using it.

Thanks for the feedback on this! If you’re interested in helping correct other lessons just let us know - we’d be happy to give you editor access so you can go into other lessons and correct them, as it benefits everyone :)

Hopefully we will also have a nice editor to handle this some day soon, though for now we’ve started simple so as not to tie ourselves up in knots while figuring out the best way to do this.

Nice updates.

I loved the one which shows you the pinyin together with the characters for Chinese. And you can even turn it on or off. Really nice improvement!

True, I think it really helps us remember the words faster, plus there are quite a number of people who don’t actually care about the Hanzi, but just want to learn the pronunciation.

Once this “Spanish thing” is over with, I’ll be choosing another language to tackle. Chinese is in the realm of the possible, so this is a great encouragement. Arabic is on the short list too, but the fact that it’s still in Beta mode (not really sure what that means) has me on the fence. Fortunately I have time.