Chinese Word Splitting Using AI

BassmasterJJ · February 5, 2025, 3:54pm

Would it be possible to just stop trying to force AI to split Chinese words properly?

This is a solved problem. Here is a library that does the job quite well:

Here are some pretty basic sites that do it, also:

https://mandobot.netlify.app/
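
For anyone wondering what non-AI splitting can look like, here is a minimal sketch of forward maximum matching, one classic dictionary-based approach. This is only an illustration, not the code behind the library or the site above. The `segment` function, the `max_len` parameter, and the tiny `TOY_DICTIONARY` are all made up for the example; a real tool would load a full word list and usually add statistics on top.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"我们", "喜欢", "学习", "中文"}

print(segment("我们喜欢学习中文", TOY_DICTIONARY))
# → ['我们', '喜欢', '学习', '中文']
```

Greedy matching like this can still pick the wrong split when a sentence is ambiguous, but with a decent dictionary it avoids gluing unrelated characters into long nonsense words.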

We don’t care whether it uses AI or not. We just want it to work.

Thank you.

zoran · February 5, 2025, 11:17pm

Thanks for your feedback and suggestions. I’ll forward this to our team.

Atlan · February 6, 2025, 4:17am

It seems that after the first disaster of doing that with Japanese, the same thing happened with Chinese, which was working just fine for me before. The splitting used to be reasonably good and now it sucks horribly, producing an enormous number of agglutinated nonsense words and inflating the count of unknown new words. I have been avoiding studying Chinese and focusing on other languages in the meantime, but one temporary workaround I saw from the people studying Japanese, who faced the same problem, is to edit and regenerate the lesson. It is an extra step, but the regenerated lesson seems to be split better.

It is great that you gave a proper example with a technical solution and I hope they can implement it soon.

STCHEN13 · March 25, 2025, 2:48am

Looking forward to seeing improvement in this!

scrubtaku · March 26, 2025, 3:09am

I don’t think they really care, tbh. They pretty much said that they think the Japanese is fine, even after constant feedback that it’s not. I just gave up and I’m dealing with it, so I don’t think Chinese will be any different. What used to be 5% unknown words is now like 20%, so it’s just easier to adjust to it or else you will go insane.

Atlan · March 26, 2025, 8:46am

I am regenerating all my lessons. After that they become “normal”.

scrubtaku · March 27, 2025, 1:05am

Yeah, I was doing that too, but then I just stopped because it was annoying. I requested an automatic “regenerate all lessons with AI” option, but I’m not sure if that’s ever going to happen.