Did the word-splitting algorithm change for Japanese?

In the latest lessons that I imported to LingQ, there are words being split in places where they previously weren't being split.
For example, a lot of verbs in て-form is now split like 言っ and て, when these were previously considered one word.

This change is bothersome because it messes up the time-series statistics of my progress: the total word count for a lesson imported before the change is not comparable with the count for one imported after it.
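
To make the counting problem concrete, here is a minimal sketch. It assumes a lesson's word count is simply the number of tokens the splitter produces; the token lists and the `word_count` helper are hypothetical illustrations, not LingQ's actual data or code.

```python
def word_count(tokens):
    """Count words as the number of tokens produced by the splitter (an assumption)."""
    return len(tokens)

# The same text, segmented the old way and the new way (hypothetical token lists).
old_split = ["言って"]
new_split = ["言っ", "て"]

# Both segmentations cover exactly the same characters...
assert "".join(old_split) == "".join(new_split)

# ...but they yield different word counts, so totals recorded before and
# after the change are not directly comparable.
print(word_count(old_split), word_count(new_split))
```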

Yes! Please fix this.

Yes, we did update the Japanese splitter. We think it will be better but I believe there are still some processes running in the background so let’s wait until tomorrow to see if this resolves the issue. I will update this thread tomorrow.

@job86,helmer - Just to confirm. Are you saying that the previous splitter used to consider 言って as one word?

Yes, that’s correct.

Yes, it’s supposed to say “is now split like”.

The new splitter doesn’t seem like a big improvement in its current form.

As an example, it wants to split:
物 + が + たくさん + ある

into
物がたく + さ + ん + ある

Any progress on resolving this issue? I would say it’s not merely “not a big improvement”, but considerably worse than it was before.
Another thing I noticed is it does a similar thing with adjectives, with 正しかった becoming 正しかっ た.
I think the spacing will be especially misleading to beginners, who use spacing to figure out where words end and begin. I hope it can be fixed as soon as possible.

Yes, we have been working on this. We hope to have the update ready tomorrow. Thanks for your patience.

Thanks for the quick reply. LingQ is an amazing service and I really appreciate the effort you all put into it ^^

Yes, it is so time-consuming trying to re-split my own material… There is almost no situation where っ て occurs in Japanese. Definitely not enough to justify splitting every time a っ appears.
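
As a rough illustration of that point, the splits could be undone with a simple post-processing pass. This is only a sketch under stated assumptions: it assumes the splitter's output is available as a list of tokens, and `merge_sokuon_splits` is a hypothetical helper, not anything LingQ is known to use. It rejoins a token ending in っ with a following token that starts with て or た.

```python
def merge_sokuon_splits(tokens):
    """Rejoin a token ending in っ with a following token starting with て or た.

    Hypothetical helper: a heuristic for the cases mentioned in this thread,
    not a general Japanese segmentation fix.
    """
    merged = []
    for tok in tokens:
        # Glue this token onto the previous one if the previous token ends in
        # a small っ and this one starts with て or た.
        if merged and merged[-1].endswith("っ") and tok[:1] in ("て", "た"):
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged


print(merge_sokuon_splits(["言っ", "て"]))      # ['言って']
print(merge_sokuon_splits(["正しかっ", "た"]))  # ['正しかった']
```

Tokens that do not follow this pattern pass through unchanged, so the pass does nothing about other mis-splits such as the たくさん example above.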

This should be fixed now. Please take a look and let me know. Sorry about that.

Thanks a lot, works fine again.