Since Japanese and some other languages don’t use spaces between words, when you import a book into LingQ, the parser tries to detect where one word ends and the next begins. Very often it splits them incorrectly, so when you read an imported book in LingQ, you need to edit many sentences to separate the words correctly (I edit sentences every day!). I usually do this with ChatGPT (I ask it to break down the words when I’m unsure).
@zoran, could you please consider whether it’s possible to implement something similar to the ‘Translate with AI’ function? For example, an option in LingQ to “re-parse” an import. An AI system could review the text and add spaces to separate the words in each sentence correctly.
The language I’m learning with LingQ is Japanese, but other languages that don’t use spaces (like Chinese) will likely have similar issues.
@zoran I believe a new option to re-parse a lesson with AI has been enabled, right? Yesterday I received a notification about it when I opened a new lesson. How does it work? Which menu option gives access to this feature so I can re-parse old lessons? I’m very happy about this addition and looking forward to trying it out.
There also doesn’t appear to be any way in the UI to re-split an existing lesson with this new ichimoe AI technique. The code below is for anyone who wants to re-split their old lessons in the meantime.
Can someone explain what algorithm or system is used to segment words in Japanese on LingQ?
What is used by default for imports, and what is used for the “Japanese Word Split Optimizer”?
At first it was said that machine learning, specifically ChatGPT, would be used. And the popup says “Optimize with AI.”
But now I see “ichimoe”, which points to https://ichi.moe/, which in turn appears to use ichiran (GitHub: tshatrov/ichiran, “Linguistic tools for texts in Japanese language”).
This doesn’t seem to involve machine learning at all; it appears to use a more traditional lattice-based tokenization, like MeCab.
Thanks.
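To illustrate what “lattice-based tokenization” means, here is a minimal sketch in Python (the thread has no code of its own, so the language is an assumption). It considers every dictionary word that could start at each position of the sentence (the lattice) and uses the Viterbi algorithm to pick the cheapest path through it. The dictionary, the costs, and the unknown-character penalty are all made up for illustration. This is not how LingQ, ichiran, or MeCab are actually implemented; MeCab, for example, also scores the connection between adjacent parts of speech, which this sketch leaves out.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: word -> cost (lower cost = more likely word).
# Real analyzers use dictionaries with hundreds of thousands of entries.
TOY_DICT: Dict[str, float] = {
    "日本": 2.0,
    "日本語": 1.5,
    "語": 3.0,
    "で": 1.0,
    "は": 1.0,
    "では": 1.2,
    "単語": 2.0,
    "単": 4.0,
    "を": 1.0,
}

# Hypothetical penalty for a single character that is not in the dictionary.
UNKNOWN_CHAR_COST = 10.0


def segment(text: str, dictionary: Dict[str, float]) -> List[str]:
    """Return the lowest-cost segmentation of `text` (Viterbi over a word lattice)."""
    n = len(text)
    max_len = max(len(w) for w in dictionary)
    # best[i] = (cost of the cheapest path covering text[:i], start index of its last word)
    best: List[Tuple[float, int]] = [(math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start][0] == math.inf:
                continue
            word = text[start:end]
            if word in dictionary:
                cost = dictionary[word]
            elif end - start == 1:
                # Unknown single characters keep the lattice connected.
                cost = UNKNOWN_CHAR_COST
            else:
                continue
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Walk back from the end of the text to recover the chosen words.
    words: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))


if __name__ == "__main__":
    print(" ".join(segment("日本語では単語を", TOY_DICT)))
```

With this toy dictionary, the sentence comes out as 日本語 では 単語 を, because “日本語” (cost 1.5) is cheaper than “日本” + “語” (cost 5.0). The quality of such a segmenter depends almost entirely on its dictionary and costs, which is why the words can still be split incorrectly when a word or name is missing from the dictionary.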
This weekend I tried, for the first time, a Japanese lesson with the new AI-optimized word parsing. The parsing has improved significantly: I didn’t have to re-parse any sentences, and reading was much smoother. Super happy with this change!
@zoran, are there plans to make this option more accessible? I want to optimize old lessons that I imported and opened in the past, and I believe the option only appears when you open a lesson for the first time.
In Japanese, words are often written in kanji or katakana, and it’s true that no space (I’m keeping loanwords as they are) is used between words.
However, when Japanese is written in romaji, spaces are sometimes used.
So let me write this post in romaji, too.