Japanese Words Are Broken Up Incorrectly

JenneferJ · March 22, 2012, 5:51pm

Hi

I am having a hard time learning Japanese vocabulary because the color coding breaks the words up incorrectly. For example, the word for years (as in “years old”) is sai. But the color coding separates the two sounds and forces sa and i into two different parts, which makes learning that sai means years old very difficult.

This is only one example; there are many words that have this problem. Is there a way that I can tell the color coding that it is chunking the words incorrectly? And especially, is there a way to put the chunks together so they form a complete word?

I wrote this using romaji so that someone who doesn’t read Japanese can help. The characters for sai are さい if you need to see them. And the lesson in question is:

Thank you.

JenneferJ

nobody · March 22, 2012, 6:07pm

Hi JenneferJ,

As you know, Japanese is normally written without spaces between words. This means that whenever we process text on LingQ, we have to employ an automatic word splitter to enable you to save individual words. Unfortunately, word splitting is quite complex, and the splitter does sometimes split words incorrectly. We plan to continue improving the word splitter, but no matter how good the splitter is, it’s impossible to be 100% accurate, as language is never 100% consistent.
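To give a rough idea of why this happens, here is a minimal sketch of one common approach, greedy longest-match splitting against a dictionary. This is only an illustration and not the splitter LingQ actually uses; the function name `split_words` and the tiny dictionaries are made up for the example.

```python
def split_words(text, dictionary, max_len=4):
    """Greedy longest-match segmentation over a toy dictionary (illustrative only)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionaries, not real splitter data.
with_sai = {"さい", "です"}
without_sai = {"です"}

print(split_words("さいです", with_sai))     # ['さい', 'です']
print(split_words("さいです", without_sai))  # ['さ', 'い', 'です']
```

When the dictionary has no entry for さい, the sketch falls back to single characters, which is the kind of wrong split described above. Real splitters are far more sophisticated, but they still depend on what their data happens to cover.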

JenneferJ · March 22, 2012, 6:19pm

I understand. Is there a way I can tell the program the words are split differently? I don’t necessarily need the word splitter to be correct. I would like to correct the word splitter.

Thank you.

nobody · March 22, 2012, 6:28pm

At the moment this isn’t possible, but we would like to at some point allow users to manually adjust word boundaries and it is on our to-do list. Thanks for your willingness to help!

nobody · March 22, 2012, 6:39pm

I know it’s not a complete fix but if it highlights part of a word instead of the whole thing, you can highlight (manually) the whole thing, and it will form a lingq. The only problem is if there’s an extra character where there shouldn’t be.

To demonstrate what I mean, if it appears split as “一” and “緒”, you can highlight the two kanji and LingQ will form a lingq as “一緒”, but if it appears as “一緒に” and you just want “一緒”, that isn’t possible currently. When this happens, I usually set my hint to how I would have liked it to appear, along with the English. In this example, I would put the hint as “（いっしょ、一緒）1: together; 2: at the same time; 3: same; identical” so that I get the pronunciation, the kanji for the pronunciation that I’ve written (which may not be exactly the same as the text from the lesson) and the English meaning all in one go.

Another thing to note is that if you do this manual highlighting, it will only update that occurrence of the word until you reload the lesson. So if a word comes up a few times in a lesson, I fix it once, then reload the lesson and carry on.

I hope this helps.

nobody · March 22, 2012, 6:55pm

Thanks for the tips, Lyise!

Shigeharu · March 22, 2012, 7:00pm

@alex and all the LingQ staff
If you at some point allow users to manually adjust word boundaries, it will also help all of us learners make lingqs in Chinese lessons.

JenneferJ · March 22, 2012, 7:19pm

Thank you, Lyise! Your method helped fix my problems.

JenneferJ

nobody · March 22, 2012, 8:28pm

I’m glad this helped! I’ve been doing a lot of lingqing in Japanese over the last few months, so I worked out a method that worked for me as I went along.

erin09 · March 23, 2012, 12:16am

I have been having a similar problem with Korean. Lyise’s tip is a good one to know, but most of the time I wish that less of the word/phrase was highlighted instead of more.
It really seems that allowing users to manually adjust word boundaries would be helpful to all those learning Chinese, Japanese, AND Korean. I hope that this priority is higher up on the “to do list” rather than lower.

nobody · March 23, 2012, 12:25am

The issue with Korean is a bit different, since Korean is an agglutinative language. As there is natural spacing in Korean, we don’t need to run the text through a word splitter.
From studying Korean here on the site, I know all too well what you mean, but my advice is not to spend too much time worrying about the grammatical endings on words; otherwise it’ll take you forever to save LingQs.

I typically look at the root word or the verb then make the hint based on that, so “쓰다”, “써서”, “썼는데”, “쓰고”, etc. will all have the same hint. It makes it a lot more efficient, and you see the grammatical endings so much that you eventually just figure them out. In any case, good luck!

Jens · September 24, 2012, 1:24pm

Hi! I was wondering if there is any update on this issue as I have the same problem in Chinese/Japanese.

I like the simple solution in Pleco Dictionary/Reader where you can use arrow buttons to shrink or expand the word boundary manually to the left or right. See for example screenshot [1].

Cheers,
Jens

[1] http://www.pleco.com/manual/images/palmscreens/readerwithhighlight.gif
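For what it's worth, the shrink/expand idea is simple to model. Below is a minimal sketch, assuming a selection is just a pair of character offsets into the lesson text. The names `Selection` and `adjust` are hypothetical and are not taken from Pleco or LingQ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    start: int  # index of the first selected character
    end: int    # index one past the last selected character


def adjust(sel, text, side, delta):
    """Move one edge of the selection by delta characters.

    The result is clamped so the selection stays inside the text
    and always covers at least one character.
    """
    if side == "left":
        start = max(0, min(sel.start + delta, sel.end - 1))
        return Selection(start, sel.end)
    if side == "right":
        end = min(len(text), max(sel.end + delta, sel.start + 1))
        return Selection(sel.start, end)
    raise ValueError("side must be 'left' or 'right'")


text = "一緒に行く"
sel = Selection(0, 3)                 # covers "一緒に"
sel = adjust(sel, text, "right", -1)  # shrink by one character from the right
print(text[sel.start:sel.end])        # 一緒
```

Each arrow button would then map to one call with a delta of +1 or -1 on the chosen side.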

mark · September 24, 2012, 9:45pm

@Jens - We have not updated that issue although we do like that solution and do plan on implementing it someday.

dooo · September 25, 2012, 1:15am

I was wondering why there couldn’t be a way of just getting feedback on how users break up words and using that to make the splitting algorithm smarter. For example if 一緒に is split 一緒に but most users are overriding that split with 一緒に then the algorithm will correct itself accordingly.
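Roughly what I have in mind, as a sketch only: collect the splits users choose for a given stretch of text and switch the default once enough of them agree. The function name `preferred_split`, the thresholds, and the vote data are all made up for illustration.

```python
from collections import Counter


def preferred_split(default, user_splits, min_votes=3, min_share=0.6):
    """Return the split most users chose, if there is enough agreement.

    default     -- the splitter's own output, as a sequence of words
    user_splits -- one sequence of words per user override (hypothetical data)
    """
    default = tuple(default)
    if not user_splits:
        return default
    counts = Counter(tuple(split) for split in user_splits)
    best, votes = counts.most_common(1)[0]
    # Only override the splitter when the winning split has enough votes
    # and a clear majority share of all overrides.
    if votes >= min_votes and votes / len(user_splits) >= min_share:
        return best
    return default


# Made-up votes: four users join the phrase, one keeps the default split.
votes = [("一緒に",)] * 4 + [("一緒", "に")]
print(preferred_split(("一緒", "に"), votes))  # ('一緒に',)
```

The thresholds are there so that one or two odd overrides don't change the split for everyone.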

nobody · September 25, 2012, 2:37am

@dooo - That’s essentially what we hope to do, but of course we have to build the mechanism to handle this first.