Japanese word borders

Are the Japanese word borders a bit off, or is it just my computer or the texts I'm reading?

The Japanese word splitter works well for the most part, but it does get things wrong occasionally. You can always highlight the characters you want to join into one word, and the system will treat the selection as a phrase, so it will appear in your list for review and you can look up hints for it.

The word splitting is done automatically by an algorithm. It is not always accurate. Personally I find it quite adequate for the purpose of reading, then listening and acquiring vocabulary via our system.

Note. I rely on my dictation function but sometimes forget to check the results. My apologies.

It'd be great if the author of a lesson could split the words manually. That would make learning much more comfortable. It's okay if you use texts with Kanji and know basic Japanese, but studying Hiragana lessons as a complete beginner was almost impossible for me.

Good point, Paule.


I like Paule’s idea as well. Is it technically feasible to allow someone to manually override the automatic word splitter and define word boundaries seen by other users? I’d be willing to go through some existing lessons and manually split them, as I’d love to contribute.

I imagine developing an interface to do this would take considerable development effort, but I’d be just as happy to mark up .txt copies with some sort of “separating mark” and send them to you (granted, depending on how your system works it could take lots of work to make this useful as well…).

Obviously, you guys have your own development priorities - I just figured it couldn’t hurt to make the offer. Can’t wait to see what you guys add next!

@okanoshita - It isn’t possible to adjust word boundaries in the present system. This is something on our wishlist, but I can’t tell you when we might get to it. What is possible is to manually add spaces between words, which will make the word boundaries correct. The text will then have spacing that is perhaps a bit unnatural, but at least the word boundaries will be accurate. There are already lessons like this in the Library. If you would like to add spacing, we can make you an editor for Japanese and you will be able to get started right away. Just let us know.
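
To illustrate why manual spacing fixes the boundaries, here is a minimal Python sketch. It is not how the site's word splitter actually works (that is not described in this thread); it only assumes that once spaces are present, a text can be split on whitespace instead of relying on an automatic algorithm. The function name `split_spaced_text` is made up for this example.

```python
def split_spaced_text(text: str) -> list[str]:
    """Split manually spaced Japanese text into words.

    Hypothetical helper for illustration only. str.split() with no
    arguments splits on any Unicode whitespace, so both the ASCII space
    and the full-width ideographic space (U+3000) act as word boundaries.
    """
    return text.split()


# A lesson line with spaces added by hand by an editor.
words = split_spaced_text("私 は 学生 です")
print(words)
```

Each space an editor types becomes a boundary, so the result is only as good as the manual spacing, but nothing is left for an algorithm to guess.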

@mark - If you guys prefer more of the Japanese content to be spaced, I’m happy to do it. Sign me up!

@okanoshita - Ok, you’re set up now. You should be able to edit all Japanese lessons so you can add spacing if you like. I don’t think most people will mind since it does make the word boundaries more accurate. Thanks a lot!

Sorry to resurrect an old post - personally I think that this site does a good job of parsing the text - certainly sufficient for the aim of increasing vocab. Japanese parsing is notoriously hard, and in my experience applications like this tend to just say ‘it’s impossible’ and don’t support Japanese at all (I’m thinking specifically of software used for word counting or content analysis), so I really appreciate the effort that has gone into the Japanese part of this tool, thank you!

@jankensan - You’re welcome. It has been tricky for sure. We hope to get time to improve it further at some point but it is, in fact, pretty good already.

Wouldn’t manually adding spaces to Japanese content cause issues when LingQ’ing a phrase? I mean, yeah, you can still create the phrase but for it to show up in other lessons, the spacing would need to be added again. Most issues with the word splitter seem to be when it splits words up too much into non-words. Adding spaces wouldn’t fix that. More people creating LingQs that include both the word root+affix (I think those are the correct terms) would probably help more.
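
The phrase problem described above can be sketched in Python. This is only an illustration under assumptions: how the site really matches saved phrases against lessons is not stated in this thread, and `strip_spaces` and `phrase_occurs` are hypothetical names. The sketch shows that a phrase saved from a spaced lesson does not literally occur in an unspaced one, and that ignoring whitespace on both sides would be one possible way around it.

```python
import re


def strip_spaces(text: str) -> str:
    """Remove all whitespace, including the full-width space (U+3000)."""
    return re.sub(r"\s+", "", text)


def phrase_occurs(phrase: str, lesson_text: str) -> bool:
    """Check whether a saved phrase appears in a lesson, ignoring spacing.

    Hypothetical helper: both strings are compared with whitespace removed,
    so manual spacing in either one does not affect the match.
    """
    return strip_spaces(phrase) in strip_spaces(lesson_text)


saved_phrase = "食べ ました"            # saved from a manually spaced lesson
unspaced_lesson = "昨日寿司を食べました"  # another lesson without spaces

print(saved_phrase in unspaced_lesson)               # plain substring check
print(phrase_occurs(saved_phrase, unspaced_lesson))  # whitespace-insensitive check
```

The plain substring check fails because of the space inside the saved phrase, which is the issue with spaced lessons; the whitespace-insensitive check finds it.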

@cgreen0038 - You’re probably right. I never found myself wishing Japanese lessons had spacing in them. But, it seems some people do find it helpful especially in the earlier stages. Some day we would love to build a tool that allows the adjusting of the automatically split word boundaries.