Japanese Splitting has suddenly turned Terrible?

So unfortunate. The problem seems to be back.

2 Likes

I started to use LingQ for importing Netflix series in 2021. Sure, I’ve had some trouble importing lessons from time to time before, but whatever the problem was, it would usually be resolved in a timely manner. It was never bad enough for me to come and post in the forum about it. Since 2021, I can’t remember any major issues with word splitting.

Since this thread started back in November, the splitting still hasn’t gone back to the way it was before. I really don’t feel like importing a new series just to keep dealing with this problem. That’s why I’ve been studying lessons I imported before this happened: I can study enjoyably and effectively with them. Even if I have to correct a little spacing or omit a particle, it’s still better than this recent word splitting problem.

Of course, my supply of Netflix lessons imported before November will run out eventually, but I would hope the splitting is finally fixed by that point.

3 Likes

Yeah, something changed in the last few months, because I have imported hundreds of lessons of all sorts since 2019 and this never happened.

2 Likes

Sorry to hear you are still experiencing the same problem. I’ll check with our team what’s going on here.

3 Likes

Yes, it’s quite bad. Thanks for following up with the admins on this.

2 Likes

Any update on this? It’s not really a matter of which source shows the issue: any YouTube video has essentially 70-80% of its words split in a meaningless way.
I’m trying the AI splitter now (in the edit lesson section), and it seems to work better.

3 Likes

Thank you. The block of text issue seems to be fixed, which is progress. After my tests of manually importing text, I thought that fixing it would also fix the splitting issue, but somehow the splitting remains the same.

1 Like

Yes, something like this would be great, but unfortunately it’s not that easy. We need to go to an editor and regenerate that part of the lesson whenever edits have been made. That is, in effect, what can be done using the Edit Sentence view in Sentence Mode. Although it is not as easy as what you suggest, it is relatively convenient for obvious edits. Realistically, having done it myself when learning Japanese, you are unlikely to do this for anything beyond the more blatant issues you come across. That is why we feel AI is the most likely way to get this optimized, keeping in mind that people have different preferences for how words get split.

1 Like

Here is page 3 of 4. Let me know your thoughts. It is this video https://www.youtube.com/watch?v=XnbNLVwF7f8.

1 Like

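In line 9, (実際に): the particle (に) should be separated from the other word. If it is imported into the spaced repetition system as one complete word, there will be many duplicate entries that contain (に).

In the third line from the bottom, (でも) and (いい) should be combined. This is a grammar pattern and also a set expression. A common particle like です is fine as a word on its own; there is no need at all to combine it with other nouns.

The katakana word in line 4 needs to be merged. I have suggested this before: katakana words stand entirely on their own, and splitting them only produces words that have nothing to do with the text. Before AI word splitting appeared, I never found unsplit katakana to be a problem. Authors of novels often use katakana words as fictional names of people or places, so no katakana word needs to be split.

In the last line, の and は need to be merged. This is a grammar pattern.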
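To make the merging requests above concrete, here is a minimal Python sketch of a post-processing pass over an already split token list. It is only an illustration and not how LingQ's splitter works: the function name `merge_tokens` and the `MERGE_PAIRS` list are hypothetical, and the pairs are simply the ones mentioned in this post.

```python
import re

# Matches tokens made only of characters from the Unicode Katakana block
# (U+30A0 to U+30FF), which includes the long-vowel mark ー.
KATAKANA_ONLY = re.compile(r"^[\u30A0-\u30FF]+$")

# Hypothetical list of adjacent token pairs to rejoin (taken from the post above).
MERGE_PAIRS = {("の", "は"), ("でも", "いい")}


def merge_tokens(tokens):
    """Rejoin adjacent katakana fragments and listed grammar pairs."""
    merged = []
    for tok in tokens:
        if merged:
            prev = merged[-1]
            both_katakana = bool(
                KATAKANA_ONLY.match(prev) and KATAKANA_ONLY.match(tok)
            )
            if both_katakana or (prev, tok) in MERGE_PAIRS:
                # Glue this token onto the previous one instead of appending.
                merged[-1] = prev + tok
                continue
        merged.append(tok)
    return merged


print(merge_tokens(["ネット", "フリックス", "を", "見る"]))
print(merge_tokens(["という", "の", "は"]))
```

Because the pair list is plain data, learners with different preferences could in principle keep different lists, which is the point several posts in this thread make.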

1 Like

“も しこう” should be “もし こう.”
This is completely wrong.
(“こう” is used while thinking, like “kind of” or “you know.”)

“気が合う” should be “気 が 合う.”
“可能性が高い” should be “可能性 が 高い.”
“ことができる” should be “こと が できる.”
“よくある” should be “よく ある.”

These are recognized as units because they appear as such in dictionaries, but they correspond to what would be called phrasal verbs in English, or simply collocations.

If there is a debate about whether to write them together or separately, I think it’s better to write them separately.
Those who prefer to join them can create their own phrases.

“でもいいです。” can be “で も いい です.”
“っていうのは” can be “って いう の は.”

I’d like to add that my opinion might be off the mark, since I’m not using LingQ to study Japanese, as it is my native language.

3 Likes

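You said yourself that you have not used LingQ to study Japanese. I disagree with splitting these apart. If everything is split completely, then first of all it increases the amount of manual intervention: you have to create more phrases by hand, and studying becomes more laborious.
As a Japanese person, you don’t particularly notice phrases and grammar patterns. For foreigners facing a very long sentence, though, it is hard to perceive the phrase and grammar components in Japanese. This is especially true of phrases: when they are segmented as a unit, you see immediately that it is a phrase or a grammar pattern, and you can then look it up and understand what it means in the sentence.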

2 Likes

Non-native speaker here and I agree 100%.