Japanese Splitting has suddenly turned Terrible?

Sorry about that. It turns out OpenAI was down for hours this evening, so that functionality was not working as a result. Can you try again now or tomorrow to confirm whether that was the cause?

3 Likes

If this bug has been fixed, please also bring back the toggle for batch-splitting lessons. That feature is currently gone.

1 Like

I just tried it again with a few different lessons and it’s still the same.

1 Like

It seems to work well for me. I tried importing one of Sayas podcasts and I think the splits are as good or better than after re-splitting. There wasn’t much difference in the number of blue words and LingQs although the splits are definitely different in places. There are a few particles that I probably wouldn’t have added but it also groups characters together that I prefer. Splitting is subjective and there is no right way. It is working well now. We will see if we can tweak it a bit but it is generally very good now.

1 Like

If possible, please consider my suggestion: do not split katakana words. This is very important.

Can you post a screenshot of your results (a screenshot of the lesson open)?

I have a known-word count of 19k in Japanese, and normally these lessons would be in the 5%-7% range since I have studied this type of content quite a bit, so I know for a fact there aren’t this many unknown words.

I’m not sure if you study Japanese, but for me it’s still much different than before the changes. It’s still attaching a bunch of grammar to the words, which is why the unknown word count is so high. This makes it appear that there are new words when they are words I already know, just with grammar attached, which makes each one a new LingQ.

It never used to do that this much. After re-splitting it becomes how it’s supposed to be, but on its own it’s pretty bad.

I would personally be okay if all lessons were just re-split with AI automatically upon import, if that is possible. I would just like it to be a one-click import like before, without having to fix the splitting.

I think this is a big issue because most people probably have no idea that you need to edit the lesson and click split with AI for it to be correct.

3 Likes

So I figured something out that may finally be the solution to the inflated word count, as well as to LingQ importing things as blocks of text, which others have mentioned in the past.

If I import a lesson as normal with the plugin, it almost always comes in with no line breaks or spaces between sentences. I think whatever change was made recently messed with the formatting of subtitles when importing.

I found out how to manually get subtitles from YouTube and imported them into a subtitle editor program that I have, then exported them as plain text (without timestamps). The program automatically adds spaces between lines, and here are the results of importing the exact same text:


(This is how the program I use formats the text, and how LingQ used to do it.)

(Ignore the picture on the test lesson, lol. It’s the same text as the 2nd lesson.)

As you can see, the New words % on the test lesson is how it should be, and I never used re-split with AI.

If I take the same subtitle file, export it with timestamps as an SRT, ASS, or VTT file, and import it into LingQ, then the format gets messed up and the issues appear.

Importing just plain text completely bypasses this issue, so I’m almost certain that all that needs to be fixed is something the importer is doing when it comes to formatting.
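For anyone who would rather script this workaround than use a subtitle editor, here is a rough Python sketch of the same idea: strip the cue numbers and timestamps from an SRT or VTT file and keep one subtitle cue per line, with a blank line between cues. The names `srt_to_plain_text` and `TIMING` are made up for this example, and it assumes a well-formed file where cues are separated by blank lines. It only prepares the text and does not reflect how LingQ's importer works internally.

```python
import re

# Matches an SRT/VTT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?[.,]\d{1,3}\s+-->\s+")


def srt_to_plain_text(srt: str, separator: str = "\n\n") -> str:
    """Drop cue numbers and timestamps, keeping one subtitle cue per line.

    Lines inside a single cue are joined without a space, which suits
    Japanese (an assumption; use " ".join for languages with spaces).
    """
    cues = []
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                # Keep only the text that follows the timing line.
                lines = lines[i + 1:]
                break
        else:
            # No timing line: a header block such as "WEBVTT", so skip it.
            continue
        if lines:
            cues.append("".join(lines))
    return separator.join(cues)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nこんにちは\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\n元気\nですか\n"
    )
    print(srt_to_plain_text(sample))
```

The output can then be pasted into a normal text import. Whether a single line break or a blank line between cues works better is something to test; pass `separator="\n"` to try the former.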

Rooster, a popular user on here who seems to be tech-savvy, has confirmed the same thing here: LingQ Import not working for days - creates one block of text - #2 by roosterburton

For others who would like a temporary fix, I made a bit of a guide.

I hope this helps fix this issue once and for all.

3 Likes

Since it was reported several days ago that there were some fixes, I’ve tried importing from Netflix several times.

At first, the format was nice, showing each character’s spoken words line by line, but then it started to be one block of text again.

Occasionally, there will be a sudden chunk of English even though the characters are speaking in Japanese and that part has Japanese subtitles. It’s only happened once or twice, but it’s quite bizarre to see that.

The word splitting from the initial import is still not great, and while re-splitting helps, it’s still an extra step that takes a lot of time.

The text circled in red is highlighted as one word…

Also, when importing in Chrome, the importer extension is no longer showing the title of the series.

This means that unless I label it myself after I import it, there is no way to know which episode it is.

I’m just lightly testing this stuff, because I still have other series that I imported before this word splitting problem happened that I can study. After noticing what I did recently, I figured I should come in here and say something. Maybe I will try importing again after I get through the series I’m studying now, which may end up taking until the end of this year.

2 Likes

I have imported a ton of stuff and been testing, and I can assure you it’s not fixed. The last reply we got was from Mark saying it was fine even though it’s totally not, so I’m just hoping they are still making some sort of effort to fix it.

I love LingQ, but bugs seem to be becoming more and more of an issue, and if they just ignore them it’s going to drive people away. The experience is just plain worse than it was before the changes. I actually love the idea of the changes, but only if they work.

EDIT: So it seems that 4 days ago they said they are still working on it, here: LingQ Import not working for days - creates one block of text - #27 by zoran

Fingers crossed, because this has been such a pain.

1 Like

Hey Eric, I know you study Japanese with LingQ. Do you know if there is any progress on fixing the formatting? I did some tests that I detailed here

Also, Rooster talked about this a month ago and I just found it, which confirms what I found out.

1 Like

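In the end I have chosen to use Language Reactor for now. As you said, LingQ has far too many bugs, and they are not fixed promptly. If the bugs bother you too, you could use that app as well. I hope that after I have used it for a while, LingQ’s developers will have fixed the bugs on the site. The developers are currently moving things in a good direction, although progress is rather slow.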

2 Likes

Yeah, I have heard of Language Reactor. There is also Migaku, which does very similar things, but I have already purchased a lifetime subscription to LingQ Japanese, so if possible I wouldn’t want to move. If things get worse then I guess I would have to.

1 Like

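You could try Language Reactor. Their AI is built into the sentence explanations, so users don’t have to look things up in a dictionary themselves. The AI is very good at gathering information: for about 90% of sentences it can work out the rough meaning and grammar, which saves a lot of effort looking up words. It can also automatically recognize the dictionary form of Japanese verbs and add it to the spaced-repetition system. In LingQ, Japanese words are currently all kept in their inflected forms and are not recognized as the dictionary form. That site has many features LingQ could learn from. Judging by LingQ’s current progress, it will be hard to get there. The developers are too busy, and we can be thankful if they manage to fix all the bugs.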

2 Likes

As of right now the problem seems to be fixed. Just imported a couple of different lessons and the splitting is back to normal. Fingers crossed that it stays fixed :crossed_fingers:

4 Likes

Glad to hear, thanks for your feedback @scrubtaku

2 Likes

No problem and thank you for addressing the issue!

2 Likes

Since I last posted, the only thing that has been fixed is the format. When I import lessons from Netflix, it’s not one block of text anymore, which is a good thing and I hope it will stay that way.

Unfortunately, the other problems I reported still remain.

In the screenshot below, the extension isn’t automatically importing the title and episode number, it just says Netflix no matter what series I’m watching. This series and one more both have the first several lines in English instead of Japanese.

And the word splitting has not improved.

I did do a couple of lessons after this word splitting problem began, and although I could tolerate it for those, it’s not something I would want to do for every lesson, especially if there are many new words that I need to separate from the ones I already know. There has always been a problem with a particle attached to a word sometimes, but I never really had a problem with it. However, these large chunks of text being counted as words and having to edit more than normal is not something I want to do.

I’m almost done with a series I imported before this problem began in November, and it looks like I will be doing another one I imported before November.

2 Likes

Yeah, I just imported a lesson right now and the unknown word count is insanely high once again. It was working just a day ago :smiling_face_with_tear:

1 Like

As a native Chinese speaker, my personal impression is that the way Japanese combines kana, words, and phrases is relatively complex, so processing it with AI is not simple and is prone to errors. Getting AI to handle this is a big challenge for the engineers.

1 Like

My impression is that humans are quite good at developing a feeling for where one word ends and another one starts. Otherwise Japanese people probably would have added spaces to their texts a long time ago.

@mark @ericfromlingq So wouldn’t it make much more sense, instead of solely relying on artificial intelligence, to utilize the real intelligence of the people using the app and allow them to perform the splitting themselves? And in a proper fashion, of course. So not via editing the lesson but simply by clicking between two symbols while holding down one or more modifier keys to split or merge words.
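To make the suggestion a bit more concrete, here is a minimal Python sketch of one way a click-to-split/merge feature could be modeled. This is purely illustrative and says nothing about how LingQ stores lessons; the class `Segmentation` and its methods are hypothetical names. The idea is to keep the sentence as one string plus a sorted list of boundary offsets, so that a modifier-click between two characters just toggles the boundary at that position.

```python
from bisect import bisect_left


class Segmentation:
    """A sentence plus the character offsets where one word ends and the next begins."""

    def __init__(self, text, boundaries=()):
        self.text = text
        # Keep only interior offsets, de-duplicated and sorted.
        self.boundaries = sorted({b for b in boundaries if 0 < b < len(text)})

    def toggle(self, offset):
        """Split at `offset` if no boundary is there, otherwise merge the two neighbours."""
        if not 0 < offset < len(self.text):
            raise ValueError("offset must fall between two characters")
        i = bisect_left(self.boundaries, offset)
        if i < len(self.boundaries) and self.boundaries[i] == offset:
            del self.boundaries[i]  # boundary exists: merge the two words
        else:
            self.boundaries.insert(i, offset)  # no boundary: split here

    def words(self):
        """Return the words implied by the current boundaries."""
        edges = [0, *self.boundaries, len(self.text)]
        return [self.text[a:b] for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    seg = Segmentation("猫が好きです", [2])
    print(seg.words())  # particle still attached to the noun
    seg.toggle(1)       # click between 猫 and が to split off the particle
    print(seg.words())
```

Because the text itself never changes, a wrong split from the automatic pass could be corrected with a single click rather than by editing the lesson.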