Japanese Splitting has suddenly turned Terrible?

For me it's all imported YouTube videos and Whisper audio-generated lessons. I could link one, but I'm not sure what difference it would make.
Here is a reply I posted in another thread that details the issue more, with screenshots.

Here is a link to the video if that helps at all: https://youtu.be/c8jRlJ8uuLY?si=vU1VQyP_4le6ziEh But again, it's all YouTube imports and Whisper-generated text where this happens for me personally.

2 Likes

scrubtaku’s link demonstrates what I’m seeing very well.

Personally, I’ve only imported from YouTube so far. I’m very new to LingQ, so, to be honest, I’m not sure if what I want is what the system is designed to provide.

I very frequently see common Japanese particles like が, に or と being connected to the words in front of them (or sometimes even the following word). Also, the copula です or だ is almost always connected to the word in front of it. This creates a lot of fake “new” vocabulary, which is kind of annoying to deal with.

2 Likes

There has always been a bit of that with LingQ. I'm not sure if they can ever make it perfect, but now it's way worse.

1 Like

For me, the issue is importing lessons from Netflix. I’m using the LingQ importer extension in Google Chrome to import the subtitles from Netflix. I’ve tried importing 3 different series, and all 3 of them had unsplit text. I’m assuming this is the issue regardless of which series I try to import.

I don’t remember the date this started, but I guess anything I imported before November… 6th? has not been affected.

2 Likes

Hey everyone, thank you for bringing this up. This is an issue we take seriously, and we are actively working on a solution for Japanese word-splitting that will resolve it soon. Our goal is to provide the most accurate splitting possible in a reasonable period of time. Our previous implementation was too time consuming. Unfortunately, this latest implementation is unstable but we are confident that the end result will achieve our goals. We are trying to get something out in the next few days that should resolve the issues. Thanks for your patience. We will post here when the new updates are live.

3 Likes

We have pushed some fixes to importing Japanese. Can you all try importing again and let us know if things are improved? They should be. Let us know any issues and also let us know how the splitting could be improved. Using AI we should be able to make it work better than it ever has.

2 Likes

I tried importing 3 different lessons and it's pretty much the same as before:
image
After re-splitting with AI:
image

EDIT: I just tried it with a Whisper AI-generated audio lesson as well, and it's the same result.

1 Like

Sorry about that. It turns out OpenAI was down for hours this evening, so that functionality was not working as a result. Can you try again now or tomorrow just to confirm if that was the cause or not?

3 Likes

If this bug has been fixed, please also bring back the switch for batch-splitting lessons. That feature is gone now.

1 Like

I just tried it again with a few different lessons and it's still the same.
image

1 Like

It seems to work well for me. I tried importing one of Sayas podcasts and I think the splits are as good or better than after re-splitting. There wasn’t much difference in the number of blue words and LingQs although the splits are definitely different in places. There are a few particles that I probably wouldn’t have added but it also groups characters together that I prefer. Splitting is subjective and there is no right way. It is working well now. We will see if we can tweak it a bit but it is generally very good now.

1 Like

If possible, please consider my suggestion: don't split katakana words. This is very important.

Can you post a screenshot of your results (a screenshot of the open lesson)?

I have a known words count of 19k in Japanese, and normally these lessons would be in the 5%-7% range since I have studied this type of content quite a bit, so I know for a fact there aren’t this many unknown words.

I’m not sure if you study Japanese, but for me it's still much different than before the changes. It's still attaching a bunch of grammar to all the words, which is why the unknown word count is so high. This makes it appear that there are new words when they are words I already know, just with grammar attached, making each one a new LingQ.

It never used to do that this much. After re-splitting it becomes how it's supposed to be, but on its own it's pretty bad.

I would personally be okay if all the lessons were just re-split with AI automatically upon importing, if that is possible. I would just like it to be a 1-click import like before, without having to fix the splitting.

I think this is a big issue because most people probably have no idea that you need to edit the lesson and click split with AI for it to be correct.

2 Likes

So I figured something out that may finally be the solution to the inflated word count, as well as to LingQ importing things as blocks of text, which others have mentioned in the past.

If I import a lesson as normal with the plugin, it almost always comes in with no line breaks or spaces between sentences. I think whatever change was made recently messed with the formatting of subtitles when importing.

I found out how to manually get subtitles from YouTube and imported them into a subtitle editor program that I have, then exported them as plain text (without timestamps). The program automatically adds spaces between lines, and here are the results of importing the exact same text:
image
(This is how the program I use formats the text, and how LingQ used to do it.)

Screenshot 2024-12-13 101748
(Ignore the picture on the test lesson lol, it's the same text as the 2nd lesson.)

As you can see, the New Words % on the test lesson is how it should be, and I never used re-split with AI.

If I take the same subtitle file, export it with timestamps as an SRT, ASS or VTT file, and import that into LingQ, the formatting gets messed up and the issues appear.

Importing just plain text completely bypasses this issue, so I'm almost certain all that needs to be fixed is something the importer is doing when it comes to formatting.
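For anyone who would rather script this than use a subtitle editor, here is a rough sketch of the same idea in Python: strip the cue numbers and timestamps from an SRT file and keep one subtitle cue per line. The function name `srt_to_plain_text` is made up for illustration, and this is not what LingQ or my editor program actually does internally. It also assumes Japanese text, so the lines inside a single cue are joined without a space.

```python
import re

# Matches an SRT/VTT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"\d{1,2}:\d{2}(?::\d{2})?[,.]\d{3}\s*-->")


def srt_to_plain_text(srt: str) -> str:
    """Return only the subtitle text, one cue per line (hypothetical helper)."""
    cues = []
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Find the timing line; everything after it is the spoken text.
        timing_at = next(
            (i for i, ln in enumerate(lines) if TIMING.search(ln)), None
        )
        if timing_at is None:
            continue  # not a cue (for example a header block), skip it
        text = lines[timing_at + 1:]
        if text:
            # Assumption: Japanese text, so join wrapped lines with no space.
            cues.append("".join(text))
    return "\n".join(cues)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\nこんにちは\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\n今日は\nいい天気です\n"
    )
    print(srt_to_plain_text(sample))
```

The output can then be pasted into LingQ as a plain-text import, which is the route that gave me the normal New Words % above.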

Rooster, a popular user on here who seems to be tech-savvy, has confirmed the same thing here: https://forum.lingq.com/t/lingq-import-not-working-for-days-creates-one-block-of-text/804778/2?u=scrubtaku

For others who would like a temporary fix, I made a bit of a guide:
https://forum.lingq.com/t/big-block-of-text-issue-and-a-temporary-fix/857166?u=scrubtaku

I hope this helps fix this issue once and for all

2 Likes

Since it was reported several days ago that there were some fixes, I’ve tried importing from Netflix several times.

At first, the format was nice, showing each character’s spoken words line by line, but then it started to be one block of text again.

Occasionally, there will be a sudden chunk of English even though the characters are speaking in Japanese and that part has Japanese subtitles. It’s only happened once or twice, but it’s quite bizarre to see that.

SuddenEnglish

The word splitting from the initial import is still not great, and while re-splitting helps, it’s still an extra step that takes a lot of time.

The text circled in red is highlighted as one word…

ThisIsOneWord

Also, when importing in Chrome, the importer extension is no longer showing the title of the series.

ImportTitleNotShown

This means that unless I label it myself after I import it, there is no way to know which episode it is.

I’m just lightly testing this stuff, because I still have other series that I imported before this word splitting problem happened that I can study. After noticing what I did recently, I figured I should come in here and say something. Maybe I will try importing again after I get through the series I’m studying now, which may end up taking until the end of this year.

2 Likes

I have imported a ton of stuff and been testing, and I can assure you it's not fixed. The last reply we got was from Mark saying it was fine even though it's totally not, so I'm just hoping they are still making some sort of effort to fix it.

I love LingQ, but bugs seem to be becoming more and more of an issue, and if they just ignore them it's going to drive people away. The experience is just plain worse than it was before the changes. I actually love the idea of the changes, but only if they work.

EDIT: It seems that 4 days ago they said they are still working on it, here: https://forum.lingq.com/t/lingq-import-not-working-for-days-creates-one-block-of-text/804778/27?u=scrubtaku

Fingers crossed, because this has been such a pain.

1 Like

Hey Eric, I know you study Japanese with LingQ. Do you know if there is any progress on fixing the formatting? I did some tests that I detailed here:
https://forum.lingq.com/t/japanese-splitting-has-suddenly-turned-terrible/802715/27

Also, Rooster talked about this a month ago. I just found his post, and it confirms what I found out:

https://forum.lingq.com/t/lingq-import-not-working-for-days-creates-one-block-of-text/804778/2?u=scrubtaku

1 Like

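In the end I chose to use Language Reactor for now. As you said, LingQ has far too many bugs, and they are not fixed promptly. If the bugs annoy you too, you could use this app as well. I hope that after I have used it for a while, LingQ's developers will have fixed the bugs on the site. The developers are moving things in the right direction now, although progress is fairly slow.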

2 Likes

Yeah, I have heard of Language Reactor. There is also Migaku, which does very similar things, but I have already purchased a lifetime subscription to LingQ Japanese, so if possible I wouldn’t want to move. If things get worse then I guess I would have to.

1 Like

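You could try Language Reactor. Their AI is built into the sentence explanations, so users don't have to look words up in a dictionary themselves. The AI is very good at gathering information: for about 90% of sentences and grammar points it can work out the approximate meaning, which saves a great deal of effort looking up words. As for vocabulary, it can automatically identify the dictionary form of Japanese verbs and add it to the spaced-repetition system. In LingQ, Japanese words are currently all stored in their inflected forms and are not recognized as the dictionary form. That site has many features LingQ could learn from. Judging by LingQ's current progress, it will be hard to get that far. The developers are too busy, and I'd be thankful if they could just finish fixing the bugs.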

2 Likes