For me, it's all imported YouTube videos and Whisper audio-generated lessons. I could link one, but I'm not sure what difference it would make.
Here is a reply I posted on another thread that details the issue further, with screenshots:
scrubtaku’s link demonstrates what I’m seeing very well.
Personally, I’ve only imported from YouTube so far. I’m very new to LingQ, so, to be honest, I’m not sure if what I want is what the system is designed to provide.
I very frequently see common Japanese particles like が, に or と being connected to the word in front of them (or sometimes even to the following word). Also, the copula です or だ is almost always connected to the word in front of it. This creates a lot of fake “new” vocabulary, which is kind of annoying to deal with.
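To illustrate why this inflates the new-word count, here is a toy sketch (the vocabulary set and the splits below are made up for illustration; this is not LingQ's actual tokenizer output):

```python
# Words the learner has already marked as known.
known = {"いい", "天気", "です"}

# A correct split keeps the copula です as its own token.
good_split = ["いい", "天気", "です"]

# A bad split fuses the copula onto the preceding noun,
# producing a token that is not in the known set.
bad_split = ["いい", "天気です"]

new_good = [w for w in good_split if w not in known]
new_bad = [w for w in bad_split if w not in known]

print(new_good)  # [] — nothing counted as new
print(new_bad)   # ['天気です'] — a "fake" new word from known parts
```

Every known word that gets a particle or copula glued onto it becomes a distinct unknown token in the same way, which is why the unknown-word percentage jumps.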
For me, the issue is importing lessons from Netflix. I’m using the LingQ importer extension in Google Chrome to import the subtitles from Netflix. I’ve tried importing 3 different series, and all 3 of them had unsplit text, so I’m assuming the issue occurs regardless of which series I import.
I don’t remember the date this started, but I guess anything I imported before November… 6th? has not been affected.
Hey everyone, thank you for bringing this up. This is an issue we take seriously, and we are actively working on a solution for Japanese word-splitting that will resolve it soon. Our goal is to provide the most accurate splitting possible in a reasonable period of time. Our previous implementation was too time-consuming. Unfortunately, this latest implementation is unstable, but we are confident that the end result will achieve our goals. We are trying to get something out in the next few days that should resolve the issues. Thanks for your patience. We will post here when the new updates are live.
We have pushed some fixes to importing Japanese. Can you all try importing again and let us know if things are improved? They should be. Let us know any issues and also let us know how the splitting could be improved. Using AI we should be able to make it work better than it ever has.
Sorry about that. It turns out OpenAI was down for hours this evening, so that functionality was not working as a result. Can you try again now or tomorrow, just to confirm whether that was the cause?
It seems to work well for me. I tried importing one of Saya’s podcasts, and I think the splits are as good as or better than after re-splitting. There wasn’t much difference in the number of blue words and LingQs, although the splits are definitely different in places. There are a few particles that I probably wouldn’t have added, but it also groups characters together in ways that I prefer. Splitting is subjective and there is no single right way. It is working well now. We will see if we can tweak it a bit, but it is generally very good now.
Can you post a screenshot of your results? (a screenshot of the lesson open?)
I have a known words count of 19k in Japanese, and normally these lessons would be in the 5%–7% range, since I have studied this type of content quite a bit, so I know for a fact there aren’t this many unknown words.
I’m not sure if you study Japanese, but for me it’s still much different than before the changes. It’s still attaching a bunch of grammar to all the words, which is why the unknown word count is so high. This makes it appear that there are new words, when it’s words I already know, just with grammar attached that makes them a new LingQ.
It never used to do this to anywhere near this extent. After re-splitting it becomes how it’s supposed to be, but on its own it’s pretty bad.
I would personally be okay with all lessons just being re-split with AI automatically upon importing, if that were possible. I would just like it to be a one-click import like before, without having to fix the splitting.
I think this is a big issue, because most people probably have no idea that you need to edit the lesson and click split with AI for it to be correct.
So I figured something out that may finally be the solution to the inflated word count, as well as to LingQ importing things as blocks of text, which others have mentioned in the past.
If I import a lesson as normal with the plugin, it almost always has no line breaks or spaces between sentences. I think whatever change was made recently messed with the formatting of subtitles when importing.
I found out how to manually get subtitles from YouTube and imported them into a subtitle editor program that I have, then exported them as plain text (without time stamps). The program automatically adds spaces between lines, and here are the results of importing the exact same text:
(This is how the program I use formats the text, and how LingQ used to do it.)
(Ignore the picture on the test lesson, lol, it’s the same text as the 2nd lesson.)
As you can see, the New Words % on the test lesson is how it should be, and I never used re-split with AI.
If I take the same subtitle file, export it with time stamps as an SRT, ASS, or VTT file, and import it into LingQ, it messes up the format and causes the issues.
Importing just plain text completely bypasses this issue, so I’m almost certain all that needs to be fixed is something the importer is doing with the formatting.
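For anyone who wants to do the same conversion without a subtitle editor, the workaround above can be sketched as a short script. This is a rough sketch for standard SRT files only (the sample text is made up; it is not tied to LingQ's importer in any way):

```python
import re

def srt_to_plain_text(srt: str) -> str:
    """Strip SRT cue numbers and timestamp lines, keeping one
    subtitle line per line of output."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separators between cues
        if line.isdigit():
            continue  # cue index (1, 2, 3, ...)
        if re.match(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->", line):
            continue  # timestamp line, e.g. 00:00:01,000 --> 00:00:03,000
        kept.append(line)
    return "\n".join(kept)

sample = """1
00:00:01,000 --> 00:00:03,000
こんにちは

2
00:00:03,500 --> 00:00:05,000
今日はいい天気ですね
"""
print(srt_to_plain_text(sample))
```

Pasting the resulting plain text into the import form, one line per subtitle, reproduces the format shown in the screenshots above.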
Since it was reported several days ago that there were some fixes, I’ve tried importing from Netflix several times.
At first, the format was nice, showing each character’s spoken words line by line, but then it started to be one block of text again.
Occasionally, there will be a sudden chunk of English even though the characters are speaking in Japanese and that part has Japanese subtitles. It’s only happened once or twice, but it’s quite bizarre to see that.
The word splitting from the initial import is still not great, and while re-splitting helps, it’s still an extra step that takes a lot of time.
The text circled in red is highlighted as one word…
Also, when importing in Chrome, the importer extension is no longer showing the title of the series.
This means that unless I label it myself after I import it, there is no way to know which episode it is.
I’m just lightly testing this stuff, because I still have other series that I imported before this word splitting problem happened that I can study. After noticing what I did recently, I figured I should come in here and say something. Maybe I will try importing again after I get through the series I’m studying now, which may end up taking until the end of this year.
I have imported a ton of stuff and have been testing, and I can assure you it’s not fixed. The last reply we got was from Mark saying it was fine, even though it’s totally not, so I’m just hoping they are still making some sort of effort to fix it.
I love LingQ, but bugs seem to be becoming more and more of an issue, and if they just ignore them, it’s going to drive people away. The experience is just plain worse than it was before the changes, and I actually love the idea of the changes, but only if they work.
Yeah, I have heard of Language Reactor; there is also Migaku, which does very similar things, but I have already purchased a lifetime subscription to LingQ Japanese, so if possible I wouldn’t want to move. If things get worse, then I guess I would have to.