Known word goal

This isn’t very scientific, but I had a Netflix movie folder with a ton of Korean lessons that I pre-uploaded, probably close to 350 lessons. That gave me 120,000 blue words without any LingQs; this was before I started LingQing, when I had just uploaded a bunch of content. I think if Slavic languages are 3 times Romance languages, Korean has to be around 4, because I have the same kind of Netflix movie folder in Spanish and it has about 1/4 of the blue words. So, just as a ballpark, Korean is probably x4, while Slavic is x3, Romance x2, etc. How much time would copy and paste take, if you had to guess? Also, I was looking at the top English learners on the site, and somehow the top people have 80-90k known words in English. They must be reading some very academic-level stuff with some unique vocab. These numbers aren’t really seen in any other language, but I don’t really know.

I’m not 100% sure about the time, but I think it is best to use something with exactly the same text originally, like the “Who Is She” story or maybe some of the mini-stories. Still, I think you are on the right track.

Speaking of your Spanish movie folder, would you by any chance have, or know where I could get, the episode subtitles to import for the show “Hotel de los Secretos”? The new LingQ import extension had not been invented yet when the show was available, and now it is not streaming in any country, as it has been replaced by the Spanish version of the show, called “Grand Hotel.” I’m looking for that show and a small handful of other movies that have left Netflix. It would be truly amazing if I could find them. Thoughts?

I’m sorry, I don’t think I have any purely Spanish movies in that folder; it’s just everything I wanted to watch from Netflix in one folder. I’ve never heard of “Hotel de los Secretos”, sorry. My Spanish is terrible and still needs a lot of work. I don’t know where you would find those movies if not on Netflix.

There’s a difference between your average best-seller and “high literature”. I agree with t_harangi that you can expect to read run-of-the-mill novels at 30,000 known words in Romance languages. Translated novels tend to be easier because there are not as many idioms or culture-specific situations.
[Incidentally, that is why I mostly avoid translated best-sellers. Plus, they defeat my main goal, which is getting to know a new culture and society through its language.]
The equivalent figure for a Slavic language is about twice as high, in my experience. This agrees with OP’s claim that “Advanced 2” in Slavic languages is not as advanced as in Romance languages.
However, OP’s proposed figures are, IMO, also too high. I can read regular novels unassisted in Russian at my current level. However, for really hard ones I still need to look words up.
As I have explained in other threads, I think that you need about 20,000 word families to read challenging literature. You get that at over 40,000 LingQ words in Romance languages (a bit more, maybe 50,000) and at over 80,000, let’s say 90,000, in Slavic languages, which is certainly below OP’s proposed 140,000.
Of course, there will still be many words you don’t know, but you can expect to follow challenging novels. Even so, difficulty still varies within this group: consider, e.g., someone trying to read James Joyce’s “Ulysses”.
OP is reading “Master and Margarita” in Czech. Well, I read it here on LingQ in the original Russian, and I can assure you that it is a hard read, even compared to other serious literature. In contrast, “War and Peace” is way easier. Besides a huge vocabulary, “Master and Margarita” contains a lot of irony and special expressions, and you really have to read between the lines. At the time, I used to read part of this extensive synopsis after reading the corresponding original fragment, to make sure that I hadn’t missed anything important: The Master and Margarita Summary | GradeSaver
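The arithmetic behind these figures is just a multiplier from word families to LingQ word forms. A minimal Python sketch, assuming the multipliers implied by the numbers above (roughly 2-2.5 LingQ words per family for Romance languages and 4-4.5 for Slavic ones); `FORMS_PER_FAMILY` and `lingq_word_range` are hypothetical names for illustration, and the ratios are forum estimates, not measured values:

```python
# Hypothetical words-per-family multipliers, read off the estimates above
# (20,000 families -> 40-50k LingQ words in Romance, 80-90k in Slavic).
FORMS_PER_FAMILY = {
    "Romance": (2.0, 2.5),
    "Slavic": (4.0, 4.5),
}


def lingq_word_range(families: int, group: str) -> tuple[int, int]:
    """Rough (low, high) LingQ known-word counts for a number of word families."""
    low, high = FORMS_PER_FAMILY[group]
    return round(families * low), round(families * high)


for group in FORMS_PER_FAMILY:
    low, high = lingq_word_range(20_000, group)
    print(f"{group}: {low:,}-{high:,} LingQ words")
```

With 20,000 families this prints 40,000-50,000 for Romance and 80,000-90,000 for Slavic, i.e. the ranges quoted in the post.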

On the other hand, the effort it takes to read this novel is absolutely worth it. It’s one of the best pieces of literature I have ever read and it is terribly funny as well.

I’ve come to the same conclusion as ftornay: the known-word count necessary in Russian is about 2x that of Spanish. If Czech’s numbers are similar to Russian’s, then I’m guessing 80k-90k is a pretty decent goal. Also, languages with a different script or grammar from what you’re used to can add another level of complexity. The script doesn’t matter so much if you read consistently, but long breaks can cause some frustration for a few days.

One thing that I did to compare word counts across languages was to download the top 50k most common word forms compiled from subtitles (Wiktionary:Frequency lists) and import the text file into LingQ, so that I can see how many of those words I’ve encountered and learned. Be careful, because those lists contain proper nouns, etc. I’m going to go through the lists and delete all proper nouns one of these days, and then I’ll have clean lists to compare. I’ll post the cleaned Russian and Spanish lists for other people to use. The lists contain the number of occurrences of each word, so you can convert them into a coverage %. I wish LingQ would add a language coverage % stat, but I understand that it could be subjective. Here is a sample of what I have found:

Coverage %   English   Spanish   French    Russian   German    Mandarin
90           2618      5384      3890      9912      4032      5494
95           6794      12495     9567      20350     10425     12218
96           8668      15353     11980     23907     13176     14965
97           11503     19295     15441     28278     17097     18769
98           16148     25000     20772     33727     23010     24336
99           25075     33950     29942     40740     32624     33264
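Rows like these come from a cumulative sum over the occurrence counts. A minimal Python sketch, assuming each line of the downloaded list has the form `word count` (check the actual file before relying on this); `parse_frequency_list`, `words_needed_for_coverage` and the `exclude` set of hand-picked proper nouns are hypothetical helpers, not part of LingQ or Wiktionary:

```python
from typing import Iterable


def parse_frequency_list(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Parse lines assumed to look like 'word count' into (word, count) pairs."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        pairs.append((parts[0], int(parts[1])))
    return pairs


def words_needed_for_coverage(
    pairs: list[tuple[str, int]],
    target: float,
    exclude: frozenset[str] = frozenset(),
) -> int:
    """Smallest N such that the N most frequent words (after dropping any
    words in `exclude`) cover at least `target` (e.g. 0.95) of all tokens
    counted in the list."""
    counts = sorted((c for w, c in pairs if w not in exclude), reverse=True)
    total = sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running / total >= target:
            return n
    return len(counts)


sample = ["de 50", "que 30", "", "no 10", "a 5", "la 5"]
pairs = parse_frequency_list(sample)
print(words_needed_for_coverage(pairs, 0.85))
```

Note that the denominator is only the tokens counted in the list itself, so coverage computed from a truncated 50k list is relative to that list, not to the whole language.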

The % is the coverage you could expect if you knew the X most common words (e.g., if you knew the 12,495 most common Spanish words, you could expect to know 95% of spoken language). In reality, we won’t know just “the most common words” but a mix of common and less common words, and definitely not learned in the most efficient order. Like I said, this list currently contains proper nouns and maybe some other noise that needs to be cleaned up before it’s more useful. I haven’t thought too much about it, but I’m wondering if a 50k-word list is too small for highly inflected languages. That might make the % too high for a language like Russian, if lots of “popular” words aren’t in the top 50k. I think this is the case, because of my 44k known words on LingQ, only 25k are within the top 50k. As an example, here are some statistics on how I’m doing against these word lists:

                       Russian   Spanish   Russian ratio of 50k   Spanish ratio of 50k
words read in LingQ    981696    429272
known in LingQ         44044     18862
known from 50k         25132     16358     0.50                   0.33
yellow from 50k        8039      5751      0.16                   0.12
blue from 50k          16829     27891     0.34                   0.56

Up to this point, I’ve been assuming that my % coverage from the first table is based on the number of known words from the top 50k list (i.e., ignoring my known words that are not in the top 50k). This ends up being a decent assumption, because it makes up for the “inefficiency” of not knowing the most common words in the most efficient order.
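That assumption can be turned into numbers by looking up the “known from 50k” count in the coverage table. A minimal Python sketch using the Russian and Spanish figures posted above; the linear interpolation between table rows is my own assumption (the real curve between rows is not a straight line), and `ratios_of_list` and `estimate_coverage` are hypothetical helpers:

```python
from bisect import bisect_left

LIST_SIZE = 50_000

# (coverage %, most common words needed), from the coverage table.
COVERAGE_ROWS = {
    "Russian": [(90, 9912), (95, 20350), (96, 23907), (97, 28278), (98, 33727), (99, 40740)],
    "Spanish": [(90, 5384), (95, 12495), (96, 15353), (97, 19295), (98, 25000), (99, 33950)],
}

# LingQ status of the 50k list words, from the statistics table.
STATUS_COUNTS = {
    "Russian": {"known": 25132, "yellow": 8039, "blue": 16829},
    "Spanish": {"known": 16358, "yellow": 5751, "blue": 27891},
}


def ratios_of_list(counts: dict[str, int], list_size: int = LIST_SIZE) -> dict[str, float]:
    """Share of the frequency list in each LingQ status (known / yellow / blue)."""
    return {status: n / list_size for status, n in counts.items()}


def estimate_coverage(known_in_list: int, rows: list[tuple[int, int]]) -> float:
    """Treat the known words from the list as if they were the most common ones
    and linearly interpolate a coverage % between neighbouring table rows."""
    words = [w for _, w in rows]
    if not words[0] <= known_in_list <= words[-1]:
        raise ValueError("known-word count is outside the table's range")
    i = bisect_left(words, known_in_list)
    if words[i] == known_in_list:
        return float(rows[i][0])
    (p0, w0), (p1, w1) = rows[i - 1], rows[i]
    return p0 + (p1 - p0) * (known_in_list - w0) / (w1 - w0)


for lang, counts in STATUS_COUNTS.items():
    ratios = {k: round(v, 2) for k, v in ratios_of_list(counts).items()}
    coverage = estimate_coverage(counts["known"], COVERAGE_ROWS[lang])
    print(lang, ratios, f"~{coverage:.1f}% coverage")
```

With these figures both languages land a little above 96% under this assumption. The unrounded ratios for each language add up to 1, since every word on the list is known, yellow or blue.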