How many words before you're comfortable?

Hey guys, I’m currently on my way to the 6000-word milestone in Spanish, which I’m quite pleased with. Before I ask your opinion, I’ll share my definition of “knowing” a word: simply that you can read the word in any text and know what it means on its own, and possibly how the words around it may alter its meaning.

My question (which I’m wondering about especially because I’m approaching 6000 words) is this: how many words would you say one must be familiar with before it becomes rare to read a passage and find more than 10-15 words you don’t know or haven’t seen before?

Now, I know this question is hugely general and obviously depends on what the text is about. But I mean a general, not-too-formal newspaper, or a book at about the level of Harry Potter. Any opinions are welcome. I’m not looking for very specific answers, as the question isn’t very specific; just a general thought, opinion, idea or guess.

I’m getting close to a “comfortable” level in German here at LingQ, at around 11000 words, reading not-too-complicated literature (I’m reading Süskind, not Thomas Mann). I figure that when I get to 15 to 17 thousand words I should be really comfortable with this kind of material and should move on to more challenging texts.

Now that I know over 27000 words in French on LingQ, I am comfortable with the daily news on Radio France Internationale and with my favorite subjects. I always ignore many specialized words and typos so that they don’t get added to my statistics. However, I am not really comfortable with old literature or with subjects I am not interested in.

When you say you’re not comfortable reading stuff you’re not interested in, I take it you mean you find it harder to concentrate because you’re not interested in what you’re reading?

It depends on whether you know these words actively or passively. Passive vocabulary helps you read, but without active vocabulary you will not be comfortable speaking.
I know 25,000 words in French and I read Jules Verne etc., but I’m not comfortable when I’m speaking French.
And I know only 1000 Polish words and I make a lot of mistakes, but I feel comfortable speaking Polish.

I fully agree with Evgueny that it makes a huge difference whether your language knowledge is active, passive, or both. This was always my big hindrance with Turkish. In the 1990s I attended VHS Turkish courses for 4 years, and then from 2008 on I studied Turkish again for 3 years, but all I accumulated was passive knowledge of the language, because I never had the chance to activate it. In the VHS lessons we spoke German all the time.

I am very happy that in my VHS Danish course only Danish is spoken, so I have the opportunity to use Danish once a week. This makes me think in Danish, which also helps me write in Danish in my log on HTLAL.

In Russian I’m close to 60 000 known words and I still have to read texts slowly and carefully. But I think I made a basic error with Russian: I moved on to advanced texts too early. As a result, a lot of my known words are obscure and rarely encountered (1-star words on the vocabulary page). If I had stuck with Intermediate 1 and 2 texts for longer, I would have got the reinforcement I needed on the 4- and 3-star words, which might have got me to reading fluency at 30 000 known words. If I were doing it all over again I wouldn’t even bother learning 1 star words until I was quite advanced.

In French, about 25 000 known words gave me reading fluency for Jules Verne, so my experience there is similar to Evgueny’s.

I’m at around 30,000 passive/reading words in Russian by the LingQ count, which maybe corresponds to roughly 8000 word roots. Any new text I import generally has about 10% brand-new words, and maybe another 10% “yellow” words that I have seen before but still don’t know, even passively.

I can’t read a news article without a dictionary, and literature is too hard going; even kids’ stories are still a challenge. By contrast, I can understand transcripts of unscripted radio interviews much more easily, with maybe 90% comprehension.

I think the LingQ word count has one particularity we should all be aware of: it doesn’t group together the various inflections of a word. For example, the different conjugations of a verb, plurals, case endings and so on are each counted separately. When you hear a benchmark vocabulary size like 8,000 words for spoken fluency, it means 8,000 word families: all the conjugations of a given verb, and so on, count only once. When you use corpus software, it counts a verb only once for all its different conjugations. So if your word count here at LingQ in Russian, or in any other heavily inflected language, includes the same word in several inflected forms, or if your Portuguese word count includes the same verb conjugated in all its different tenses, then what you get is an unreliable gauge of your actual vocabulary size, even though it still serves its main purpose, which is to measure your progress here.
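To make the difference between the two ways of counting concrete, here is a minimal Python sketch (not LingQ’s actual code) that counts the same Spanish verb forms once per spelling and once per word family. The `LEMMAS` table and the function names are hypothetical and only for illustration; a real tool would need a proper lemmatizer rather than a hand-written lookup.

```python
# A minimal sketch: counting distinct spellings vs. word families.
# LEMMAS is a hypothetical, hand-written lookup table for illustration only;
# a real tool would use a lemmatizer rather than a hard-coded dict.
LEMMAS = {
    "hablo": "hablar",
    "hablas": "hablar",
    "habla": "hablar",
    "hablamos": "hablar",
    "hablan": "hablar",
    "come": "comer",
    "comes": "comer",
    "comemos": "comer",
}


def count_forms(words):
    """Every distinct spelling counts as a separate 'known word'."""
    return len({w.lower() for w in words})


def count_families(words, lemmas):
    """Each form is first mapped to its dictionary form, so a family counts once.
    Words missing from the table are treated as their own family."""
    return len({lemmas.get(w.lower(), w.lower()) for w in words})


words = ["hablo", "hablas", "habla", "hablamos", "hablan", "come", "comes", "comemos"]
print(count_forms(words))             # 8 distinct forms
print(count_families(words, LEMMAS))  # 2 word families: hablar, comer
```

Eight “known words” by the first count shrink to two by the second, which is the kind of gap described above between a per-form count and a word-family benchmark.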

Here’s a link to a great video from Professor Arguelles about vocabulary size.

@SkyblueTP: “…If I were doing it all over again I wouldn’t even bother learning 1 star words until I was quite advanced.”

I think this would be exactly the right approach. I don’t (alas) know anything about Russian, but I have a copy of the Routledge “Russian Learner’s Frequency Dictionary” on my desk; in the introduction the author (an academic at SSEES) makes the claim that:

“Any foreign student with a sound knowledge of Russian grammar and a passive knowledge of 8,000 to 10,000 vocabulary items (with perhaps an active vocabulary of half that) can reasonably call him or herself competent in the language for all normal purposes”

He is, of course, referring to high-frequency words, and (I imagine) to ‘infinitives and nominatives’, not to the mighty myriad of conjugated and case-inflected forms?

(That’s one of the slightly freaky things about LingQ’s stats: in theory one could “know 1000 Russian words”, and yet in actual fact have a vocabulary of not much more than 100 words! :-0)

Yes, I agree with what is being said about different conjugations of the same verb throwing off the reliability of the “known words” total, although this is certainly handy for irregular verbs, whose forms in my opinion should count as known words in their own right, as they are usually completely different, or at least “irregularly” different.

Going slightly off topic, though: I’m currently reading Harry Potter in Danish, and I’d say that 65-75% of the time a sentence contains enough words I know for me to work out its meaning even if the rest of it is unknown, and maybe 40-50% of the time I know every word in a sentence. I often wonder, though, whether I only understand what’s going on because I’m so familiar with the film. Do you guys have any idea how to work out whether that is the case?

Corin, prior knowledge of the book’s plot may well be helping you, but I reckon you will still end up learning a very considerable amount this way. My experience of foreign reading (mostly in German) is that one can remember a lot of things (sometimes even whole passages) long after putting the book down.

It may be the case that, when we are reading for interest or pleasure, things tend to go straight into long-term memory?

Mmm, interesting, Jay. I am reading along with an audiobook in Danish. Not sure whether that helps more or not.

It has actually been proved through scientific research that you learn better when you’re having fun, so that’s probably the case when you’re reading a great book in a foreign language.

In the Arguelles video I linked to, he mentions different kinds of word “knowledge”: words in your active vocabulary, words you know passively, words you understand because they’re similar to their equivalents in your own language, and words whose meaning you can tell from the context. All of these can be considered “known” words, at least in the context in which you encountered them.