Vocabulary size estimation


I have always been interested in vocabulary size estimation and development for various reasons. Since this topic regularly comes up in forum discussions, I want to share some of the information I have found. Take it with a pinch of salt, since I’m not a linguist :slight_smile:

First and foremost, you should be aware that a statement like “I know 5000 words” is completely meaningless unless you explain exactly what “words” are and how you count them. A few examples to explain the issue: Do you count “post”, “office” and “post office” as two or three vocabulary entries? What about “dog”, “house”, “dog house”? Do you count “(to) run”, “running”, “(he) runs”, “runner” as four words, as two words (the verb forms “run/runs/running” vs the noun “runner”) or as one word (word family “run”)?

In practice, vocabulary tests usually do one of the following: (a) count all word forms, (b) count only word stems or word families, or (c) count headwords in a dictionary, which is a very pragmatic definition. As a result, the outcome can differ by a factor of two or more depending on the chosen method.
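To make the difference between methods (a) and (b) concrete, here is a minimal Python sketch. The `FAMILY_OF` mapping and the word list are made up for illustration; a real test would rely on a published word family list, not on a hand-made table like this one.

```python
# Hypothetical, hand-made mapping from word forms to their word family.
# Real tests use published family lists; this table is for illustration only.
FAMILY_OF = {
    "run": "run", "runs": "run", "running": "run",
    "runner": "run", "runners": "run",
    "dog": "dog", "dogs": "dog",
    "house": "house", "houses": "house",
}


def count_word_forms(known):
    """Method (a): every distinct word form counts as one word."""
    return len(set(known))


def count_word_families(known):
    """Method (b): all forms of the same family count only once.

    Forms missing from the mapping are treated as their own family.
    """
    return len({FAMILY_OF.get(word, word) for word in known})


known_words = ["run", "runs", "running", "runners", "dog", "dogs", "house"]
forms = count_word_forms(known_words)
families = count_word_families(known_words)
print(forms, families)  # prints: 7 3
```

The same learner "knows" 7 words by one definition and 3 by the other, which is why a bare number says little without the counting method.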

There is also the question of what it means to “know” a word. Do you only count words that you can use (“produce”) in a conversation (active/productive vocabulary) or also words that you understand (“recognize”) when listening or reading (passive/receptive vocabulary)? Is it enough to know that the German word “Bub” means boy, or should you also know that it is rarely used in Northern Germany? [1] Do you know the difference between German “(einen Brief) schreiben” and “(einen Test) schreiben”?

Interestingly, receptive and productive vocabulary are nearly identical in size in native speakers (not entirely true for educated native speakers [2]), whereas learners might know twice as many words passively as actively [3].

There are many online tests to estimate the size of your English vocabulary. These look more legitimate than the usual BuzzFeed or Facebook junk:
http://www.lextutor.ca/tests/ (<— Also tests for production!)
Test Your Vocabulary Online With VocabularySize.com – Select a test...
Don’t be surprised if the results vary by several thousand words. This is to be expected, since the tests are rather short and rely on statistical estimation. Be careful if you know French: most tests will overestimate your English vocabulary in that case.
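A rough sketch of why short tests scatter so much, written in Python. It assumes the test draws a simple random sample from a word list and scales the share of known words up to the whole list. Real tests typically sample by frequency band, so treat the numbers (a 20000-word list, 100 test items) and the function name as assumptions for illustration only.

```python
import math


def estimate_vocabulary(known_in_sample, sample_size, list_size):
    """Scale a sample result up to the full list, with a ~95% margin of error.

    Assumes simple random sampling and a normal approximation,
    which real vocabulary tests only roughly follow.
    """
    share_known = known_in_sample / sample_size
    estimate = share_known * list_size
    std_error = list_size * math.sqrt(share_known * (1 - share_known) / sample_size)
    margin = 1.96 * std_error
    return estimate, margin


# Hypothetical test: 100 items drawn from a 20000-word list, 45 answered correctly.
estimate, margin = estimate_vocabulary(45, 100, 20000)
print(f"about {estimate:.0f} words, give or take {margin:.0f}")
```

With these made-up numbers the estimate is about 9000 words with a margin of nearly 2000 in either direction, so two runs of the same short test can easily differ by a few thousand words.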

Finally, the question that everybody asks: How many words do you have to know? There are many, many scientific publications on that question, especially for English, focusing on different aspects of language learning (native vs L2, children vs adults, speaking vs writing, etc.). According to [4], knowing 3000 word families is a good basis for reading comprehension. It means that you know more than 98.5% of the words in a novel for teenagers. A long-term goal should be to know 8000-9000 word families, which is equivalent to recognizing 98% of the words in a normal novel [5]. If you have reached that point, you are able to ignore unknown words (or, even better, guess their meaning from context) and still understand most of the text.
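The percentages above are coverage figures: the share of running words in a text that you already know. A minimal Python sketch of that calculation follows. It matches exact word forms instead of word families, and the sample text and known-word set are invented, so it only illustrates the idea and is not how the cited studies computed their numbers.

```python
import re


def coverage(text, known_words):
    """Return the share of running words (tokens) in text that are known.

    Simplification: matches exact lowercase word forms, not word families.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known_count = sum(1 for token in tokens if token in known_words)
    return known_count / len(tokens)


# Invented example: 12 running words, 2 of them unknown.
sample_text = "The dog ran to the house. The dog barked at the postman."
known = {"the", "dog", "ran", "to", "house", "at"}
ratio = coverage(sample_text, known)
print(f"{ratio:.1%} of the running words are known")
```

Because frequent words like “the” are counted every time they occur, a fairly small vocabulary already covers a large share of a text, while the last few percent require thousands of additional word families.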

And how many words do you need to know in order to speak “fluently”? No idea :slight_smile: But maybe this gives an indication: A 10-year-old native English speaker, who undoubtedly is able to talk without much hesitation on a large range of topics, knows around 7000 word families.

[1] https://www.philhist.uni-augsburg.de/lehrstuehle/germanistik/sprachwissenschaft/ada/runde_1/f01/
[2] http://engres.ied.edu.hk/vocabulary/vocabulary2-3.html
[3] http://www.robwaring.org/papers/various/vocsize.html
[4] http://www.lextutor.ca/research/nation_waring_97.html
[5] http://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/2006-How-large-a-vocab.pdf

I don’t mean to be rude but… who cares? I always like to say that the people who learn languages well are learning them while others are busy reading studies about how to learn them well.

The more we obsess over this word count stuff the less time we have for actually learning words.

FWIW, by the way, I’m a university-educated native English speaker and have been an avid reader since I was 3 or 4. According to my.vocabularysize I know more than 4000 word ‘families’ less than needed to be considered a native speaker!

Methinks the test is dubious, especially as there were only 6 or 7 words which I didn’t know the meaning of.

Perhaps you can recommend some good “classics” written with a limited number of word families?

Which is why I prefer to say, I recognise x number of words, rather than know them. But, the more words you begin to recognise, the greater the progress in the language and the greater the knowledge that will accumulate in the brain. I have absolutely no idea how many words I know, but it is definitely more than I began with!

For popular languages, like English and French, there are tons of graded readers from different publishers (Oxford University Press, Penguin,…). But when you say “classics” you probably mean non-abridged and non-simplified texts?

Maybe the other forum members can help with examples from their native languages?
I found Girl with a Pearl Earring relatively easy to read in English (because the story is told from the perspective of a young lower-class girl). For French, I vaguely remember that I didn’t have many problems reading Camus’ L’Etranger in school (level B1/B2 at that time?). For Chinese, there is Chinese Text Sampler: Readings in Chinese Literature, History, and Popular Culture

My personal preference is to read things that are of a decent level and that I read as a kid. I read the Goosebumps series when I was a child, so now I read them in my target language. Decent level, not too childish, nothing too complicated.

There are many sentences in my Korean texts for which every word is individually known to me, yet the meaning of the words combined escapes me.

However one chooses to define a word, understanding is so much more than that.

Still, interesting post - I like that you’ve dug up a bunch of interesting research to read! Thanks for the post :slight_smile: