This week Mandarin learner and language app developer Karl Baker posted a blog post offering his comprehensive breakdown of what you can expect to be able to do at each level of known Chinese words / characters. I find his guide much more helpful and realistic than CEFR or HSK. Check out the blog below and let me know your thoughts.
Interesting article / blog but:
"5,000 Words (2,000 Characters)
You should be able to read slightly more advanced texts such as modern novels."
That's wildly optimistic. I've found modern novels difficult in their own way since they involve a lot of internet / local slang terms or they intentionally "misspell" words to get around the auto web censors.
I found 3,500 hanzi was the level I needed to not get hit in the face constantly with unknown hanzi when immersing with native material. (I cannot confidently define what exactly a "word" is in Mandarin, so I have no idea how large my vocabulary is.)
I strongly believe in immersing in native material whatever the level, and sticking whatever you want to read into LingQ as long as you have the patience to get through it. I've just thrown out the whole "words / hanzi known" goals in exchange for # of words read / hours of listening.
Also, after browsing through his blog, I would give some advice to people wanting to start reading outside of beginner materials / graded readers. I think a trap is to use translated literature like The Lion, the Witch and the Wardrobe, The Little Prince or Harry Potter since the reader is more familiar with the plot, but I personally don't recommend it. I browsed through them after reading a lot, and I found the style quite awkward, with lots of foreign names / loanwords which don't crop up frequently in native content.
Instead just import >100 stories (yes, that many... never underestimate how much you have to read haha) from gushi365.com. They have sections broken down by age level, so you can pick stuff at varying difficulty. You'll get a better feel for how the language works (with onomatopoeia and what plants / animals native children are familiar with). The stories are really cute and fun, and I've actually been meaning to go back and read more of them.
An interesting topic, thanks for bringing this up. Many learners will probably agree that vocabulary size is one of the best indicators for one's language proficiency.
The levels Mr. Baker has come up with seem familiar. For example, researchers estimate that educated native speakers of English know about 20,000 word families (Goulden, Nation and Read, 1990; Zechmeister, Chronis, Cull, D'Anna and Healy, 1995). Generally linguists understand words as word families. The research seems to indicate that for adequate comprehension a text coverage of 98% is needed, that is 1 unknown word in 50. See:
(I believe others have suggested a lower 90%-95%)
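To make the coverage figure concrete, here is a minimal Python sketch of how text coverage could be computed. It assumes the text has already been segmented into words, which is the hard part for Chinese; the function name `coverage` and the sample data are hypothetical and only for illustration.

```python
def coverage(tokens, known_words):
    """Return the fraction of running words (tokens) found in known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


# Hypothetical, already-segmented sample text of 50 running words.
tokens = ["我"] * 20 + ["喜欢"] * 15 + ["看"] * 14 + ["武侠"]
known_words = {"我", "喜欢", "看"}

ratio = coverage(tokens, known_words)
print(f"coverage: {ratio:.0%}")  # prints "coverage: 98%"
```

Here 49 of the 50 running words are known, which is exactly the "1 unknown word in 50" case. Note that coverage counts every occurrence of a word, not just distinct words.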
Measuring your vocabulary can help gauge your general level in a language. But, more practically, one can use it to find appropriate reading material. This is especially important for unaided reading, i.e. without a dictionary; not on LingQ.
You can test yourself here (in English): https://my.vocabularysize.com
In fact I just did (result posted on Imgur).
I have some questions though:
- Can English word families be equated with Chinese root words (as Mr. Baker seems to do)?
- Do scientific studies on the necessary vocabulary size for adequate reading in Chinese exist?
- How do various sources e.g. news articles, technical texts, novels differ in this respect?
- How large is a native speaker's vocabulary size?
- The HSK frequency list seems to cover only 5000 items (intermediate level?). Do we have more comprehensive lists?
- How many (root)words does a typical Chinese novel have?
- Does a vocabulary size test exist?
- Do I understand correctly that the "Mandarin Vocabulary Builder" is one of those tests? I don't have an Android device, so I can't try the app.
Don't really know the answer to this, but to enjoy the majority of content, LingQ's Advanced 2 word count of 30,250 is a good amount to reach for. Anything more than that is useful as well. I still believe that 24,000 words is not enough.
5000 characters is very good
This post makes me feel quite uncomfortable! I've been studying Chinese for half a year now and my word count goes up really slowly. I'm sure it will increase faster in the future, but right now a word count of 30,000 words seems pretty far away.
Just keep at it! It's going to take a lot (a lot) of reading, but eventually it clicks. I remember when I first started, just getting to 1000 words was a horrendous amount of mental effort, but now I'm surprised at how quickly my LingQ count goes up (please note that LingQs aren't directly correlated to vocab size as the software picks up a lot of names / nonsense combinations).
A vocab size of 25,000 - 30,000 seems accurate, though, to read most literature comfortably at the mythical 98-99% comprehension level. I put one of my books into Chinese Text Analyzer after reading the blog post above, and it has a unique word count of 25k. Granted, it's a wuxia / cultivation webnovel, so it uses a lot of fantasy / TCM / historical / slang terms, but it doesn't have many modern / business / tourist / loan words.
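For anyone curious how a unique word count relates to a coverage target, here is a rough Python sketch. It is not how Chinese Text Analyzer works internally; it assumes the book has already been segmented into words, and the function name `words_needed` and the toy data are made up for illustration.

```python
from collections import Counter


def words_needed(tokens, target=0.98):
    """Count how many of the most frequent distinct words cover `target` of the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)


# Hypothetical, already-segmented toy "book" of 100 running words.
tokens = ["的"] * 60 + ["剑"] * 30 + ["丹田"] * 8 + ["经脉"] * 2

print("unique words:", len(set(tokens)))
print("words needed for 95% coverage:", words_needed(tokens, target=0.95))
```

Because a handful of very common words make up most of any text, the number of words needed for a given coverage is usually smaller than the book's unique word count, and the gap is made up of rare words that appear only once or twice.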
(I'm hoping if I read the equivalent of 10k-20k pages in LingQ, I'll be able to read comfortably. It sounds like a lot, but I'm trying to be realistic about how long it takes. But I'm monolingual, so for others it might be quicker.)
Chinese is a vast language (the chengyus!!) and anyone that says it is easy... 不知天高地厚 ("doesn't know how high the sky is or how deep the earth is")
Hi all,
I'm the original author of this article. It's good to see that you found it interesting and I enjoyed reading some of your feedback.
A few answers to some of your questions:
- A good online Mandarin vocabulary size test exists here: Online Chinese Vocabulary Size Test
- There's also a good test for how many characters you know here: https://hanzitest.herokuapp.com/
- My free Mandarin flashcard app is available to download here: https://play.google.com/store/apps/details?id=spaced.repetition.mandarin.chinese.learning.vocabulary.builder
- There isn't an iOS version yet but I am working on one
- Can English word families be equated with Chinese root words?
  - Roughly, I think so, yes. Because there is no conjugation in Chinese, almost all words can be considered root words.
- The HSK frequency list seems to cover only 5000 items (intermediate level?). Do we have more comprehensive lists?
  - When I was building my flashcard app, I did some word frequency analysis in Mandarin and came up with another 5 packs of words that are not in the HSK but are equally common. I named these the "HSK Extra" packs. On the app there are currently HSK Extra levels 1 to 5. The HSK Extra 1 pack contains the most common 150 words that are not in the standard HSK. HSK Extra 2 contains the next most common 150 words not in the standard HSK. HSK Extra 3 has 300 words, HSK Extra 4 has 600, and HSK Extra 5 has 1300. I'm still working on further levels of the HSK Extra packs, but I've noticed an enormous benefit from studying them daily with flashcards.
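The idea behind the packs can be sketched in a few lines of Python. This is only an illustration of the approach as described, not the app's actual code: it assumes you already have a word list ranked by frequency and a set of HSK words, and the function name `build_extra_packs` and the toy data are hypothetical. The default pack sizes are the ones quoted above.

```python
def build_extra_packs(ranked_words, hsk_words, pack_sizes=(150, 150, 300, 600, 1300)):
    """Split frequency-ranked words that are not in the HSK into consecutive packs."""
    # Keep the frequency order, dropping anything already covered by the HSK.
    extra = [w for w in ranked_words if w not in hsk_words]
    packs, start = [], 0
    for size in pack_sizes:
        packs.append(extra[start:start + size])
        start += size
    return packs


# Toy data: ten placeholder "words" ranked from most to least frequent.
ranked_words = [f"w{i}" for i in range(1, 11)]
hsk_words = {"w2", "w5"}

packs = build_extra_packs(ranked_words, hsk_words, pack_sizes=(2, 3))
print(packs)  # prints [['w1', 'w3'], ['w4', 'w6', 'w7']]
```

With a real frequency list, the result depends heavily on the corpus it was built from and on how the text was segmented into words.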
The other questions may take me a bit longer to answer, but I remember coming across the answers before. I will just need to look up the exact sources again, so I'll get back to you.
Thanks Karl. In response to some points raised below, I think while the LingQ counter is useful for measuring your progress on LingQ, it's not at all useful for measuring your vocabulary size. This is because a) LingQ is bad at deciding what counts as a word and what doesn't and b) everyone has a different standard for when they mark a word as "known". A more objective measure is to use the tools you mention.
That's important because otherwise you might think the vocabulary levels described in the article can be mapped onto LingQ and wonder why you're still struggling to read novels even though your counter is over 5000. But if we're talking about 5000 root words, e.g. HSK 6 proficiency, then many modern novels are within reach. There will still be unknown words / characters on every page, but few enough that it's not painstaking.
Yes, LingQ is bad at deciding what counts as a word and what doesn't, sometimes suggesting random combinations of characters as "words", but a good solution is to mark those as ignored (click the trash icon), which prevents your known word count from being artificially inflated.
Using this method, my LingQ known word count has always been within the range that would be expected for my level of comprehension.
It would be nice if LingQ segmented the words properly in the first place, but it's admittedly a hard problem to solve for computers and I can deal with it.