I know everyone loves to quote Paul Nation’s paper How much input do you need to learn the most frequent 9,000 words? I just reread it and I believe there’s a little bit of confusion about it. His recommendation that you need to read 11 million words to learn the most common 9,000 word families is a number he just pulled out of thin air. Honestly, I don’t know how he jumped to that conclusion.
In his analysis, reaching an average of at least 12 repetitions for every level up to the 9th 1,000 word families (note: not every word family is actually encountered) took only 3 million words. This is Table 1 in the paper. It is a “running” (i.e. cumulative) total, as he says. Table 2 clearly shows that with 3 million words read you will encounter 8,219 of the 9,000 word families, and that word families at the 2nd 1,000 level are encountered 171 times each on average.
Why exactly he jumps from a cumulative total to separate amounts needed for each word family level is beyond me. He talks about seeing it as “a set of stage steps.” His initial analysis shows that you need only 3 million words, but later in the paper he recommends reading 11 million words. That’s almost a fourfold jump! From memory, one of his previous papers estimates that a native speaker learns about 1,000 word families per year on average, so I guess this is why he decided to do this. But it is a big leap from his initial analysis, so it should really be taken with a grain of salt.
I agree, as he says, “there are some serious problems with these crude calculations,” but I still have some qualms.
The reading material he used in his initial (Table 1) analysis is simply unrealistic (it includes literature). I imagine he chose these books because they were freely available online. This unrealistic reading material leads to several problems:
He isn’t clear about it, though he may slightly hint at it: in what order does he add his 25 novels? If he added them in alphabetical order, the first one was the 19th-century novel Adam Bede. As this book would have many words above the 2nd 1,000 word family level, it seems like a waste of a first book to read… This means that by reading more realistic beginner material, such as the LingQ mini-stories, you will cover the 2nd 1,000 word family level much faster than in his analysis.
This analysis starts counting exposures to the 9th 1,000 word family level from the very beginning, when you start reading Adam Bede. Unrealistic much? In practice the mid-frequency levels would take longer to acquire, because you don’t start acquiring them from page 1. (This brings an interesting thought to mind. Perhaps some metric could be created to look at the value of a book in terms of word frequency? Instead of just ‘unknown words’, an interesting metric for choosing material could be ‘unknown mid-frequency words’. Then you could choose material with more mid-frequency words.)
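To make the ‘unknown mid-frequency words’ idea a bit more concrete, here is a rough Python sketch of how such a metric might be computed. Everything in it is my own assumption rather than anything from the paper: the function name, the `word_band` lookup (word mapped to its 1,000-word level, which you would have to build from some frequency list), the `known_words` set, and the choice of levels 4 to 9 as “mid-frequency”. It also matches plain word forms, not word families, so treat it as an illustration only.

```python
import re
from collections import Counter


def unknown_mid_frequency_score(text, word_band, known_words, mid_bands=range(4, 10)):
    """Count unknown words in `text` that fall in the mid-frequency bands.

    word_band   -- hypothetical dict: lowercase word form -> 1,000-word level (1, 2, 3, ...)
    known_words -- hypothetical set of lowercase word forms the reader already knows
    mid_bands   -- levels treated as mid-frequency (assumed here: 4th to 9th 1,000)
    """
    # Very crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())

    counts = Counter()
    for token in tokens:
        if token in known_words:
            continue
        # Words missing from the lookup get None and are ignored.
        if word_band.get(token) in mid_bands:
            counts[token] += 1

    hits = sum(counts.values())
    return {
        "unknown_mid_types": len(counts),   # distinct unknown mid-frequency words
        "unknown_mid_tokens": hits,         # total occurrences of those words
        "per_1000_tokens": 1000 * hits / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    # Toy data, made up purely for the demo.
    toy_bands = {"the": 1, "cat": 1, "sat": 1, "on": 1, "was": 1,
                 "veranda": 6, "opulent": 8}
    toy_known = {"the", "cat", "sat", "on", "was"}
    sample = "The cat sat on the veranda. The veranda was opulent."
    print(unknown_mid_frequency_score(sample, toy_bands, toy_known))
```

You could run this over a few candidate books and pick the one with the most unknown mid-frequency words per 1,000 words that is still comfortable to read. A proper version would need to group word forms into families first.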
TL;DR All in all, his analysis that results in 3 million words of reading uses unrealistic material, and his recommendation of 11 million words of reading comes out of thin air. It’s a great idea for an analysis, for sure, but he could’ve done it in a better way. Because of his “crude calculations” you really can’t derive much from the exact numbers, apart from maybe that you are looking at somewhere in the millions of words read. For that reason I personally wouldn’t use the exact numbers from this paper as your goal. It’s probably more realistic to aim for goals recommended by experienced language learners, such as noxialisrex, PeterBormann, and several others. If anything, aim for 1 million words read as one of your earlier goals and you’ll have a solid foundation in the language (probably around lower intermediate).