I don’t count proper nouns, since I feel a name is a basic given. Proper nouns, along with cognates, would inflate the numbers.
I am feeling much more comfortable at 38K. I found some novels by Heinz G. Konsalik that seem to fit this level (around 30K).
I might not be right, but I have always somewhat questioned the number of known words as a way of measuring your level in a language, simply because if you mark a word as known, you can easily forget it. At least that’s how it’s been for me. Instead, I use words read and hours listened to track my progress.
With any of these numbers, it’s difficult to really say what level you’re at, and as others have mentioned elsewhere, certificates don’t necessarily mean much either.
I think the known words count is still useful: it measures progress. One just has to accept that some percentage of that list has already been forgotten or is only shakily known. We know there is progress, though, because we can read and listen to more complicated material than we could before.
There’s also a problem when measuring any of these levels…what do they really mean? If I read a lot of scientific books, I could have a large vocabulary, but it may be a lot of words that aren’t useful to everyday conversation. I might be intermediate, or advanced, but not be able to understand a conversation, even written down.
Very true.
I mostly use the known words counter as a means of setting goals. The upside is it really makes me do the work. The downside is my work tends to be too imbalanced, too much reading as opposed to listening, writing and conversing.
See, I started this year with Known Word goals, but I realized I can’t control when I will learn words. It just kind of happens with enough exposure.
This caused me to shift my goals to reading, speaking and listening in 2021 as follows:
- 350 Hours Listening
- 2.5 Million Words Read
- 50 Hours Speaking
One result is that I blew past my original known word goals in half the time, and I can stay “motivated” simply by doing the things I enjoy: engaging with native content in my target language(s).
I think people need a better understanding that LingQ counts word forms. I know Steve has mentioned it, but I don’t think people who haven’t studied linguistics understand it.
Bike, biked, and biking are all forms of the same word, and LingQ counts each of them. So someone who marks all three as known still only knows one word. 30,000 word forms might translate to only 12,000 known words, and a high proportion of those might be cognates.
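To make the distinction concrete, here is a minimal Python sketch that counts the same list of known words twice: once as distinct word forms (the way LingQ’s counter is described above) and once grouped by base word. The `LEMMA_OF` table is a small, hand-written, hypothetical mapping; a real tool would need a lemmatizer or dictionary to build it.

```python
# Hypothetical form -> base word (lemma) table. In practice this mapping would
# come from a lemmatizer or dictionary; it is not something LingQ exposes.
LEMMA_OF = {
    "bike": "bike",
    "biked": "bike",
    "biking": "bike",
    "bikes": "bike",
    "gehe": "gehen",
    "gehst": "gehen",
    "geht": "gehen",
    "gehen": "gehen",
}


def count_known(known_forms):
    """Return (distinct word forms, distinct base words) for a list of known forms."""
    forms = set(known_forms)
    # Forms missing from the table are treated as their own base word.
    lemmas = {LEMMA_OF.get(form, form) for form in forms}
    return len(forms), len(lemmas)


known = ["bike", "biked", "biking", "gehe", "gehst", "geht", "gehen"]
print(count_known(known))  # (7, 2)
```

Seven “known words” by a form count, but only two base words.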
Yes!
Personally, I’m not interested in the “Known Words” stats for my L2s. I’m more interested in the overall number of words read, the overall number of hours listened, etc.
But, if someone wants to use the “Known Words” stats for measuring progress, it’s probably best to divide it by 4, 5 or 6 (depending on the language).
In German, for example, a single verb like “gehen” (to go) can have 10 to 20 different forms: “Ich gehe, du gehst, er / sie / es geht, wir gehen, ihr geht, sie gehen.”
And that’s only the present tense… It’s the same in Romance languages, for instance.
Consequently, L2 learners usually know far fewer words than LingQ’s “Known Words” stats indicate. There are also other problems with the “Known Words” stats: tens of thousands of collocations, proper names, etc.
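As arithmetic, the divide-by-4-to-6 rule of thumb gives a range of estimates rather than a single number. A minimal sketch, where `estimate_base_words` is a hypothetical helper and the divisors are the rough guesses from this post, not measured values:

```python
def estimate_base_words(known_forms, divisors=(4, 5, 6)):
    """Rule-of-thumb estimate: known word forms divided by a per-language factor."""
    return {d: known_forms // d for d in divisors}


# Using the 38K figure mentioned earlier in the thread.
print(estimate_base_words(38_000))  # {4: 9500, 5: 7600, 6: 6333}
```

The right divisor depends on how heavily the language inflects, which is why a single fixed factor can only ever be a rough guide.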
LingQ has chosen the “word forms” (rokkvi) approach because this solution is easier to implement. That’s all.
When I started LingQ, understanding the tool and the numbers was my main hurdle and a source of anxiety. It made me question myself and my progress.
It wasn’t until I started focusing on the language instead of the tool that I had the calm to see the numbers for what they are: subjective, relative tools that measure some aspect of progress. I am also no longer so focused on entering my listening or reading into LingQ. What I do outside LingQ simply stays outside LingQ and out of the numbers.
I feel that I can measure real progress by how well I understand freely available audio, video and articles. There is no number that measures progress in an absolute sense, but there is a subjective way to determine how much of a document, conversation or movie you understand. And that is what counts for me. If I can follow, understand and enjoy the content, I am more than happy with my progress.
And that is the problem in a nutshell. A beginning LingQ user lacks trust in the process and clings to the numbers. I did, and I suspect most beginners do. When I started a post about this, I got the extremely good advice to just enjoy the process, to enjoy the path of learning a new language. Of course I ignored it (bad advice, right?). Only later did I realize its value, and now I mostly ignore the numbers, except for one thing: my LingQs have to increase by at least 50. The known words will follow. I also listen to a lot of random talk. For some reason, that helps.
This is just my opinion. If it helps only one student, I will be happy.
“LingQ has chosen the ‘word forms’ (rokkvi) approach because this solution is easier to implement. That’s all.” Very true. Otherwise they would need functionality to group words under their base word, a way for users to link word forms to that base word (or a way for the program to draw on external resources to do it by itself, which is not that simple even if such resources exist in a usable form), and functionality to count the base words. It would also mean the word count would be skewed by all the word forms that hadn’t yet been linked to a base word.
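A minimal sketch of what that could look like, assuming a hypothetical `KnownWords` store (nothing here reflects LingQ’s actual code or data model). It shows the three pieces described above: linking forms to a base word, counting base words, and the skew from forms that haven’t been linked yet.

```python
class KnownWords:
    """Hypothetical known-word store that can count base words as well as forms.

    Forms are linked to a base word explicitly (by the user or an external tool).
    A known form that has not been linked is counted as its own base word, which
    is the skew described above.
    """

    def __init__(self):
        self.known_forms = set()
        self.base_of = {}  # word form -> base word, filled in by link()

    def mark_known(self, form):
        self.known_forms.add(form)

    def link(self, form, base):
        self.base_of[form] = base

    def form_count(self):
        # What a form-based counter reports.
        return len(self.known_forms)

    def base_word_count(self):
        # Collapse each known form to its base word, if one has been linked.
        return len({self.base_of.get(form, form) for form in self.known_forms})

    def unlinked_forms(self):
        # Known forms that are neither linked to a base word nor used as one.
        bases = set(self.base_of.values())
        return sorted(
            form for form in self.known_forms
            if form not in self.base_of and form not in bases
        )


kw = KnownWords()
for form in ["gehen", "geht", "gehst", "ging", "Haus"]:
    kw.mark_known(form)
kw.link("geht", "gehen")
kw.link("gehst", "gehen")

print(kw.form_count())       # 5
print(kw.base_word_count())  # 3 ("ging" is still unlinked, so it counts separately)
print(kw.unlinked_forms())   # ['Haus', 'ging']
```

Linking “ging” to “gehen” as well would drop the base-word count to 2 while the form count stays at 5.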
One thing I wonder about is how LingQ could implement a morpheme prediction algorithm that works with highly synthetic languages. As it stands, the “Known Word Count” seems to work pretty well for analytic languages with minimal inflection, and to varying degrees for synthetic languages.
Type-ahead or next-word prediction software has similar problems with highly synthetic languages when it has been designed around the concept of “predicting the next word”.
A word/morpheme prediction algorithm like this must have been implemented for East Asian languages, but I have no experience with those on LingQ, so I don’t really know how well it works (or doesn’t).
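One common baseline for splitting unspaced text, or long inflected words, into pieces is greedy forward maximum matching against a dictionary. The sketch below shows that textbook technique; it is not LingQ’s method, which isn’t described here. The tiny dictionaries are made up for illustration, and greedy matching can mis-split words where a shorter match would have been correct, which is one reason real segmenters usually add statistical models on top.

```python
def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary entry that matches
    (up to max_len characters), falling back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Chinese: no spaces between words.
zh_dict = {"我", "喜欢", "学习", "语言"}
print(max_match("我喜欢学习语言", zh_dict))  # ['我', '喜欢', '学习', '语言']

# Turkish (agglutinative): one "word" made of several morphemes.
tr_dict = {"ev", "ler", "imiz", "den"}
print(max_match("evlerimizden", tr_dict))  # ['ev', 'ler', 'imiz', 'den']
```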
“Known Morpheme Count” probably doesn’t have the same ring though.
I love how you’ve broken down the way you shifted your goals. I also agree that it’s tough to know when you will learn a word and be able to use it.
We do have similar goals!
- Listening (LingQ/Netflix): 1,400 hours of active, engaged listening
- Words read: 1,500,000 (I read about 3,000-4,000+ words a day)
- Reading: 360 hours (about an hour a day)
- Speaking: 50 hours (I speak in chunks of about 20 minutes, weekly and sometimes daily)
I might need to start a money jar or marbles for my goals and watch the money roll in for a trip.
I don’t know what total number of words read will be sufficient. However, 1.5 million words read is, in my opinion, at the lower end, especially for German. My goal for now is 3 million words read; then I’ll assess how fluent I am at that point and speed up my reading further. I am very surprised that t_harangi was able to pull off C2 in the official exam with just 1.5 million words read. Either the exam was too easy or the number of words t_harangi mentioned was underestimated. Either way, I am far from being a fluent reader in German at 1.5 million words read. More work to do. Certainly, if I ever want to secure C2 in the exam, I will aim for 5 million words read. Then again, maybe my brain simply needs more input than average.
I would say that 1.5 million words is definitely not enough for a C2 German reading level. I would guess you start approaching that level somewhere around 5,000,000 to 8,000,000 words read in German. In general, though, C2 in any area is going to be the most variable level, since it requires you to keep challenging yourself across those millions of words and to experience many different types of content.
I am only counting words read on LingQ, not words read outside of LingQ, if that is of any help. My guess is that t_ only counted LingQ words too.
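The gap between these targets is easier to see as reading time. A minimal sketch assuming a steady pace of 4,000 words a day (the upper end of the daily figure quoted earlier in the thread); `days_to_target` is a hypothetical helper, and real reading pace will of course vary.

```python
import math


def days_to_target(target_words, words_per_day):
    """Days of reading needed to reach a total word count at a steady daily pace."""
    return math.ceil(target_words / words_per_day)


# Targets mentioned in this thread, at 4,000 words a day.
for target in (1_500_000, 3_000_000, 5_000_000, 8_000_000):
    days = days_to_target(target, 4_000)
    print(f"{target:>9,} words: {days:>4} days (~{days / 365:.1f} years)")
```

At that pace, 1.5 million words is roughly a year of daily reading, while 5 to 8 million is several years.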