Here is an updated version of the Known Word Thresholds for most languages in LingQ. These are user estimates (also known as ‘guesswork’):
Thanks to those who have provided additional data points - this should have improved the estimates for some languages noticeably.
Green figures: Should be accurate (taken directly out of LingQ or from Forum Info)
Black figures: Extrapolated - based on limited data
Grey figures: Extrapolated - based on limited data, with lower degree of certainty
It’s really interesting to see how some of the languages changed (or didn’t). For instance, Greek used to be at 30,250 for Advanced 2, the same as English. Now it seems to be closer to the highly inflected languages like Russian. I’m also surprised how much higher Advanced 2 is in Ukrainian than Russian. Granted, it’s currently an extrapolation, but even the Beginner 1 number is considerably higher.
Indeed. It would be interesting to get some comments from the LingQ Team at some point on the reasoning behind the new figures for Greek, Russian and Ukrainian (provided the differences remain in principle with the final figures).
Previously there were only 3 broad categories that were relatively close together.
For the new figures, however, ‘nsprung’ has stated that
“the LingQ team has been looking at the number of unique words used in some of our basic LingQ-produced courses.”
These unique word counts do show a considerable range across languages, which explains why there are some drastic changes. To what extent the LingQ Team has used additional expert judgement, I do not know. One advantage of the Mini-Stories is that they provide a unique word count for almost all languages. However, there might be some differences in the way they were translated (some a bit simpler to make it easier for beginners, others perhaps in a somewhat more demanding way).
Another issue is that these courses are all beginner material. Therefore, they may not be as representative of the word counts at higher levels.
This is not meant as a critique of the method or the LingQ team, but merely a comment on the inherent difficulties of such an endeavour.
The difference between Russian and Ukrainian is probably because Ukrainian has more case endings for nouns. In Russian you only have one ending per gender/case combination, whereas in Ukrainian you have different declension groups as well.
Thanks - these Italian figures match my latest (as yet unpublished) estimates exactly. So I won’t need to collect all figures to be reasonably certain what the numbers should be.
However, there seem to be some outliers, so checking individual figures is still useful. I will post an update (probably) tomorrow.
Unofficial Known Word Thresholds for most languages in LingQ:
Green figures: Confirmed
Black figures: Extrapolated
Comments:
Russian: Somewhat smaller distance from Adv 1 to Adv 2 in % (compared to other languages)
Bulgarian: Beg 1 appears too low relative to the other thresholds
Indonesian, Cantonese and Malay: Distances between levels (in %) are somewhat smaller for the lower levels, compared to other languages. Apparently these languages use the middle category of the ‘old System’
All other languages seem to follow a similar pattern (regarding the distances between the levels in %). Therefore the extrapolated figures should match or at least be very close.
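For anyone who wants to reproduce this kind of extrapolation, here is a minimal sketch of one way it could be done. It is not necessarily the method used for the figures above. It assumes that a language with only a few confirmed thresholds follows the same proportions between levels as a fully confirmed reference language. The function name and all numbers are hypothetical placeholders, not real LingQ figures, and the level abbreviations other than Beg 1, Adv 1 and Adv 2 are assumed.

```python
# Level abbreviations, lowest to highest (Beg 2, Int 1, Int 2 are assumed names).
LEVELS = ["Beg 1", "Beg 2", "Int 1", "Int 2", "Adv 1", "Adv 2"]


def extrapolate(reference, confirmed):
    """Estimate missing thresholds by scaling a reference language's pattern.

    reference: thresholds for all levels of a fully confirmed language
    confirmed: the thresholds already known for the target language
    """
    # Average ratio of target to reference over the confirmed levels.
    ratios = [confirmed[level] / reference[level] for level in confirmed]
    scale = sum(ratios) / len(ratios)
    # Keep confirmed values as they are; scale the reference for the rest.
    return {
        level: confirmed.get(level, round(reference[level] * scale))
        for level in LEVELS
    }


# Hypothetical placeholder numbers, NOT real LingQ thresholds.
reference = {"Beg 1": 100, "Beg 2": 200, "Int 1": 400,
             "Int 2": 800, "Adv 1": 1200, "Adv 2": 1600}
confirmed = {"Beg 1": 150, "Adv 2": 2400}

estimates = extrapolate(reference, confirmed)
for level in LEVELS:
    print(level, estimates[level])
```

If the confirmed levels give very different ratios (as noted for Bulgarian Beg 1 or for the Russian Adv 1 to Adv 2 distance), a single average scale factor will not fit well, and those figures are better checked individually.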
I have a similar feeling in French as far as ability goes, but I’m OK with the levels, because it’s about potential. At LingQ A2 we have acquired a vast set of tools and a large pile of materials. At this point it’s all about using those tools and continuing the glide towards becoming a master at our new craft. I took a placement test in German a few days ago and had the reverse of my high school experience. I completely bombed using the 47 forms (j/k) of the word “the”, but when it came to the listening comprehension I felt strong. In high school (in French), I’d nail the conjugations but have no idea during the listening segments, you know, the actual part that matters. Keep it up. I no longer study my A2 language, I live it.