Recent Updates to Known Word Thresholds

Here is an updated version of the Known Word Thresholds (a user estimate, also called ‘guesswork’) for most languages in LingQ:

Thanks to those who have provided additional data points - this should have improved the estimates for some languages noticeably.

Green figures: Should be accurate (taken directly out of LingQ or from Forum Info)
Black figures: Extrapolated - based on limited data
Grey figures: Extrapolated - based on limited data, with lower degree of certainty

It’s really interesting to see how some of the languages changed (or didn’t). For instance, Greek used to be at 30,250 for Advanced 2, the same as English. Now it seems to be closer to the highly inflected languages like Russian. I’m also surprised how much higher Advanced 2 is in Ukrainian than Russian. Granted, it’s currently an extrapolation, but even the Beginner 1 number is considerably higher.

Italian Beginner 2: 2160
Spanish Int 1: 8040
English Beg 2: 1500
English Int 1: 6,000

Thanks. These seem to be pretty close to the estimates in the table (or similar for English).

Indeed. It would be interesting to get some comments from the LingQ Team at some point on the reasoning behind the new figures for Greek, Russian and Ukrainian (provided the differences remain in principle with the final figures).

Previously there were only 3 broad categories that were relatively close together.

For the new figures, however, ‘nsprung’ has stated that

“the LingQ team has been looking at the number of unique words used in some of our basic LingQ-produced courses.”

These unique word counts do show a considerable range across languages (which explains why there are some drastic changes). To what extent the LingQ Team has used additional expert judgement, I do not know. One advantage of the Mini-Stories is that they provide a unique word count for almost all languages. However, there might be some differences in the way they were translated (some a bit simpler to make it easier for beginners, others perhaps in a somewhat more demanding way).
Another issue is that these courses are all beginner material. Therefore, they may not be as representative of the word counts at higher levels.

This is not meant as a critique of the method or the LingQ team, but merely a comment on the inherent difficulties of such an endeavour.

The difference between Russian and Ukrainian is probably because Ukrainian has more case endings for nouns. In Russian you only have one ending per gender/case combination, whereas in Ukrainian you have different declension groups as well.

Compare this:

to this:

I’ve had a similar experience with Danish

51425 is Advanced 2 for Polish

Nice. 11800 is Int 1 for Turkish; that’s my next level.

I can confirm Dutch beg 1 = 560 as you have in the table.

I did not know Ukrainian has seven cases! That’ll do it.

Thanks!

Italian Advanced 2 is 43,560

Thanks - this brings Polish to:

Italian is:

Beginner 1 - 720
Beginner 2 - 2,160
Intermediate 1 - 8,640
Intermediate 2 - 17,280
Advanced 1 - 29,880
Advanced 2 - 43,560

I found them out by going to one of the Italian challenges and opening the profiles of several people and checking their ‘all time’ stats.

Probably harder to find on the smaller languages, but you can still fill in a few more entries this way.

Thanks for your community effort. :)

Thanks - these Italian figures match my latest (as yet unpublished) estimates exactly. So I won’t need to collect every figure to be reasonably certain what the numbers should be.
However, there seem to be some outliers, so checking individual figures is still useful. I will post an update (probably) tomorrow.
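
For anyone who wants to play with the numbers: below is a minimal sketch of one way such an extrapolation could work. It assumes (and this is only an assumption, not necessarily how the table was built) that a language's thresholds are the Italian ladder scaled by a constant factor, fixed by one confirmed value. The function name `scale_ladder` is made up for illustration.

```python
# Italian thresholds as posted earlier in this thread (confirmed values).
ITALIAN = {
    "Beginner 1": 720,
    "Beginner 2": 2160,
    "Intermediate 1": 8640,
    "Intermediate 2": 17280,
    "Advanced 1": 29880,
    "Advanced 2": 43560,
}


def scale_ladder(reference, anchor_level, anchor_value):
    """Scale a reference ladder so that anchor_level equals anchor_value.

    Assumption: every level of the target language keeps the same
    proportions as the reference language. Outliers will break this.
    """
    base = reference[anchor_level]
    return {
        level: round(value * anchor_value / base)
        for level, value in reference.items()
    }


# English Int 1 = 6,000 was reported above; use it as the single anchor.
english_guess = scale_ladder(ITALIAN, "Intermediate 1", 6000)
for level, value in english_guess.items():
    print(f"{level}: {value}")
```

Anchored on English Int 1 = 6,000, this gives 1,500 for Beginner 2 and 30,250 for Advanced 2, which agree with the English figures mentioned in this thread. The other levels it prints are unconfirmed guesses, and languages that turn out to be outliers would need their own data points.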

Unofficial Known Word Thresholds for most languages in LingQ:

Green figures: Confirmed
Black figures: Extrapolated

Comments:

  • Russian: Somewhat smaller distance from Adv 1 to Adv 2 in % (compared to other languages)
  • Bulgarian: Beg 1 appears too low relative to the other thresholds
  • Indonesian, Cantonese and Malay: Distances between levels (in %) are somewhat smaller for the lower levels, compared to other languages. Apparently these languages use the middle category of the ‘old System’
  • All other languages seem to follow a similar pattern (regarding the distances between the levels in %). Therefore the extrapolated figures should match or at least be very close.
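
To make "distance between levels in %" concrete, here is a small sketch that computes the step-to-step increase for the Italian figures posted earlier in the thread. The helper name `step_increases` is purely illustrative; the same calculation could be run on any language's confirmed figures to spot the kind of deviations noted above.

```python
# Italian thresholds as posted earlier in this thread (confirmed values).
ITALIAN = {
    "Beginner 1": 720,
    "Beginner 2": 2160,
    "Intermediate 1": 8640,
    "Intermediate 2": 17280,
    "Advanced 1": 29880,
    "Advanced 2": 43560,
}


def step_increases(thresholds):
    """Return the % increase from each level to the next, rounded to 0.1."""
    levels = list(thresholds.items())
    return {
        f"{low} -> {high}": round((high_value - low_value) / low_value * 100, 1)
        for (low, low_value), (high, high_value) in zip(levels, levels[1:])
    }


for step, pct in step_increases(ITALIAN).items():
    print(f"{step}: +{pct}%")
```

For Italian the steps come out at roughly +200%, +300%, +100%, +72.9% and +45.8%. A language whose confirmed figures give clearly different percentages would be a candidate outlier.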

This is excellent work. You’re a champ! Thanks for this!

Additional Languages:

That is great. We have a wonderful community here, thanks!

I have a similar feeling in French as far as ability goes, but I’m OK with the levels, because it’s about potential. At LingQ A2 we have acquired a vast set of tools and a large pile of materials. At this point it’s all about using those tools and continuing the glide towards becoming a master of our new craft. I took a placement test in German a few days ago and had the reverse of my high school experience. I completely bombed using the 47 forms (j/k) of the word “the”, but when it came to the listening comprehension I felt strong. In high school (in French), I’d nail the conjugations but have no idea during the listening segments, you know, the actual part that matters. Keep it up. I no longer study my A2 language, I live it.