Correlation between all-time counting stats

I was just looking at my Italian stats and I noticed that my all-time stats in all major categories (LingQs created, words known, words read, hours listened and even coins) all currently begin with the digit 4. Is there supposed to be this kind of ‘decimal correlation’ among the stats? If not, what is an optimal correlation (if one exists)? Just curious.

I love finding patterns in numbers too, but I imagine the ratios would vary based on many factors, including how close your L2 is to your native language, whether you’ve learned languages before, what activities you do to learn both within and outside of LingQ, how you define the various levels of LingQs, etc. I, for example, LingQ a lot of phrases to recognize common collocations and turns of phrase. I often leave them at lower levels (status 2) so they remain highlighted. Here are my stats for Spanish, for reference. I have probably listened for more hours than listed here, because while I try to track my listening hours outside of LingQ, I sometimes fail to record them.

1 Like

All good points, cheers. Interesting how the ratio of LingQs to known words is quite similar for both of us (1.10 for me and 1.18 for you). Also interesting that you have reached 80% of my known word count while reading only 46% of what I have. Well done!

1 Like

Here is my data for correlation


But the ratio of known words to LingQs is different :melting_face:

2 Likes

Throughout my 1.5 years of learning Korean (complete beginner), the ratio of LingQs created / known words has always been around 1:10.^^

3 Likes

That is interesting. I feel like I get a lot of words “free” due to cognates with English. I see however that you have experience with Spanish, Portuguese and French. I would think that your rate of known words / words read for Italian would top mine based on that. When do you mark words known? I mark individual words “known” when I am confident that I understand them, which does not mean I can actively recall them. Is that a difference? Do your other languages have a “decimal correlation”?

It does appear that I’ve created a lot more LingQs per word read, which tracks with my having less relevant language experience. :slight_smile:

1 Like

Not really. I have looked at my stats in the main languages I am interested in (Swedish, German and Slovak) and they’re completely different, because I knew each of them to a different degree before I started and because Slovak is much more difficult for me to learn. In Danish, however, I have very few LingQs thanks to Swedish. I will concentrate more on Danish and German soon, and Danish will be like lifting 2kg weights at the gym whilst German will be like pulling 15kg. That being said, once you have learned one Slavic language, almost everything you had to learn the first time makes it really easy to learn another Slavic language.

3 Likes

When do you mark words known?

If I come across a blue word that is a derivative of a word I already know well (e.g. a different conjugation of a verb, a noun in the plural etc), I mark it known. If I come across a LingQ 1 that I feel I pretty much know, I mark it as LingQ 2. I review my LingQs 2 and 3 using flashcards, so if I can reproduce it then, it will eventually be moved to known.

Do your other languages have a “decimal correlation”?

Just looking now. In Portuguese I have 32k known words but only 23k LingQs, with 2 million words read. In Spanish, which I already knew at a decent level (B2-ish) before starting it on LingQ, it’s 31k known words and only 13k LingQs and only 1.3 million words read, so it was obviously easier/quicker to get to a higher known word count. I haven’t really properly used French on LingQ.

1 Like

You can tell a little bit from the ratios, but not too much. My guess from your statistics is that you have previous experience with another Romance language or have studied/study Italian outside of LingQ. Alternatively, you could be very liberal in marking words as Known.

This is my guess, as your statistics are very different from mine. I have 48k LingQs and 22k Known Words with 4M words read and 1,000 hours listened. You have roughly double the number of Known Words with only 40% of my listening hours.
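For anyone who wants to check that comparison, here is a minimal sketch using only figures quoted in this thread. The dictionary layout and the helper name `ratio` are made up for illustration, and the 42k known words and roughly 400 hours for the other account are the rounded numbers the original poster gives elsewhere in the thread, so treat the results as approximate.

```python
# Rounded figures as quoted by the posters in this thread.
mine = {"lingqs": 48_000, "known_words": 22_000, "hours": 1_000}
# Original poster's Italian stats (rounded, stated elsewhere in the thread).
yours = {"known_words": 42_000, "hours": 400}


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)


# LingQs created per known word on my account.
lingq_per_known = ratio(mine["lingqs"], mine["known_words"])
# How the other account compares with mine.
known_vs_mine = ratio(yours["known_words"], mine["known_words"])
hours_vs_mine = ratio(yours["hours"], mine["hours"])

print(lingq_per_known, known_vs_mine, hours_vs_mine)
```

With these rounded inputs the known-word count comes out at a little under double, and the listening hours at 40%.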

Your lingQ to Known Word ratio changes as you become more advanced, but the general rule of thumb is that if someone has a low lingQ to Known Words ratio, it means they probably know a very similar language (which has lots of cognates) or they started studying this language before using LingQ (or study a lot off LingQ).

There is no single ‘optimal’ ratio because it changes depending on many factors, such as what language someone is studying, which similar languages they already know, what level they are at in the language, how they use LingQ (e.g. how liberal they are in marking words as Known), whether they started learning the language before LingQ, whether they study the language away from LingQ, etc.

3 Likes

More raw data. :slight_smile:

Enjoy!

2 Likes

My guess from your statistics is that you have previous experience with another Romance language or have studied/study Italian outside of LingQ. Alternatively, you could be very liberal in marking words as Known.

Your first guess is correct! I would say I’m not liberal in marking known words. 42k known words in Italian is a bit crazy when I think about it, as I would rate my speaking and understanding at about B2, even though that number of known words would suggest a more advanced level. I also listen outside LingQ and am not always great at manually adding the hours, so it should be more than 400 but I couldn’t say how much more with any degree of accuracy.

1 Like

It isn’t crazy at all; those aren’t unique words. Considering how many conjugations exist for just one verb in Italian, I am pretty sure a B2 level in Italian would look like 80,000-120,000. From the perspective of verbs (let’s leave out other parts of speech for now), let’s just stick with the basic verb forms and say Italian has 8 for each verb. 40,000 divided by 8 = 5,000 (I know it isn’t 100% like this). Most university students of a foreign language have a vocabulary of about 10,000 unique L2 words (nouns, verbs, adjectives etc). Considering that place names, personal names etc will probably account for a good chunk in LingQ, 10,000 unique words not including names and places will probably be about 80,000-120,000 in most languages. With names, it would be represented as about 90,000-150,000 depending on what you’ve read.
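The back-of-the-envelope arithmetic above can be written out as a small sketch. The figure of 8 forms per word is the rough number from this post; the upper multiplier of 12 is not stated anywhere and is only an assumption, inferred from the 120,000 upper bound divided by 10,000 unique words. The function name `forms_for` is made up for illustration.

```python
FORMS_PER_WORD = 8       # rough number of basic forms per verb, as in the post
known_forms = 40_000     # approximate LingQ known-word count under discussion

# LingQ counts every inflected form separately, so divide to estimate
# how many distinct dictionary words the count might represent.
approx_unique_words = known_forms // FORMS_PER_WORD


def forms_for(unique_words: int, forms_per_word: int) -> int:
    """Estimate the LingQ-style count for a given number of unique words."""
    return unique_words * forms_per_word


low = forms_for(10_000, 8)
high = forms_for(10_000, 12)  # assumption: 12 is implied by the 120,000 bound

print(approx_unique_words, low, high)
```

This only reproduces the estimate; real texts mix verbs with nouns, names and uninflected words, so the true multiplier will vary by language and by what you read.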

In my opinion, which I have come to by looking at basic dictionaries in many languages, where they easily have 30,000 words, a native speaker’s passive vocabulary on LingQ, imagining you removed the names and places, would be represented by at least 150,000-300,000 LingQ words. This does not include slang, abbreviations, dialect, old words etc. If you included those (impossible to avoid in LingQ), it would be much, much more.

Interesting article which explains many of the difficulties in counting words - Shakespeare Vocabulary Chapter 911.pdf (cmc.edu)

1 Like

It isn’t crazy at all, those aren’t unique words. Considering how many conjugations exist for just one verb in Italian, I am pretty sure a B2 level in Italian would look like 80-120,000.

I don’t have any data to offer so take all this for whatever it’s worth. While I believe that the 32,500 known word count (or whatever it is) on LingQ to become ‘advanced 2’ is an underestimation, I think your numbers are an overestimation. I’m fine in saying I’m not advanced yet, but the idea that I need to double or triple my current number of known words just to reach B2 doesn’t pass the smell test for me as I think I’m already there.

1 Like

You are at B2 level when you pass a B2 exam. You can find free reading and listening comprehension exams online. I passed one for Italian with 13k LingQ Known Words, 1.5M words read, and 470 hours listened (I had mainly studied on LingQ and, when not, I added the stats - details in the thread):

Obviously, it depends how you study and what languages you know, etc. But that was just my case study.

I don’t think you and your account are the same? LingQ can only categorise words a user has happened to put into that account. Your account and you are quite different things.
To accurately represent what a B2 student knows, the numbers I have stated are pretty certain. That is from my own experience, not to mention the fact that I teach people how to pass uni acceptance exams at the level you’re talking about… I have 58k words in Slovak and am nowhere near B2, yet I am at 30k words in German and far better than in Slovak. I’m at 99k words in Swedish and still haven’t come across a lot of the words I know from Swedish courses years ago, and I know this because I put in old texts from them from time to time out of curiosity.
I am currently taking a Swedish course where I live, and out of curiosity I also put in words/text from the course book at B1/B2 level; they frequently show about 100-200 new words. This is whilst I already have 99k words.
If you continue to use LingQ you’ll see for yourself.

Benford’s Law:
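For anyone unfamiliar with it, Benford’s Law says that in many naturally occurring sets of numbers the leading digit d shows up with probability log10(1 + 1/d), so small leading digits are more common than large ones. Below is a minimal sketch of what that would imply for five stats all starting with 4. Two loud assumptions: that LingQ stats follow Benford’s Law at all, and that the five stats are independent of each other (they almost certainly are not, since they all grow together). The function name is made up for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability of a given leading digit (1-9) under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Chance that a single stat starts with 4 (a bit under 10%).
p4 = benford_probability(4)

# Chance that five stats all start with 4, ASSUMING they are independent.
all_five_start_with_4 = p4 ** 5

# Chance that five independent stats share any one leading digit.
all_five_share_digit = sum(benford_probability(d) ** 5 for d in range(1, 10))

print(f"{p4:.4f} {all_five_start_with_4:.2e} {all_five_share_digit:.2e}")
```

Under those assumptions the coincidence looks rare, but because the stats are correlated in practice, the real chance is much higher than the independent estimate suggests.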

2 Likes

Languages vary greatly in number of words needed to reach whatever level.

Every step forward is a step forward. :slight_smile:

I think we agree? “10,000 unique words not including names and places will probably be about 80,000-120,000 (on LingQ) in most languages”. Isn’t that essentially the range, or do you think it varies much more? I would also argue that it is nowhere near the full picture.
I have the same books in multiple languages and they consistently have about the same word count. A Short History of Nearly Everything by Bill Bryson: 21,000 unique words in Swedish, 30,000 in Slovak. However, Swedish has phrasal verbs that don’t show up on LingQ, and Slovak has prefixes added to verbs to create a phrasal-verb-like effect. Then there is a very important detail which is often missed: some languages like Swedish have words which can mean about 9 things depending on the context (om). LingQ does not register om as meaning more than one thing. As far as LingQ is concerned, it is one bit of information in the word count, but it is actually far more complex than that.
Some examples, Swedish to Slovak:
om - ak
köra om - jazda okolo
om jag hinner - ak budem mať čas
jag pratar om - hovorím o
jag tycker om - užívam si
jag gick om - prešiel om
satte liksom lås om det han sagt - tak trochu uzamkol to, čo povedal
vara nära om att göra ngt - byť blízko pri robení niečoho
Han var nära om att vinna - bol blízko k viťazstvu
vara neder om ngn - pohoršovať sa nad niekým

The fact remains that if you are a B1 learner of Swedish, you will be aware that om can mean more than one thing, but you will not realise how many things it could mean, because some of them are not high-frequency phrases in most learning material. That will come at about C1 level and will become clearer at C2, if at all. Most Swedish people don’t consciously realise how many things om means.

When it comes to languages, culture is also a massive issue. Almost universally, if you put English text into a translator, the Slovak comes out much shorter. That is because Slovak, thanks to its cases, is far more efficient at conveying information. But in reality, I have never seen a Slovak contract shorter than an English one, quite the opposite, because culturally Slovaks are far more pedantic.

2 Likes

That’s one of the reasons I don’t put stock in known word counts. When should you mark a word like “om” known? There are layers to so many words. Between the broad definitions of what it means to “know” a word and the varying opinions as to what it means to achieve a given level of proficiency, it’s like the Drake equation. That is to say, you can get such a wide range of results based on the differing assumptions and definitions that it doesn’t add value beyond the examination of the input parameters themselves.

That’s why I now ignore the known words count and focus on words read/written and hours of listening/speaking. If I get to 4+ Million words read and the corresponding stats on the others, and I can use the language as I need to, I won’t care what my words known count is any more than I do in my L1.

2 Likes

That is a valid point. In my opinion, if a user understands the main meanings of a word, that is enough to mark it as known. For example, “dispose” in the sense of throwing something away. How long would it take for someone to realise it also has the meaning of employing someone (dispose of)? There are multiple meanings of dispose and you’ll likely first meet it at A2-B1 level, then by the time you get higher, to perhaps C1, realise it means something different when “of” is next to it.
At the end of the day what you’re saying is fair, but it also has a weakness in that time under load is only one way people learn. There are people who process information better and don’t need so much exposure to grasp a concept, or who are taught better, or who are better placed (a French speaker learning Spanish compared to an English speaker learning Ukrainian). In English, if you have learned Latin you’ll understand concepts in English far better than if you just speak English (hence the Anglish movement - “Why not create our own logical words compared to abstract loan words?”). If we took the idea that we should only “know” words when we actually know every meaning of them… we’re f-ed in our native languages too. I would say then “to the level at which you would accept you knew a word in your native language”. I know what geology means, but I didn’t think about what it means in its parts until I saw the Anglish word for it and realised that is what Latin is doing.
I do think, however, that word counts are very useful. For example, in my case I am going through multiple languages, and I can see some interesting effects of word count. With Swedish I can see exactly how many translations I have needed, compared to Slovak. When my translation rate in Slovak goes down I can start to see that I have really made some progress.
Being able to make LingQs is a sign too. In Slovak I have more LingQs than I have known words. I have needed 71,989 translations in Slovak with 58,000 known words, and 30,000 in Swedish with 99k words. In German I have 30,000 known words and 17,000 translations. This is a really good indicator of my ability in those three languages. I am much better at German than Slovak, but my word count doesn’t show it. I am much better at Swedish than German, and it makes me see to what degree (and often wonder if it wouldn’t be better the other way round!).
Now I can make some plans with LingQ - for example, get to a point where I don’t need to use so many translations in Slovak.
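To make the “translations per known word” idea concrete, here is a minimal sketch using the rounded figures quoted above. The variable names and the choice to rank by this one ratio are only for illustration; it is one possible indicator, not an official LingQ metric.

```python
# (translations needed, known words) per language, as quoted in this post.
stats = {
    "Slovak": (71_989, 58_000),
    "Swedish": (30_000, 99_000),
    "German": (17_000, 30_000),
}

# Translations (LingQs) needed per known word: lower suggests that more
# words were understood without having to look them up.
lookups_per_known = {
    lang: round(translations / known, 2)
    for lang, (translations, known) in stats.items()
}

# Languages ordered from fewest to most lookups per known word.
ranked = sorted(lookups_per_known, key=lookups_per_known.get)

print(lookups_per_known)
print(ranked)
```

On these numbers the ordering is Swedish, then German, then Slovak, which matches the self-assessment in the post even though the raw known-word counts alone would put Slovak ahead of German.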

1 Like