Word Count when Learning Russian

I feel like the word count is very distorted for Russian. Every verb has 9+ forms for present/future and past tense. Every noun, with few exceptions, has at least 6 case forms. Adjectival forms go on and on and on to form different “words” that are merely different forms of the same word.

How about an additional count counting only word stems or base dictionary forms? That to me would be far more revealing.

The majority of languages on LingQ have inflated word counts to varying degrees - I just wouldn’t take it too seriously in general. The word count is really just to gauge your own personal progress.

Would it be cool to have a separate unique word count? Maybe, but I personally don’t care. In any case, I feel this has been brought up before for several languages and it doesn’t seem to be something LingQ is interested in ‘fixing’, although it’s basically working as intended already.

I face a similar “problem” with German. As @james_patrick has mentioned, I don’t take it literally; it’s just an estimate. Even though German doesn’t have as many cases as Russian, I sometimes find myself wondering whether a word is just another form/tense of a word I’ve seen before. What I try to do is save the word only once, and whenever I encounter it again in a different tense or declension, I “exclude” it. That way I can get a number closer to the real amount of words I know.

One thing that makes it even worse in German is the verbs with separable prefixes at the end: you may know the meaning of the root word, but that root can mean literally 15+ different things when combined with different prefixes. It drives me crazy when I find a verb I already know that doesn’t make sense in the sentence until I keep going and see the prefix at the end that turns it into a different word.

The LingQ word counts are not inflated. For most languages, they record exactly what they are intended to count - unique tokens aka word types. There are some problems with how LingQ records various languages, like Chinese/Japanese, due to the word splitter algorithms, or with separable verbs in German, as @ryclassic63 mentioned, but for the majority of languages, including Russian, word types are counted correctly. The only confusion happens when you think that LingQ’s word count refers to lemmas/dictionary forms or word families. It simply does not. LingQ counts word types.

Why does LingQ count word types instead of lemmas or word families? One reason: it’s far, far easier to implement from a software perspective.
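
To make the implementation gap concrete, here is a minimal sketch (not LingQ’s actual code, which I haven’t seen). Counting word types only needs splitting and deduplicating, while counting lemmas needs a form-to-lemma lookup. The `LEMMA_OF` table is a hypothetical, hand-written stand-in that covers only this toy sentence; a real one would need a full morphological dictionary for every language.

```python
# Toy text: three forms of работать ("to work") and two forms of книга ("book").
text = "я работаю ты работаешь он работал книга книгу"

# Counting word types needs no linguistic knowledge: split and deduplicate.
tokens = text.lower().split()
word_types = set(tokens)

# Counting lemmas needs a form-to-lemma lookup.
# Hypothetical hand-written table, valid only for the toy text above.
LEMMA_OF = {
    "работаю": "работать",
    "работаешь": "работать",
    "работал": "работать",
    "книгу": "книга",
}
# Forms missing from the table are treated as their own lemma.
lemmas = {LEMMA_OF.get(token, token) for token in tokens}

print("word types:", len(word_types))
print("lemmas:", len(lemmas))
```

The first count is one line of code; the second is only as good as the lookup table behind it.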

This is what it is really about. It’s really about seeing your stats increase over time to motivate you when you feel you are at the intermediate plateau and feel no progress. This is successfully achieved by counting word types.

There is another argument for using word types over lemmas or word families though, and that is if you learn languages like Steve does (LingQ was, and still is, built for Steve to learn languages), you don’t study much grammar, so just because you know one word type doesn’t necessarily mean you know other forms of the same lemma or word family. Personally, this is why I actually appreciate word types being recorded, instead of lemmas or word families. For instance, in Italian, I could recognise the present tense before I could recognise the present subjunctive, so I appreciated those words still being highlighted.

LingQ has tried to align its Known Words thresholds with levels similar to the CEFR (it only tries to align them; it does not claim they are CEFR levels!), using names like Beginner 2, Intermediate 1, etc. The number of Known Words for each level changes per language to account for the differences in the number of declensions, etc. between the languages. From my experience with Italian, if you use LingQ according to Steve’s method, that is, practising both reading and listening, these levels are reasonably accurate.

If you are interested in finding out how many lemmas or word families you know, find a Russian frequency list and go through it counting one word at a time, like I did for Italian in my above comparison. Alternatively, there are various tests, which estimate your vocabulary in English. I imagine you could find a few in Russian.

Yes: there’s an argument for learning grammar! That way, when you see “работать” and recognize it as a first-conjugation verb, you don’t have to memorize 6 present tense forms and 4 past tense forms, etc. etc.
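
As a rough illustration of what knowing the pattern buys you, here is a small sketch that generates the present and past forms of a regular first-conjugation verb from its infinitive. The function name is made up for this example, and it assumes the verb behaves like “работать” (infinitive in -ать, stem kept unchanged); plenty of -ать verbs, such as “писать”, do not follow this pattern.

```python
PRESENT_ENDINGS = ("ю", "ешь", "ет", "ем", "ете", "ют")  # я, ты, он/она, мы, вы, они
PAST_ENDINGS = ("л", "ла", "ло", "ли")  # masculine, feminine, neuter, plural


def conjugate_regular_at_verb(infinitive):
    """Return (present_forms, past_forms) for a regular verb like работать.

    Assumption: the verb is regular first conjugation, so dropping the
    final "ть" gives a stem that takes the endings unchanged.
    """
    if not infinitive.endswith("ать"):
        raise ValueError("this sketch only handles infinitives ending in -ать")
    stem = infinitive[:-2]  # "работать" -> "работа"
    present = [stem + ending for ending in PRESENT_ENDINGS]
    past = [stem + ending for ending in PAST_ENDINGS]
    return present, past


present, past = conjugate_regular_at_verb("работать")
print(present)
print(past)
```

One rule plus one stem covers ten forms that a word-type counter would register as ten separate words.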

Hello all. This question caught my attention because I too have often thought about this. In general I agree with what @james_patrick had to say. I will add that, given that it is extremely unlikely that anyone has encountered in LingQ all the forms of a single verb or all the case declensions of a noun, one could imagine some rough calculation, such as dividing the word count by 4 or 5, to get an idea of how many unique words one might “know”. This is of course in no way accurate, but it might get closer to a realistic number. Also, don’t forget that just because a word isn’t among one’s known words in LingQ doesn’t mean one hasn’t already encountered it somewhere else!
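
The rough calculation can be written down in a couple of lines. This is only a back-of-the-envelope sketch: the function name is made up, and the divisors 4 and 5 are the guesses from this post, not measured values.

```python
def estimate_base_words(known_word_forms, divisors=(4, 5)):
    """Return a (low, high) estimate of base words from a word-form count.

    Assumption: each base word shows up as roughly 4 to 5 distinct forms.
    """
    smaller_divisor, larger_divisor = divisors
    low = known_word_forms // larger_divisor   # more forms per word -> fewer base words
    high = known_word_forms // smaller_divisor
    return low, high


low, high = estimate_base_words(17_000)
print(f"roughly {low} to {high} base words")
```

With a known-word count of 17,000, this gives a range of 3,400 to 4,250 base words.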

However, there is something far more significant and worth mentioning here. I personally have had the experience (and still to some extent do) of being overly influenced by that word count. I have found myself quickly reading through content just to break the next barrier, for example, and this has in no way helped me improve my Russian. LingQ is an amazing tool - in my opinion the best - but the stats and numbers are in no way an accurate reflection of one’s ability to speak and understand the language. I just crossed 17,000 known words but I am certain that only a fraction of those are truly part of my active vocabulary.

Similarly working just to maintain a streak is not necessarily the same thing as actually learning. I have discovered my own bad habits of looking too much at these numbers as a way to get a good feeling about my progress but I am sincerely trying to focus on truly making tangible progress - the stats can very well be the same for someone focusing on stats and someone really learning.

I hope this doesn’t come across as harsh or critical. I have spent the last 2 years working every day on Russian but the thing I feel like I really learned is how damn hard language learning can be but how incredibly rewarding it is when one experiences for oneself real, tangible progress in speaking and understanding a new language. My advice is dive in and really learn and let the stats be what they are. Having meaningful exchange in Russian, or any language will be infinitely more rewarding than whatever rewards one gets from seeing the word count grow. I wish everyone all the best on their language learning journey!

That is true in general of LingQ, as I’m sure other commenters have pointed out (I haven’t read through all the comments, though). Every language is essentially unique on this. Yes, there is the issue of different endings, which can vary widely between languages. But even beyond that, some languages use a much wider array of words, while others use the same words over and over with slightly different meanings. A few examples from my own observation:

Consider Chinese:
It’s not even fully clear what is or isn’t a word. You can count the individual characters with their individual meanings, but the vast majority of words consist of two or more characters in combination. (And often enough, the same meaning is expressed sometimes by one and sometimes by several characters depending on the linguistic context.) But there are many borderline cases, where you could view something as two words, or as a single word. And then there are “idioms” which could be considered either a phrase, or a long word. And on top of all of that, there is LingQ’s own software, which has to decide whether to highlight something as one or several words, and often enough gets it wrong (there are quite a few threads complaining about this on the forum). On the other hand, Chinese doesn’t really change around the form of words at all in the way we’re familiar with from other languages.

Or consider Persian:
Persian expresses many things as compound verbs, i.e. a noun/adjective plus an auxiliary verb. So you have a ton of very common expressions that go: “do [thing]”, “take [thing]”, “give [thing]”, “hit [thing]”, etc. So, for instance, “to learn” is commonly expressed as: “to take memory”. “To shout” is “to hit a scream”, and “to forget” is “to make forgotten”. You end up with a ton of verbal phrases, but only relatively few verbs. On top of that, Persian words often have separate parts which LingQ’s software highlights and counts as separate words. And on top of that, Persian words can have short vowels indicated, or omitted (usually they are omitted). You guessed it: they count as separate words on LingQ.

So in conclusion, there’s little to no way to compare word count between languages. They are just too varied for that. The only real use for word count is within one language, to compare where you are now vs. where you were.

This is a very good point. I, like Steve and you, appreciate that I get to mark every variation of the same lemma separately.

I almost never look at the word count until LingQ calls it to my attention, but the analyst in me would love yet another metric just for kicks.

I 100% agree with nfera and have said it myself. Learning a language is about learning individual word forms and not just lemmas or word families. Often people think they will know every form if they know the lemma plus a little grammar, but that’s not practical. Total words read is a better metric for gauging level than known word count, assuming that you aren’t just reading easy stuff all the time. For more difficult texts, I think a person should learn 1 new word per 25 words read. For easier texts it’s closer to 1 new word per 100 words read. If you read both texts at the same wpm, then obviously the harder text gets you new words 4x faster, but in reality that harder text will take at least 2x-3x more time. So in the end I think that is evidence that reading easier texts is better for learning. Not to mention the other factors: easier text is better for comprehension, better for solidifying already “known” words, more enjoyable, etc.
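
For anyone who wants to play with these figures, here is a small sketch of the arithmetic. The base reading speed of 100 wpm is a made-up placeholder, the function name is hypothetical, and “2x-3x more time” is read here as the harder text taking 2 to 3 times as long per word.

```python
def new_words_per_minute(base_wpm, words_read_per_new_word, slowdown=1.0):
    """New words picked up per minute of reading.

    slowdown is how many times longer the text takes than at base_wpm.
    """
    effective_wpm = base_wpm / slowdown
    return effective_wpm / words_read_per_new_word


BASE_WPM = 100  # hypothetical reading speed, for illustration only

easy = new_words_per_minute(BASE_WPM, 100)
hard_same_speed = new_words_per_minute(BASE_WPM, 25)
hard_2x_slower = new_words_per_minute(BASE_WPM, 25, slowdown=2.0)
hard_3x_slower = new_words_per_minute(BASE_WPM, 25, slowdown=3.0)

for label, rate in [
    ("easy text", easy),
    ("hard text, same speed", hard_same_speed),
    ("hard text, 2x slower", hard_2x_slower),
    ("hard text, 3x slower", hard_3x_slower),
]:
    print(f"{label}: {rate:.2f} new words/min ({rate / easy:.2f}x easy)")
```

Under these assumptions the 4x advantage of the harder text shrinks to somewhere between about 1.3x and 2x once the slowdown is counted, so on the raw numbers alone the gap narrows rather than reverses. The other factors in the post (comprehension, reinforcement, enjoyment) have to do the rest of the work.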
