I have almost 23000 Russian LingQs, but every time I create a lesson I still see a lot of blue words. It is kind of frustrating. So I wonder how many distinct words there are in the entire Russian language.
Here is my attempt to find out the answer.
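I remember that back in high school I learned a method to estimate the number of fish in a pond. You capture some fish (sample size = s1), mark them and release them. Then you capture some fish again (sample size = s2) and count the number of marked fish among them (m). The idea is that the share of marked fish in the second sample (m / s2) should be roughly the same as the share of marked fish in the whole pond (s1 / total). So the estimated total number of fish in the pond is: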
(s1 x s2) / m
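For anyone who wants to play with the formula, here is a minimal Python sketch of it. As far as I know this is usually called the Lincoln-Petersen (mark and recapture) estimator. The function name `estimate_population` and the fish numbers in the example are made up for illustration.

```python
def estimate_population(s1: int, s2: int, m: int) -> float:
    """Mark-recapture estimate of a population size.

    s1: number of individuals marked in the first sample
    s2: size of the second sample
    m:  number of marked individuals found in the second sample
    """
    if m <= 0:
        # With no marked individuals recaptured, the formula is undefined.
        raise ValueError("m must be positive")
    return s1 * s2 / m


# Hypothetical example: mark 100 fish, catch 50 later, 10 of them are marked.
print(estimate_population(100, 50, 10))  # 500.0
```

The estimate gets very noisy when m is small, so the second sample needs to contain a decent number of marked items.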
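So I tried to use this method to estimate the number of distinct words in Russian. Here the "marked fish" are my LingQed words (s1 = my total number of LingQs), the second sample is the set of words in an article (s2 = new words + LingQed words), and m is the number of LingQed words in that article. I selected a random Russian article of medium length on an unfamiliar topic, created a lesson out of it, and counted the number of new words vs LingQed + known words. Known words cannot be tracked separately, but fortunately I didn’t have a lot of them.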
I repeated this 2 more times, and here are the results:
Current total LingQed words: 22966
Article 1:
New words: 191
LingQed words: 303
(191+303) x 22966 / 303 = 37443
Article 2:
New words: 238
LingQed words: 432
(238+432) x 22966 / 432 = 35619
Article 3:
New words: 227
LingQed words: 392
(227+392) x 22966 / 392 = 36265
Averaging the three results, I estimate that the total number of distinct words in the Russian language is about 36442.
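If anyone wants to check the arithmetic or plug in their own counts, here is a small Python sketch that redoes the calculation with the numbers from this post. The names `TOTAL_LINGQED`, `articles` and `estimate_vocabulary` are just my own labels, not anything from LingQ.

```python
from statistics import mean

TOTAL_LINGQED = 22966  # s1: all LingQed words so far ("marked fish")

# (new words, LingQed words) counted in each article
articles = [(191, 303), (238, 432), (227, 392)]


def estimate_vocabulary(new_words, lingqed_words, total_lingqed=TOTAL_LINGQED):
    """Apply (s1 x s2) / m with s2 = new + LingQed words and m = LingQed words."""
    sample_size = new_words + lingqed_words
    return sample_size * total_lingqed / lingqed_words


estimates = [estimate_vocabulary(n, k) for n, k in articles]

for i, e in enumerate(estimates, start=1):
    print(f"Article {i}: {round(e)}")
print(f"Average: {round(mean(estimates))}")
```

To try this for another language, replace `TOTAL_LINGQED` and the pairs in `articles` with your own counts.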
Any comments on my methodology?
I also wonder if anyone else wants to do a similar exercise to confirm this estimate, or to make estimates for other languages.