Word counting

From a technical point of view, word families are almost impossible to define, as any definition is bound to be subjective for all the reasons given above. For the amount of work involved compared to the value it would add, it wouldn’t be cost-effective. In any case, what use would such a number be? It would only count words that you’ve marked within LingQ, and would be nothing more than an idle curiosity.

@Steve: Thank you very much for the useful resources. That’s what I need right now.

Nation has come up with a definition of word families which is widely used in research (in Bauer and Nation 1993, level 6 affixes). The thing is, that’s only applicable to English, and the LingQ algorithms need to be language-independent. As nobody has yet come up with a lemmatisation program which works for all languages, counting word types (every variant of a word counted separately) is the best way for LingQ to do it. Then you can apply a rule-of-thumb conversion factor which will be different for each language.

I think, by the way, that the 1.6 factor for English converts the number of word types (individual words) into lemmata (headwords, standard dictionary entries). Word families generally contain more than one headword, so the conversion factor from individual words to word families would be bigger.
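
For what it's worth, the rule-of-thumb conversion is just a division. A minimal sketch in Python, assuming the 1.6 figure mentioned above for English; the function name `estimate_lemmata` is made up for illustration, and factors for other languages would have to be supplied separately:

```python
def estimate_lemmata(word_types: int, factor: float = 1.6) -> int:
    """Estimate the number of lemmata (headwords) from a count of word types.

    `factor` is the rule-of-thumb ratio of word types to lemmata.
    The default of 1.6 is the English figure quoted in this thread;
    it is an approximation, not a measured constant.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # There are more word types than lemmata, so we divide.
    return round(word_types / factor)


print(estimate_lemmata(16000))
```

So 16,000 known word types would come out at roughly 10,000 lemmata. A word-family estimate would need a larger factor, which I haven't tried to guess here.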

Here are the first few entries from Nation’s baseword list 1:

A 0
… AN 0
ABLE 0
… ABILITY 0
… ABLER 0
… ABLEST 0
… ABLY 0
… ABILITIES 0
… UNABLE 0
… INABILITY 0
ABOUT 0
ABSOLUTE 0
… ABSOLUTELY 0
… ABSOLUTIST 0
… ABSOLUTISTS 0
ACCEPT 0
… ACCEPTABILITY 0
… ACCEPTABLE 0
… ACCEPTABLY 0
… UNACCEPTABLE 0
… ACCEPTANCE 0
… ACCEPTED 0
… ACCEPTING 0
… ACCEPTS 0
… UNACCEPTABLY 0
ACCOUNT 0
… ACCOUNTED 0
… ACCOUNTING 0
… ACCOUNTS 0

Work has been done on word families in other languages (I found Dutch on Google), but I don’t think we could add all this into the LingQ software, especially not if we keep adding a new language a month.

In the excerpt from Nation’s list above, 6 word families account for 29 different word types.
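
The tally is easy to check mechanically. Here is a rough Python sketch, assuming the layout shown above (a headword on its own line, family members prefixed with "…", and a trailing number that is ignored). The name `parse_families` and the shortened sample are mine, not part of Nation's materials:

```python
def parse_families(text: str) -> dict[str, list[str]]:
    """Group word types under their headword.

    Assumed layout: a headword line such as "ABLE 0", followed by
    member lines such as "… ABILITY 0". The headword is counted as
    a word type of its own family.
    """
    families: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        is_member = line.startswith("…")
        # Drop the leading ellipsis and keep only the word itself.
        word = line.lstrip("… ").split()[0]
        if is_member and current is not None:
            families[current].append(word)
        else:
            # A line without the ellipsis starts a new family.
            current = word
            families[current] = [word]
    return families


# Shortened sample: the first three families from the excerpt.
SAMPLE = """\
A 0
… AN 0
ABLE 0
… ABILITY 0
… ABLER 0
… ABLEST 0
… ABLY 0
… ABILITIES 0
… UNABLE 0
… INABILITY 0
ABOUT 0
"""

families = parse_families(SAMPLE)
word_types = sum(len(members) for members in families.values())
print(len(families), "families,", word_types, "word types")
```

Run on the full 29-line excerpt, the same function should give 6 families and 29 word types.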

@Steve: Could you please adjust the way LingQ chooses the “LingQ of the day” so that learners are not sent words of status 4? Maybe words of status 2 and 3 could be given a lower priority than words of status 1. Hopefully that’s possible. Thank you.
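
One way this request could work is a weighted random pick. This is only a sketch of the idea in Python, not how LingQ actually chooses the word: the function name `pick_lingq_of_the_day`, the `(term, status)` pairs and the weights are all assumptions for illustration.

```python
import random

# Assumed weights: status 1 is picked most often, status 2 and 3 less so.
# Status 4 is left out on purpose, so those words are never sent.
STATUS_WEIGHTS = {1: 4.0, 2: 2.0, 3: 1.0}


def pick_lingq_of_the_day(lingqs, rng=random):
    """Pick one term from (term, status) pairs, or None if none qualify."""
    candidates = [(term, status) for term, status in lingqs
                  if status in STATUS_WEIGHTS]
    if not candidates:
        return None
    weights = [STATUS_WEIGHTS[status] for _, status in candidates]
    # random.choices draws with replacement according to the weights.
    term, _ = rng.choices(candidates, weights=weights, k=1)[0]
    return term


my_lingqs = [("huis", 1), ("boom", 2), ("fiets", 3), ("kaas", 4)]
print(pick_lingq_of_the_day(my_lingqs))
```

With these weights a status 1 word is four times as likely to come up as a status 3 word, and "kaas" (status 4) never does. The exact ratios are arbitrary and could be tuned.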