I am relatively new here and my question is whether anyone here has ever discussed introducing frequency lists to the LingQ system? You would then see how hard the language of the article is before reading it and how popular the words are that you are about to learn and do many other great things as well.
I am not going into technical details now; I am just putting out a feeler to see whether this has ever been discussed here before.
We do count the frequency of words in our library. You will notice that there are little stars beside each word that you save, or that appear in your lists on the Vocabulary page. **** is high frequency, *** less high, ** less and so on. I am going from memory but I think we used thousands, i.e. most frequent 1000, second thousand etc… I also notice that this is not working in some languages, something we need to add to our list of things to look at.
As to the difficulty of the lesson, in addition to whatever level the provider ascribed to the lesson, the number of new words and the % of new words are useful indicators of the difficulty level for each user. After a few weeks with a language at LingQ, most of the new words will be low frequency, I would imagine.
The program is great and it helps a lot the way it works now. However, what I mean is that building in software like AntWordProfiler, or a similar tool, would let you measure your progress more accurately. To make this work properly, you would certainly need the program to recognize declensions and plural forms as one word, or even to recognize words and group them into word families. But the advantages would be impressive. As we know, it is generally estimated that you need to know the 7,000–9,000 most frequent word families to understand 98% of the words in a text, which is roughly what you need to read most texts in the original. If LingQ used frequency lists and grouped words correctly, you would get accurate statistics on how efficiently you are approaching these aims. The statistics could say, for example, "you know 998 words of the first 1,000, 850 words of the sixth 1,000, etc." or "this text brings you 24 words of the fifth thousand that you do not know yet". These are just a few of the ways this tool could be used.
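To make the idea concrete, here is a rough sketch of the kind of band statistics described above, in the spirit of what AntWordProfiler reports. The six-word "frequency list" and the known-word set are made-up toy data; a real list would run to 20,000+ word families and the bands would be 1,000 words wide.

```python
def band_coverage(frequency_list, known_words, band_size=1000):
    """Return {band_number: (words_known_in_band, words_in_band)}."""
    stats = {}
    for rank, word in enumerate(frequency_list):
        band = rank // band_size + 1
        known, total = stats.get(band, (0, 0))
        # A bool adds as 0 or 1, so this counts the known words per band.
        stats[band] = (known + (word in known_words), total + 1)
    return stats

# Toy example: bands of 3 words instead of 1,000.
freq = ["the", "be", "to", "of", "and", "a"]
known = {"the", "to", "of", "a"}
print(band_coverage(freq, known, band_size=3))
# → {1: (2, 3), 2: (2, 3)}
```

With real data, band 1 of the output would read directly as "you know 998 words of the first 1,000", and a per-lesson run of the same function gives "this text has N unknown words from the fifth thousand".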
This way, you could easily and efficiently work your way up to the 5,000, 7,000, 10,000 or 15,000 most frequently used word families in a language. I believe this would work as an efficient machine for producing hundreds of polyglots, as you would be able to track your success and approach your goals guided by statistics.
Does this make sense to you? I would be happy to explain my vision in more detail if you are interested.
I understand what you are saying Alex. However, it is unlikely that in the near future we would introduce this kind of sophistication. We are more concerned with how to help people understand our site, how to use it, and why it helps them learn. I do not think that what you propose would help us in that regard.
I must say that in learning Russian, Czech, and Portuguese here, and improving in other languages, I have never felt the need for this kind of detail. I can see, however, that some people would like this. But I don’t see this as a mainstream issue.
Thank you for the reply, Steve. I must be the first person to come up with this idea here.
I see your point, and I probably agree that at first sight this tool would not bring much to non-linguists, or to people not aiming at professional-level knowledge, and that it would not help market the product any better.
It would still be interesting to hear from other users; someone might find my idea interesting. For people new to the subject, I want to say that free programs already exist that can analyze any given text according to its vocabulary, based on word frequency lists. Accurate frequency lists of the 20,000+ most frequent word families are available for several major languages and could be obtained for other languages as well.
If a tool like LingQ incorporated this software and used its possibilities, it would let you work your way up to knowing a specific number of word families to meet your goals (e.g. 7,000–9,000 word families, or 98% understanding of all the words you meet, or the ability to read unabridged texts on most subjects) and make your learning and goal-setting even more transparent and fun.
I am sure this can and will be programmed in the coming years, and it will take the efficiency of professional language learning yet another step forward. As of now, I see LingQ as best positioned to do this.
Alex, I don’t know much about AntWordProfiler but I love statistics.
Let me see if I understand this correctly. Let’s say “go” is in the list of the first 100 most commonly used English words. So you’re saying if you knew all forms of go (go, goes, went, will go(?)) the system would say “you know 1% of the first 100 words”? Is that correct?
If so… you COULD build out the commonly used word list with all forms of each word, like go = [go, goes, went], be = [am, are, is, was, were], etc. Then you could pull all your status level 4 LingQs with the LingQ API and determine your percentage. However, this only works with words you've LingQed and learned – I don't think there's a way of pulling a list of ALL known words from LingQ. If LingQ ever adds this capability to their API, then this problem is simple.
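The calculation itself is simple once you have the two lists. Here is a hypothetical sketch: the tiny family map and the "learned" list below are made-up stand-ins for the word list you would build out and the status 4 LingQs you would pull from the API (the API call itself is not shown).

```python
# Toy word-family map: headword -> all of its forms.
FAMILIES = {
    "go": ["go", "goes", "went", "gone", "going"],
    "be": ["am", "are", "is", "was", "were", "be", "been", "being"],
    "have": ["have", "has", "had", "having"],
}

def known_families(learned, families, require_all_forms=False):
    """Return the set of family headwords counted as known.

    With require_all_forms=True, a family only counts once every
    form is learned (the stricter reading of the question above);
    otherwise a single learned form is enough.
    """
    learned = set(learned)
    known = set()
    for head, forms in families.items():
        hits = sum(1 for f in forms if f in learned)
        if hits == len(forms) or (hits > 0 and not require_all_forms):
            known.add(head)
    return known

# Stand-in for your exported status 4 LingQs.
learned = ["went", "am", "are", "is", "was", "were", "be", "been", "being"]
print(sorted(known_families(learned, FAMILIES)))                           # ['be', 'go']
print(sorted(known_families(learned, FAMILIES, require_all_forms=True)))   # ['be']
print(len(known_families(learned, FAMILIES)) / len(FAMILIES))              # 0.666...
```

Whether one learned form is enough to count a whole family, or all forms are required, is a policy choice; the last line is the "you know X% of the first N words" figure either way.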
Do you happen to have a reference for the most commonly used English words? A German list would be even better (since I'm learning German). Someone could write a webpage that did the calculation based on your status 4 LingQs, cross-referenced with the word list. The beauty of the API is that programmers can do whatever they want with the data.
I do have another suggestion – but unfortunately this also requires a LingQ change. You can import vocabulary words via a CSV file. Unfortunately, right now it only imports as "term,hint,phrase". If it could also import a tag, as "term,hint,phrase,tag", then you could build your word list as a CSV file with a tag like "1000 most common words". Then on the Vocabulary page you could filter first by tag and then by status. This would at least show you which words you have or have not learned.
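Building such a file is trivial. Here is a sketch of writing the proposed four-column format; note that the fourth "tag" column is the extension suggested in this post, not a format LingQ actually accepts today, and the German entries are just sample data.

```python
import csv

# Proposed "term,hint,phrase,tag" rows (sample data).
rows = [
    ("gehen", "to go", "Ich muss jetzt gehen.", "1000 most common words"),
    ("haben", "to have", "Wir haben Zeit.", "1000 most common words"),
]

# newline="" is required so the csv module controls line endings.
with open("common_words.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```

The csv module also handles quoting for you, which matters here because example phrases routinely contain commas.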
With all that said, I bet if you just took all the Beginner 1, Beginner 2, and Intermediate 1 lessons, you would encounter a significant portion of the most commonly used words in your target language. The stats idea is interesting… if only to us math geeks. I bet you'll learn more by reading a bunch of interesting content than by trying to learn words that some statistic tells you to learn. But take all this with a grain of salt. I'm mono-lingual at the moment.
I also feel that once you move past the first few thousand words in the frequency pecking order, what is important to one person will be different from what is important to another. In other words, it depends on their interests and needs.
Not to get off point, but I imagine language acquisition is exponential in nature. It seems to me that once you get past the initial hurdles of the language, your skill will quickly improve (at least that's what I'm hoping will happen to me). My thought is that once you can listen to a podcast, music, a foreign film, etc. and understand a decent portion of the content, you'll start to naturally figure out the rest of the words from context. I bet I know thousands and thousands of words in English that I never made an effort to study. I learned them through reading.
I guess my point of this post is that after a while the stats become irrelevant.
But like I said, I’m mono-lingual so this is all conjecture.
Yes, absolutely. The more words you know, the more interesting content you can understand, the more new words you can learn, and the more of the language you notice. The hardest part is the beginning. You just have to trust the process. However confusing it all seems at the beginning, it will all seem natural in due course.
What is more, the more languages you learn, the easier it becomes to learn new ones. Just trust your brain. The brain learns, but it learns slowly.