Word counting

Hi Administrator,

Could you improve the method of counting known words so that it counts the word itself instead of each of its different forms?

For example: LingQ recognizes “student” and “students” as two different words. “study” and “studies” or “studied” are 3 different words. As a result, my statistics show that I know 5 words, but actually I know only 2 words: “student” and “study”.

I understand that it’s a little more difficult to analyse a word than to just compare strings, but LingQ’s statistics would be more precise if you could fix this.

Thank you; I look forward to the improvement.

We have chosen to count the words this way and this will not change. Each form of the word represents a different function of the word in a context. I personally find that I save many different forms of the same word, often because I don’t remember the meaning of the word in any form. When I review these words in the Vocabulary section they usually appear together, and I find it really helpful for remembering them.

You can include the basic word in the Hint or you can Tag the word with the basic term if you find that useful.

Hi there,

We count each form of a word as a different word on LingQ. If you like, you can use the ignore feature to exclude words from your Known Words total. This can be done by either clicking on the word and clicking the small “x” that appears, or by hovering over the word and clicking the red “x” button that appears on the popup.

I agree that different forms of a word help us remember it faster. But here I mean the vocabulary statistics.

In addition, different forms of words like “intention”, “intentional”, “intentionally” are useful, and we can say that they are 3 words. But in the case of “student” and “students”, counting them separately is really annoying.

From a programmer’s point of view, it’s a little difficult to do that. But as far as I know, it’s absolutely viable (at least for English).

Hi there, personally I don’t see the point of doing that. As long as the word count serves as a way to measure your progress, it’s ok. Even if the system were set up the way you say, it could not give you a real number of the words you know, because you will always be forgetting, learning and relearning words. It would be a very rough estimate anyway, since the human brain is not a computer.
The way I see it, the “known words” is just a number to compare yourself to yourself - in the past. I don’t think it is supposed to be an accurate number of the words you really know, or a way to compare yourself to others. But that’s just the way I see it anyway.

Like Alex says, if you feel you already know the word in a different form and only want to count it as one word, just use the red “x” to keep it out of your statistics. Easy.

In my view, it does not matter how the words are counted. The statistics are only a rough indication of your progress. If I say I know xx thousand words, it does not really mean much. How well do I know them? How many have I forgotten? How many shades of meaning do I know? Can I use them? Do I use them?

What is meaningful, however, is the percentage of new words when you select a new lesson. This indicates the difficulty. As the general percent of new words declines over time, you will find that you understand more and more. So the growth of your known words total is an indication of progress.
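As a rough illustration of that percentage, here is a minimal Python sketch. It assumes the figure is the share of distinct word forms in a lesson that are not yet in the learner's known list; how LingQ actually computes it is not stated in this thread, and `percent_new_words` is a hypothetical helper, not part of LingQ.

```python
import re


def percent_new_words(lesson_text, known_words):
    """Percentage of distinct word forms in the lesson that are not in known_words."""
    # Each distinct string counts once, so "student" and "students" are separate forms.
    forms = set(re.findall(r"[a-z']+", lesson_text.lower()))
    if not forms:
        return 0.0
    new_forms = forms - {w.lower() for w in known_words}
    return 100.0 * len(new_forms) / len(forms)


lesson = "The student studies. The students studied."
known = {"the", "student"}
print(percent_new_words(lesson, known))  # 60.0
```

Here three of the five distinct forms ("studies", "students", "studied") are new, so the lesson comes out at 60% new words.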

It is up to the learner to decide which form of the word is a new word worth counting. If we take

use, misuse, abuse, useful, useless, usual, usually.

which of these would you like to save? Just one, or more of them?

We cannot make a rule just for the plural in English. We have 15 languages, and the situation is different in each language.

We may one day create or find a converter for those who would like to know how many “word families” they know. However this may not be available for all languages, and the definition of what belongs in a “word family” is necessarily arbitrary.

For anyone interested in the issue of words, how many we need to know and how to count them, the following is something I found via Google.
http://bit.ly/o0Hk9o

To me this is unnecessarily complicated. What matters most is our interest in what we are reading. This will drive us to learn, and LingQ can help us get there.

I see it like points in a computer game. Some people always seem interested in getting a high score, which is actually a good thing to motivate us at times, but the essence is in the playing. That’s where the fun is had.

Yes, the number is like points in a computer game. And it’s important for us because it indicates our progress. Absolutely agree. We really don’t need to know exactly how many words we know just to say “oh, I know 10,000 words” while we cannot speak the language fluently. But for those who are preparing for TOEFL or similar exams, the number of known words can help them decide whether to take the exam or keep improving first.

Of course, doing that for all languages is really difficult. Every language requires separate research, and there is no common rule we can set for all of them. (I know it’s extremely complicated in Russian.) And the example I gave wasn’t meant to be only about the plural in English; it was just an example.

Anyway, I’m interested in NLP. Improving the count this way for English is one of the challenges. I’m thinking about it.

@Steve: you said “What is meaningful, however, is the percentage of new words when you select a new lesson. This indicates the difficulty.”

Would it be possible to implement a little change in the Import Bookmarklet so the percentage of new words is shown? You can always get a general idea just by looking at how much blue there is, but maybe that’s not difficult to program and it would be nice to have a number.

Diego, I do not see us changing how the Bookmarklet works at this time. There are so many other things we need to do. You can, however, see the % new words when you see the newly imported lesson on the My Lessons page.

Nice! It’s been ages since I last went to the My Lessons page, so I didn’t notice the percentages had been there all along. That will do, thanks.

In English, an -s at the end of a noun means that it’s plural. After a verb, it means that it’s conjugated in the 3rd person singular. The computer doesn’t know if WE know that the plural -s belongs to the noun category or that the 3rd person singular -s belongs to the verb category (or if we understand the difference between -s and “no s” at all). Hence, it’s fairly reasonable to treat them as individual words. For English, this -s is pretty basic knowledge, though. However, in other languages, a certain letter can mean a different case or number, or any number of things (for a beginner, articles and endings in German can seem ungraspable).

So, similar forms can in fact mean something else, and visually different forms can actually belong to the same word family (am/is/are - anyone?).

Do I really “know” the word if I can’t recognize (or know when to use) the various forms? I say no.

@jeff_lindqvist: That is a misconception. You wrote: “Do I really “know” the word if I can’t recognize (or know when to use) the various forms? I say no”. Yes, of course. But according to this, if you know the word “student” (your known words increase by 1) but are confused by the word “students”, that means you don’t really know the word yet. Then you keep studying and improving, and finally understand “students”. The question is whether your known words should increase by 1 again (total 2) or remain the same. I want the count to remain the same. LingQ should count when a new word is added to the known words (even words from the same family), but not a different form of the same word.

@Steve: I suggest this: The first step is to count the number of known words the way LingQ does now. The second step is to reduce that number by comparing the set of strings with WordNet (or a large dictionary for languages other than English) and counting only the base form of each word. In the Vocabulary section of LingQ, we can keep all forms of a word, and the learner still has to learn all of them. The point is not to report an inflated number, so that we can use the statistics as a reliable measure of our own progress.

Steve, what do you think about this approach (it doesn’t require a lot of programming effort)?
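The two-step idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BASE_FORMS` is a tiny hand-made stand-in for WordNet or a full dictionary, and `count_base_forms` is a hypothetical helper, not anything LingQ provides.

```python
# Hypothetical stand-in for a real lemma dictionary such as WordNet.
# It maps an inflected form to its base form; unlisted words map to themselves.
BASE_FORMS = {
    "students": "student",
    "studies": "study",
    "studied": "study",
}


def count_base_forms(known_words):
    """Return (raw_count, base_form_count) for a collection of known word strings."""
    # Step 1: count distinct strings, the way the thread says LingQ counts today.
    raw = {w.lower() for w in known_words}
    # Step 2: collapse each string to its base form and count again.
    bases = {BASE_FORMS.get(w, w) for w in raw}
    return len(raw), len(bases)


known = ["student", "students", "study", "studies", "studied"]
raw_count, base_count = count_base_forms(known)
print(raw_count, base_count)  # 5 2
```

A plain lookup like this cannot tell whether “studies” is a noun or a verb; a real implementation would need part-of-speech information and a separate dictionary for each language.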

As I mentioned above, if you are concerned with the accuracy of your word count, then simply use the Ignore feature to remove these other forms of the word.
We have no plans to spend any additional time on this.

@lovethule, if you want a tool where accurately monitoring word families is integrated within the system, LingQ is not that tool. It doesn’t do word families. There are arguments for and against using word families vs. actual words known. As a language learning tool, it really does not matter. For me, the number of words known is completely arbitrary; its power is being able to see the initial sea of blue gradually decrease over time, proving your progress in the language.

From your posts you seem to be fairly technically minded. If counting word families is important to you, why not export your entire known word list to Excel (I think you can do this) and write some VBA to analyse the words and give you the figures you’re after?

…or a better idea even (if you’re into programming) would be to write some software to interface with LingQ’s API, and share it with other members, some of whom would I’m sure be interested in this.

I just wrote a blog post on the subject of vocabulary and word counts etc. Please have a look.

http://bit.ly/nEeHHV

Paul Nation and Batia Laufer are specialists in this field. They reckon that the number of individual words needs to be divided by 1.6 to arrive at the number of word families, in English. I also read that to do well in TOEFL or TOEIC you need over 8,000 word families which is going to be 13,000 words the way LingQ counts, so I would say at 15,000 you should be comfortable.
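The arithmetic behind those figures can be checked with a short Python sketch. The 1.6 ratio is the estimate quoted above for English; the function names are hypothetical and the results are rough approximations, not exact conversions.

```python
# Ratio quoted in the thread for English; an estimate, not an exact constant.
FORMS_PER_FAMILY = 1.6


def forms_to_families(word_forms):
    """Estimate word families from a count of individual word forms."""
    return round(word_forms / FORMS_PER_FAMILY)


def families_to_forms(word_families):
    """Estimate individual word forms from a count of word families."""
    return round(word_families * FORMS_PER_FAMILY)


print(families_to_forms(8000))   # 12800
print(forms_to_families(15000))  # 9375
```

So 8,000 word families corresponds to about 12,800 words as LingQ counts them (roughly the 13,000 mentioned), and a LingQ total of 15,000 works out to a little over 9,000 word families.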

If you want to test your vocabulary level try out this website.
http://www.er.uqam.ca/nobel/r21270/levels/

Lovethule - I’m not saying it’s the best tool, but since there are situations when the average learner perhaps doesn’t realize that the string of letters is another form of an already known (or at least saved) word, due to position in the sentence, capitalization and so on, it’s better than nothing. The number of words in a word family is arbitrary (see Steve’s example above with use, misuse, abuse etc.) and since cat, cats, cat’s and cats’ work differently, it actually makes sense to treat them as separate words/(“word usages”?).