LingQ thinks there are way too many words in Hebrew because of prefixes

istrauss6 · February 22, 2024, 3:38am

I’ve noticed that whenever a Hebrew word has a prefix like ב or ה or ל, LingQ mistakenly thinks it’s a totally new word. This throws the “new word count” and “known word count” completely off. I really like the idea of knowing how many words I know in Hebrew, so it’s disappointing that this doesn’t work, and my numbers get crazily inflated. Is there a way to fix this? Thanks.

WillowMeDown · February 22, 2024, 3:58am

Actually it’s the same for other languages too. “Not a bug, but a feature.”

In some languages, the different forms of a word can be much harder to recognize than a word with a one-letter prefix. For example, in English, the verb “to be” takes many forms: am, is, are, was, were, being, been,… It’s really not intuitive at all. Rather than automatically marking all forms of a familiar verb as “known,” LingQ allows me to mark some forms as known and others as unknown. So that’s the logic behind it.

As long as you can see that your word count is going up, something good is probably happening.

Hope that helps.

istrauss6 · February 22, 2024, 5:20am

Thanks for the reply, but you don’t understand: prefixes work differently in Hebrew. It’s not just a different form of the verb (Hebrew has those too, that’s not what I’m talking about). Let me explain:

The Hebrew word for “the” is one of these prefixes: “ה”. That means every time you need to use the word “the,” you have to stick “ה” onto the next word. So nearly every noun in the Hebrew language gets “ה” and these other prefixes stuck on it at some point. That means LingQ counts “the chair,” “the table,” “the person,” and “the sky” all as distinct words from “chair,” “table,” “person,” and “sky.”

The other Hebrew prefixes are the words for “to,” “in,” “from,” and other very, very common words. So when you count these all as separate words, you multiply the Hebrew language, like, five times over. The “new word estimates” are all completely wrong as a result.
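The inflation described above can be seen in a toy count. This is a minimal Python sketch, not how LingQ actually counts words: it assumes a hand-made set of base words and the common one-letter prefixes ה, ב, ל, מ, ו, ש, כ. It strips a prefix cluster only when what remains is a known base word, because many Hebrew words (בית “house”, שולחן “table”) themselves begin with those same letters.

```python
# Assumed one-letter Hebrew prefixes: ה "the", ב "in", ל "to", מ "from", ו "and", ש "that", כ "like/as"
PREFIXES = set("הבלמושכ")

def strip_prefixes(token, base_words, max_prefix_len=3):
    """Return the base word hidden under a prefix cluster, or the token unchanged."""
    if token in base_words:
        return token
    # Try clusters of 1..max_prefix_len prefix letters, always leaving at least two letters
    for i in range(1, min(max_prefix_len, len(token) - 2) + 1):
        head, rest = token[:i], token[i:]
        if all(ch in PREFIXES for ch in head) and rest in base_words:
            return rest
    return token

# Hypothetical sample: "house" and "table" plus some prefixed forms
base_words = {"בית", "שולחן"}
tokens = ["בית", "הבית", "בבית", "לבית", "ובבית", "שולחן", "השולחן"]

surface_count = len(set(tokens))  # every attached form counts as a separate word
base_count = len({strip_prefixes(t, base_words) for t in tokens})
print(surface_count, base_count)  # 7 2
```

Seven distinct forms collapse to two words. The check against `base_words` is the hard part in practice: deciding where a prefix ends and a word begins needs a dictionary, which is exactly the "where do you draw the line" problem raised later in the thread.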

applegone · February 22, 2024, 5:48am

“Known” words are counted per word, not per term. So if you split the “word” into two words in “Edit Sentence” mode, you should avoid what you’re experiencing.

In “Read Sentence” mode, you can still combine the words as LingQ phrases/terms so that you can track those terms, and these new LingQs won’t affect your overall “known words” count.

As you can imagine, the side effect is that you’ll spend a lot of time editing sentences. You’ll also find that some (if not all) of the LingQ lessons can’t be modified. I don’t know of any other way to correct it easily. You could also use other tools and reimport, but that could be equally painful and slow.

nfera · February 22, 2024, 6:09am

In Italian, the following are separate word variants:

amica (female friend)
l’amica (the female friend)
dell’amica (of the female friend)
all’amica (to the female friend)
dall’amica (from the female friend)
etc.

This is not an issue. LingQ records word variants, not headwords or word families. The statistics are designed for motivation. You are just meant to see an increasing number over the months and years.

If you wish to know how many word families or head words you know, either go through a frequency list and mark the words manually or use one of the vocabulary estimators.

I’d say this is more of a side benefit. The real reason is probably that it’s easier to implement from a software perspective.

Jessei · February 22, 2024, 7:53am

The purpose of known words isn’t to count words but to track your familiarity with the language. The problem with your example is: where would you put the limit? For example, Spanish has reflexive verbs that are conjugations of other verbs. I suppose this applies to every language. At some point you will hit an exception to the rule, where the word you would like to exclude also means something else that, by your definition, should be included. It’s easier just to count everything and judge progress based on that. That’s why LingQ’s known-word targets are higher than they would be if you counted only unconjugated forms.

Pr0metheus · February 22, 2024, 9:31am

LingQ classes word forms as “words”. My advice: don’t worry about it. As long as you’re learning, the number of words you’ve accumulated doesn’t matter.

waka_suke · February 22, 2024, 10:43am

So then, why does it go negative?

So why does it go negative, then???

Pr0metheus · February 22, 2024, 11:11am

Some people believe LingQ is cheating by inflating the total word count. People like that put too much weight on the number of words they have learned. But the total doesn’t matter.

procion · February 22, 2024, 11:13am

As people have said, that’s just how LingQ works. It’s partly because it’s easier to implement as an algorithm.

It’s exactly the same with Arabic, which also attaches articles to words. And in Korean, nouns with attached particles are recognised as separate forms.
It’s even worse in Turkish, which is a highly agglutinative language, so it has a virtually unlimited number of word forms.

I agree that it may make sense to “merge” some easily recognisable forms into one lemma, but then we would have to decide which word forms are “evident” and which are not. As you get better with the language, you also get better at recognising different forms and intentional misspellings.

The word count means nothing on its own; the more, the better. LingQ also partly takes these considerations into account when setting the number of known words needed to reach the next level: in a language with many forms, you need more known words to reach the same level.

There are some apps that take a different approach, trying to lemmatise word forms. You may like that more. While this reduces the number of clicks and gives a more precise word count, it also has its own disadvantages.

waka_suke · February 22, 2024, 11:44am

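Since I don’t care about the total, I reset words to 1 whenever I hadn’t actually remembered them or a new meaning came up, even if that makes the count go negative. In the end, though, if you don’t earn coins, your streak doesn’t continue either. So even though I study every day, because I keep working on the same material, I can neither earn coins nor keep my streak going. If none of that means anything, then surely we don’t need those nagging emails like “Your streak has ended,” do we? I just want to use it freely 🥺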

WillowMeDown · February 22, 2024, 11:51am

Oh, I didn’t know that about Hebrew. You’re right, what you’re describing is different from verb conjugations.
Something very similar happens in Korean, though, at the ends of verbs. There are so many different things that can be tacked onto the ends of verbs. It’s never-ending.

Pr0metheus · February 22, 2024, 12:35pm

You are free to use it however you like. Just ignore the notifications, rewards, and streaks.

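The idea behind LingQ is that people keep reading; in other words, the more you read, the more you learn. That’s why the streak depends only on reading. The vocabulary feature lets you practise words you have already read.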

WillowMeDown · February 22, 2024, 11:05pm

I just thought of an idea. Maybe you could export your vocabulary to Excel, alphabetize it, and eliminate all words starting with the prefixes for “the,” “in,” etc. You’d be left with a more accurate vocab count: the number of rows in the table.

It’s still probably not exhaustive, because if you’re like most people, you know a number of words that you haven’t encountered on LingQ yet.
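The spreadsheet idea above can also be scripted. This is a minimal Python sketch that assumes the export is a CSV with a `term` column; that column name and the sample rows are hypothetical, and the real LingQ export layout may differ. Rather than deleting every word that starts with a prefix letter, which would also throw away real words like לב “heart”, it drops a term only when removing its leading prefix letters leaves another term that is already on the list.

```python
import csv
import io

# Assumed one-letter Hebrew prefixes: ה "the", ב "in", ל "to", מ "from", ו "and", ש "that", כ "like/as"
PREFIXES = set("הבלמושכ")

def dedupe_prefixed(terms, max_prefix_len=3):
    """Alphabetize terms and drop those that are just a prefix cluster plus another listed term."""
    vocab = set(terms)
    kept = []
    for term in sorted(vocab):  # alphabetize, as in the spreadsheet approach
        redundant = any(
            all(ch in PREFIXES for ch in term[:i]) and term[i:] in vocab
            # always leave at least two letters after the prefix cluster
            for i in range(1, min(max_prefix_len, len(term) - 2) + 1)
        )
        if not redundant:
            kept.append(term)
    return kept

# Hypothetical export with an assumed "term" column
export = "term,status\nבית,known\nהבית,known\nבבית,known\nלב,known\nשולחן,known\nהשולחן,known\n"
terms = [row["term"] for row in csv.DictReader(io.StringIO(export))]

kept = dedupe_prefixed(terms)
print(len(terms), len(kept))  # 6 3
```

The count of `kept` plays the role of “the number of rows in the table.” A prefixed form whose base word you have never saved stays on the list, so this still over-counts somewhat.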

miriaml5 · February 23, 2024, 2:46am

I agree with the people who say don’t worry about it! Many languages have something like this, though not quite to the same extent. For example, like Hebrew, Spanish can stick a pronoun onto the end of a word, and that will count as a different word. In French, d’ (“of”) is also commonly attached to the beginning of a word, and LingQ will count this as a different word. It’s fine! That’s why LingQ word counts are only relative. If they are getting higher, that means you are using the language. That’s why, according to some people, advanced is really more like 70,000 words in LingQ. The new numbers are better at estimating, however.

istrauss6 · December 17, 2024, 7:16pm

This is a really good idea, thanks!

Todd-Hebrew · December 18, 2024, 2:23am

I’m studying Hebrew, as well. I kind of like that LingQ counts everything as separate, because I don’t always recognize them as words that I know (obviously, I’m talking about the pronoun endings and the prepositional and clause “prefixes”). I’m not too worried about the number of words that I know on LingQ because I have that list on Duolingo, which I started in January. But I discovered that that word count is sort of meaningless, because I can’t understand what I am hearing and don’t always recognize words I studied in Duolingo in real contexts. I think, as someone who still doesn’t understand what I am reading and listening to, that word counts don’t matter as much as what I can do with the words I understand.

With that in mind, I was watching an interview on Robin MacPherson’s YouTube channel, and the gentleman that he interviewed had this great idea of measuring your growth through reading, which would be perfect to use with LingQ. (It was video 3 of the series at 11:50: https://www.youtube.com/watch?v=VfS06FlclN0&list=PLuIhQdwUVqqryPfMtqNyc_-gI1C15S-Mf&index=3.) I think this is the way that I will measure my growth. In a way, this is what I do with my own students in English using educational apps.

The point that you are making, though, is a good point. I hadn’t thought about whether words should be counted as the same or different in Hebrew based on “prefixes” or “suffixes,” to express it in English terms. I’m curious now how native Hebrew speakers think about it.