Feature suggestion - Morphological Analysis

hillel · July 7, 2008, 10:47pm

Hi LingQ team,

This suggestion is for Russian, but it might be a good idea for other languages as well. The basic idea is to index all of the individual words in your transcripts by base form instead of by the inflected (surface) form. You’ll need a Russian stemmer, though I don’t know whether one is available for free.

The scenario goes like this: let’s say I have 1000 words in my vocabulary and I open a new transcript. LingQ gets the base forms of my 1000 words, matches them against the indexed base forms of the new document, and marks all the matches.

In a language with rich morphology like Russian, many of the words I’m working on are currently not marked yellow because they appear in the text with minor morphological changes. This feature is designed to solve that problem. It would also yield a better approximation of the ‘new words’ count for a transcript.
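
A minimal sketch of the matching step described above, in Python. The lemma table is a tiny hand-made stand-in for a real Russian stemmer or lemmatizer, and the names `base_form`, `mark_known`, and `new_word_count` are hypothetical, not anything LingQ provides.

```python
# Toy lemma table: a stand-in for a real Russian stemmer or lemmatizer.
# A real system would cover the whole language; this covers two words.
TOY_LEMMAS = {
    "книга": "книга", "книги": "книга", "книгу": "книга", "книгой": "книга",
    "читать": "читать", "читаю": "читать", "читает": "читать", "читал": "читать",
}


def base_form(word):
    """Return the base form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return TOY_LEMMAS.get(w, w)


def mark_known(transcript_words, vocabulary):
    """Pair each transcript word with True if its base form is in the vocabulary."""
    known_bases = {base_form(w) for w in vocabulary}
    return [(w, base_form(w) in known_bases) for w in transcript_words]


def new_word_count(transcript_words, vocabulary):
    """Count distinct base forms in the transcript that the vocabulary lacks."""
    known_bases = {base_form(w) for w in vocabulary}
    return len({base_form(w) for w in transcript_words} - known_bases)


vocabulary = ["книга", "читать"]
transcript = ["Он", "читает", "книгу"]
print(mark_known(transcript, vocabulary))
print(new_word_count(transcript, vocabulary))
```

Here "читает" and "книгу" are marked as known even though only "читать" and "книга" were saved, and the new-word count covers only "Он".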

steve · July 7, 2008, 11:07pm

This is a subject that comes up regularly. I am studying Russian. I treat my known words number as an indicator of progress, not as an absolute. I treat each form of a word as a new word, in terms of how it is used. The fact that I may save many forms of the same word actually helps me eventually learn it.

Nevertheless the whole question of how we deal with words, word forms, and word families will be looked at in the future, when we get a lot of other issues settled.

That said, I greatly appreciate your input and active participation. We can’t always respond right away to all ideas.

hillel · July 8, 2008, 6:30am

Thanks for the fast reply!

typikon · July 21, 2010, 2:28am

I really agree with this suggestion. I would use this resource, but when I open up something new, half the words are marked as new when in fact they’re not. If a good Russian vocabulary has 15,000 to 20,000 words, and each Russian word has between 12 (nouns) and 20 forms, we’re talking about hundreds of thousands of “clicks”. It’s too much. In addition, the vocabulary feature ends up needing heavy editing, etc.
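
Taking the figures in this post at face value, the arithmetic can be sketched as follows. The assumption (not stated in the post) is that every form of every word is clicked exactly once, so this gives a rough upper bound rather than an estimate of real usage. The function name `click_bounds` is made up for illustration.

```python
def click_bounds(words_low, words_high, forms_low, forms_high):
    """Return (low, high) click counts if every form of every word is clicked once."""
    return words_low * forms_low, words_high * forms_high


low, high = click_bounds(15_000, 20_000, 12, 20)
print(f"{low:,} to {high:,} clicks")  # 180,000 to 400,000 clicks
```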

steve · July 21, 2010, 3:34am

We have been around this subject many times. LingQ is what it is.

Each form of the word is a different term. You can LingQ or just let it go by and make it known by clicking "I know all" when you are done with the lesson. I personally tend to save more than one form of a word in Russian and I find it helps me. I have mostly learned Russian on LingQ and am satisfied. There are certainly not hundreds of thousands of clicks, nor is there any need for heavy editing.

nobody · July 21, 2010, 9:21am

I use this tool for Russian morphological analysis: http://starling.rinet.ru/cgi-bin/morphque.cgi?flags=endnnnn.

The problem is that it throws in apostrophes to show stress, which you then have to take out by hand if you are copying and pasting into a LingQ lesson. If anyone knows of a conjugator that just returns plain text, I would be interested.
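
Until such a tool turns up, the apostrophes could be removed with a few lines of Python instead of by hand. This is only a sketch: it assumes the stress marker is a plain ASCII apostrophe, the function name `strip_stress_marks` is made up, and the sample words below are illustrative rather than copied from the tool's output.

```python
def strip_stress_marks(text, marks="'"):
    """Remove every stress-marker character listed in `marks` from the text."""
    for mark in marks:
        text = text.replace(mark, "")
    return text


# Illustrative input; the marker positions are made up for the example.
pasted = "кни'га чита'ть"
print(strip_stress_marks(pasted))
```

Note that this removes every apostrophe, so it is only safe on text where the apostrophe is used for nothing but stress.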

I generally LingQ and learn words in 3 forms.
Verb: imperative, infinitive and masc past participle
Adjective: masculine, feminine and neuter nominative singular
Noun: nominative singular, nominative plural and genitive plural.

Not for any particularly scientific reason, I just like things that come in threes.

Greg_Morris · October 11, 2014, 3:42pm

Perhaps a more modest suggestion along the same lines. A place to store the dictionary/lexical form and a place to store parsing information. An easy way to copy previously entered hints for the same lexical form. Perhaps this could be implemented so that once a user has identified the lexical form it would automatically pull up any previously entered hints and notes. Also a method to allow a user to edit all hints for a given lexical form would be helpful.
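
A rough sketch of what such a store might look like, in Python. Everything here (the `Entry` and `HintStore` names, the field names, the methods) is hypothetical and only illustrates the idea of grouping saved terms by lexical form; it says nothing about how LingQ stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    surface: str        # the form as it appears in the text
    lexical_form: str   # the dictionary form identified by the user
    parsing: str = ""   # free-text parsing information, e.g. "acc sg"
    hint: str = ""


class HintStore:
    def __init__(self):
        # Saved entries, grouped by lexical form.
        self._by_lexical = defaultdict(list)

    def add(self, surface, lexical_form, parsing="", hint=""):
        entry = Entry(surface, lexical_form, parsing, hint)
        self._by_lexical[lexical_form].append(entry)
        return entry

    def previous_hints(self, lexical_form):
        """Distinct non-empty hints already entered for this lexical form."""
        seen = []
        for entry in self._by_lexical.get(lexical_form, []):
            if entry.hint and entry.hint not in seen:
                seen.append(entry.hint)
        return seen

    def edit_all_hints(self, lexical_form, new_hint):
        """Overwrite the hint on every entry of this lexical form; return the count."""
        entries = self._by_lexical.get(lexical_form, [])
        for entry in entries:
            entry.hint = new_hint
        return len(entries)


store = HintStore()
store.add("книгу", "книга", parsing="acc sg", hint="book")
print(store.previous_hints("книга"))  # hints to offer when saving another form
```

Keying on the lexical form is what makes both requests cheap: looking up earlier hints and editing them all at once are each a single dictionary lookup.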

nobody · October 13, 2014, 4:40am

Yup, this is something we were discussing earlier last week. We’re working on some other stuff at the moment, but we do hope to tackle this issue in the near future, and your suggestions above should come in handy when we are ready to take this one on.