The other day I looked at my vocabulary for Korean, and I noticed that all of the words were showing as ‘rare’ words, in terms of the number of stars assigned to them. Is this a known issue? Is there something that makes this particularly difficult to fix for Korean?
Since Korean is an agglutinative language, it is nearly impossible for us to tell whether a word is common or rare: each word can take literally hundreds of different forms, depending on the grammar patterns attached to it.
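To make the problem concrete, here is a minimal sketch (using a toy corpus, not LingQ's actual counting code) of why counting surface forms fragments the frequency of a single Korean verb:

```python
from collections import Counter

# Toy corpus: six tokens, all inflected forms of the single verb 먹다 ("to eat").
# A surface-form counter treats each inflection as a distinct word.
corpus = "먹어요 먹었어요 먹는다 먹고 먹으면 먹어요".split()

surface_counts = Counter(corpus)
print(surface_counts)
# The count for the verb is split across five separate "words",
# so each form looks rare even though the verb itself is very common.
```

Under this kind of counting, a genuinely frequent word never accumulates a high count under any single spelling, which is exactly the "every word shows as rare" symptom described above.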
For now this function won’t work with Korean. Perhaps we ought to remove it for Korean altogether, so as not to mislead users into thinking every word is rare…
I believe this is because we have not yet run the script that calculates frequency for our new languages. We have been waiting to accumulate enough content. The agglutinative nature of Korean, just like the inflected nature of certain other languages, will affect the results, but it will not cause all words to appear rare. We may get to this in the near future.
The importance has now been updated for all terms in the system. Please note that LingQs you have already created will still have the old importance, so simply visit the Vocabulary page and click “Update Word Importance” on the right-hand side to update the importance for your LingQs.
The agglutinative nature of some languages (such as Turkish and Hungarian, which are likely to come to LingQ in the near future) is going to be a challenge for the site.
Polysynthetic languages would simply be impossible under the current system. Take Inuktitut, for example, in which a single word may, on average, comprise upwards of five morphemes. “In one large Inuktitut corpus - the Nunavut Hansard - 92% of all words appear only once, in contrast to a small percentage in most English corpora of similar size.” Moreover, the same morpheme may look very different across words because of an internal sandhi process, which makes, for example, the first-person pronoun ‘junga’ surface as ‘tunga’.
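The 92% figure quoted above is a hapax legomenon ratio: the share of distinct word types that occur exactly once in a corpus. A small sketch of how one would compute it (the tokens here are a toy list, not the Nunavut Hansard):

```python
from collections import Counter

def hapax_ratio(tokens):
    """Fraction of word types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

# Toy example: four types (a, b, c, d), two of which occur only once.
tokens = "a a b b c d".split()
print(hapax_ratio(tokens))  # 0.5
```

On a highly polysynthetic corpus this ratio approaches 1 when counting surface forms, which is why surface-form frequency stars are meaningless there.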
Although this last language will probably not make it to LingQ any time soon, it does pose a difficult question.
The only way to solve this is to have a computational linguist develop a system that can distinguish such structures. It’s tricky, but there is no other real way to become more accurate here. My thought is that the user could invoke this tool when they click on a word; perhaps an ‘analyse’ button would appear. At the end of the day, designing such a system would be harder than implementing it, but that’s the job of linguists.
The approach this site seems to be taking is the ‘flat’ approach, where every surface form is treated as a separate word. This works fine for languages with fusional or isolating morphology, but I don’t think it will work well for others.
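A minimal sketch of the difference between the flat approach and a morphologically aware one, using Turkish noun forms and a hand-written lemma table as a stand-in for a real morphological analyzer (both the forms and the table are illustrative assumptions):

```python
from collections import Counter

# Hand-written surface-form -> lemma table; a real system would get this
# from a morphological analyzer, not a hard-coded dictionary.
LEMMAS = {"evde": "ev", "evler": "ev", "evlerde": "ev"}

tokens = ["evde", "evler", "evlerde", "ev"]  # all forms of "ev" (house)

flat_counts = Counter(tokens)                        # four types, each seen once
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)  # one type, seen four times

print(flat_counts)   # under the flat approach, every form looks rare
print(lemma_counts)  # under lemma counting, 'ev' is correctly common
```

The hard part, of course, is building the analyzer that replaces the hard-coded table, which is exactly the computational-linguistics work described above.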
I hope this provides some food for thought.