Confused about Mini Stories, Lingqs and saved definitions. Vietnamese

ecolbeth · September 3, 2024, 5:33am

Hi there,
I’m new to Lingq and going through the mini stories.

I’m in sentence mode, and under each sentence is a list of words, but not word groupings. Instead of grouping words to create a definition, each word is given the same definition.

EX:
thức dậy appears to mean “wakes up” or “to wake up,” but each word is listed individually with that same definition, “wakes up.” So I added the phrase as a LingQ, which makes sentence review much more relevant. But then I need to delete the individual words.

Am I missing something? Shouldn’t the word groupings which are contextual and provide the meaning already be saved? It creates a lot of work for the learner who has to create the new Lingqs and then delete the single words so the sentence review makes sense. Is this typical of Mini Stories or just the way these were created?

Your feedback is greatly appreciated!

Obsttorte · September 3, 2024, 5:49am

That is exactly as it is intended. You create the LingQs for every word or word group/phrase yourself. LingQ in its essence mimics the way a lot of language learners learn languages, by reading and translating texts. It just provides the means to speed up the progress and reduces the overhead you usually get the analog way, where you might have to look up the same word several times if you encounter it in different texts, whereas in LingQ the definition is stored.

It is important that you actively think about what you read and its meanings. If all the translations and phrases were already laid out in front of you, you would most likely never learn the language.

ecolbeth · September 3, 2024, 7:38am

Excellent! That’s encouraging! And I can see how it’s more engaging and impactful.

johnd2 · October 27, 2024, 1:46pm

I’m more inclined to agree with ecolbeth here. The issue, for me at least, is not so much about creating your own LingQ phrases in Vietnamese - that’s fine, albeit it’s arguable as to whether LingQ should not recognise many more commonplace 2- and 3-word Vietnamese words/phrases by default since these word groupings or clusters are such an integral part of the language.

The main issue for me is that LingQ still lists all the component words of a LingQ phrase in the not-known vocabulary list in sentence view. In Vietnamese, this typically doubles (and more) the number of items displayed in sentence view and clutters the true meaning of the sentence with many superfluous and often irrelevant item entries.

I may well be completely wrong about this, but if I had to guess, this behaviour might originate with English and other European languages using a space-delimited word as the basic item of meaning. In English we are happy to build up complex words with multiple syllables so as to provide more specific meaning to that word. But in Vietnamese it seems to be that the syllables remain largely as separate space-delimited words, but it is the cluster of 2 or 3 such monosyllabic ‘words’ taken as a whole that needs to be interpreted as the meaning of the overall noun/verb etc.

This has definitely been an impediment to my starting to learn Vietnamese with LingQ.

tgredig · November 5, 2024, 3:32am

Indeed, this is a major issue with Vietnamese in LingQ. It is definitely not working as it is supposed to. Word recognition should be similar to how it is done in Mandarin. The most common kind of Vietnamese word consists of two monosyllables.

Here is the issue you will run into:

I have reached 1300 “monosyllables” in Vietnamese, and there are almost no more “blue words” (except for proper nouns). That means I cannot know whether a new text is difficult or easy for me, since I “know” all the monosyllables, but obviously the language has many more words.

Take “tên lửa” = rocket, but tên = name and lửa = fire; so that should clearly be its own word.

So, what I do is create a new LingQ for each word with 2 or 3 monosyllables and only progress it to “4 - Learned” at most, never to “Known”; that way future words are grouped together properly. However, this still fails quite often. Here is an example:

“các môn vật lý, lịch sử” should be “các môn” (subjects), “vật lý” (physics), “lịch sử” (history), but “lý lịch” (resume, CV) has nothing to do with the sentence, and “vật” (object) is only tangential. Since I know both “vật lý” and “lịch sử” at level 4, but “lý lịch” only at level 1, it highlights “lý lịch”, even though the two syllables are separated by a comma and the phrase makes no sense in this context.
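To make the comma problem concrete, here is a minimal sketch of dictionary-based longest-match grouping that refuses to match across punctuation. It is only an illustration and says nothing about how LingQ works internally: the `LEXICON` contents, the `segment` function, and the 3-syllable limit are all assumptions made up for this example.

```python
import re
import unicodedata

# Hypothetical mini-lexicon of multi-syllable words (illustrative only).
LEXICON = {
    unicodedata.normalize("NFC", w)
    for w in ("các môn", "vật lý", "lịch sử", "lý lịch", "tên lửa")
}
MAX_SYLLABLES = 3  # assumed upper bound on word length, in syllables


def segment(text, lexicon=LEXICON, max_len=MAX_SYLLABLES):
    """Greedy longest-match grouping that never crosses punctuation."""
    text = unicodedata.normalize("NFC", text)
    words = []
    # Split on punctuation first, so no candidate can span a comma.
    for chunk in re.split(r"[,.;:!?]", text):
        syllables = chunk.split()
        i = 0
        while i < len(syllables):
            # Try the longest candidate first, fall back to one syllable.
            for n in range(min(max_len, len(syllables) - i), 0, -1):
                candidate = " ".join(syllables[i:i + n])
                if n == 1 or candidate.lower() in lexicon:
                    words.append(candidate)
                    i += n
                    break
    return words


print(segment("các môn vật lý, lịch sử"))
```

With this toy lexicon the sentence comes out as “các môn”, “vật lý”, “lịch sử”. The pair “lý, lịch” is never offered as a candidate because the text is cut at the comma before any lookup happens, while “lý lịch” without a comma would still be grouped. A real segmenter would need a far larger lexicon or a statistical model to resolve ambiguous overlaps.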

Either way, this is an issue that cuts into the fundamentals of learning with LingQ. I hope they will update it so that it splits words as well as it does in Mandarin.

johnd2 · November 5, 2024, 9:40pm

Glad to see that I am not alone in this criticism of Vietnamese on LingQ. It is definitely frustrating - I have the sense that LingQ could be so powerful an aid for me to learn Vietnamese, but I feel blocked from accessing more than say 20-30% of its potential because of LingQ’s complete focus on individual monosyllables rather than the far more helpful 2- and 3- word primary units of meaning in Tiếng Việt (actually, even recognising 2-word groupings would be a big step forwards).

I hadn’t realised that this was also an issue with other languages like Mandarin, though perhaps it is unsurprising that the problem also occurs with other SE Asian languages. It’s really interesting to hear that there is at least a partial solution with Mandarin and, yes, absolutely, wouldn’t it be great if the relevant algorithm could also be applied to Vietnamese in LingQ. It’s encouraging to learn that a solution exists, at least in principle. Presumably AI has a role to play here, given that the main party trick of LLMs is next-word prediction and associated word frequency lists.

Is it possible that the LingQ development team could be asked to look at this issue? Would be great to hear if the Mandarin technology can be migrated into Vietnamese and on what sort of timescale.

rwsandstrom · November 7, 2024, 12:07am

“Word segmentation” is a relevant buzzword here. Learners of Vietnamese who are also able to handle a lot of computer and math jargon may be interested in the following academic paper on “State-of-the-Art Vietnamese Word Segmentation” from the year 2016.

https://arxiv.org/pdf/1906.07662

For those new to arxiv.org, note the “Access Paper” section of the web page.

Regarding “B, I, O”, I think that these mean Beginning, Inside, and Outside (of what English speakers would call a “word”). The paper provides an example segmentation as follows:

For example, the sentence “Megabit trên giây là đơn vị đo tốc độ truyền dẫn dữ liệu .” (“Megabit per second is a unit to measure the network traffic.” in English) with the word boundary result “Megabit trên giây là đơn_vị đo tốc_độ truyền_dẫn dữ_liệu .” is encoded as “Megabit/B trên/B giây/B là/B đơn/B vị/I đo/B tốc/B độ/I truyền/B dẫn/I dữ/B liệu/I ./O”.

For example, “đơn vị” is probably a borrowing from the Chinese “单位”, and thus a single word.
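For the programmers in the thread, the B/I/O encoding in that quoted example can be reproduced in a few lines. This is only a sketch based on my reading of that single example, not code from the paper: the function names `to_bio` and `from_bio` are made up, and tagging punctuation-only tokens as “O” is an assumption drawn from the “./O” at the end of the quote.

```python
import string


def to_bio(segmented):
    """Turn an underscore-joined segmentation into (syllable, tag) pairs."""
    tagged = []
    for token in segmented.split():
        # Assumption: tokens made only of punctuation get the tag "O".
        if all(ch in string.punctuation for ch in token):
            tagged.append((token, "O"))
            continue
        syllables = token.split("_")
        tagged.append((syllables[0], "B"))               # first syllable of a word
        tagged.extend((s, "I") for s in syllables[1:])   # remaining syllables
    return tagged


def from_bio(tagged):
    """Rebuild the underscore-joined tokens from (syllable, tag) pairs."""
    words = []
    for syllable, tag in tagged:
        if tag == "I" and words:
            words[-1] += "_" + syllable
        else:
            words.append(syllable)
    return words


sentence = "Megabit trên giây là đơn_vị đo tốc_độ truyền_dẫn dữ_liệu ."
print(" ".join(f"{s}/{t}" for s, t in to_bio(sentence)))
```

The point of the encoding is that segmentation becomes a tagging problem: a model only has to decide, for each syllable, whether it starts a word (B) or continues the previous one (I), and `from_bio` shows that the word boundaries can be recovered from the tags alone.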

Ah, the joys the computer folks must be experiencing in wrestling with this!