Vietnamese Parser and other issues

G’day Comrades,

Thanks for your work fixing up the Vietnamese parser for multi-word groups. It’s much better than it was straight after the update and is now tracking progress. I’m still having one issue with it, however, and have also included a range of other issues and suggestions here.

  1. First, I’m having an issue with duplicates, per the attached images: the same term in the same lesson is marked at different levels and not recognised as the same term. My guess would be that the text isn’t being normalised on entry, as Vietnamese can be represented in Unicode in two ways: precomposed/NFC (letter and diacritics stored as a single code point) or decomposed/NFD (base letter and diacritics stored as separate code points). To the user they’re identical, but not to the computer. I suspect this is what’s happening because I don’t think I’ve encountered any term duplicated three times, only twice, which fits two competing encodings. Normalising everything to NFC on entry would solve this.

  2. Second, I’m having issues with the Edit Lesson/Course functionality. It opens fine and I can make changes to the lesson I’m currently on, but clicking on any of the other lessons in the sidebar doesn’t do anything: the page doesn’t change (per screenshot).

  3. Third, separately, in the iOS app in Arabic, I regularly have text cut off when viewing Sentence Mode. Most times the sentence has more text, but it is cut off prematurely. This happens no matter what text size I select. Screenshot also attached.
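To illustrate the normalisation guess in point 1, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `normalise_term` is made up for illustration, and I don’t know how LingQ actually stores terms, so treat this as a sketch of the idea rather than a description of your code.

```python
import unicodedata


def normalise_term(term: str) -> str:
    """Return the NFC (precomposed) form of a term. Hypothetical helper."""
    return unicodedata.normalize("NFC", term)


# The same Vietnamese word written two ways, using escapes so the
# encoding of each string is explicit.
precomposed = "\u1ea5m \u00e1p"        # NFC: one code point per accented letter
decomposed = "a\u0302\u0301m a\u0301p"  # NFD: base letter + combining marks

# They render identically but are different strings to the computer.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 5 8

# After normalising both to NFC they compare equal.
print(normalise_term(precomposed) == normalise_term(decomposed))  # True
```

If terms were normalised like this both when a lesson is imported and when a LingQ is saved, the two spellings would land on the same entry.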

Happy to provide any additional detail on any of these if required.

Examples:

Vietnamese parsing identical words:

Edit Lesson / Course: here, clicking “hát mãi khúc quân hành” for instance, doesn’t do anything.

Arabic Sentence Mode display on iOS: As per “edit sentence”, there are extra words that aren’t shown on the actual sentence mode - this happens regularly.

I’ve already posted a few updates about the new Vietnamese parser, but haven’t heard back from the LingQ team… I think there are lots of issues… It would be nice to hear from the team.

For your first point, I’ve noticed that words starting with a capital letter (e.g., the first word in a sentence) aren’t recognized properly. That might be what’s happening in your example.
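If capitalisation does turn out to be a factor, one possible way to compare terms is to build a lookup key that ignores both case and Unicode form. This is only a sketch under my own assumptions: `lookup_key` is a hypothetical name, and I am assuming matching is meant to be case-insensitive, which I can’t confirm for LingQ.

```python
import unicodedata


def lookup_key(term: str) -> str:
    """Build a comparison key: NFC-normalise, case-fold, normalise again.

    Hypothetical helper; the second normalisation guards against
    case folding producing a non-normalised string.
    """
    folded = unicodedata.normalize("NFC", term).casefold()
    return unicodedata.normalize("NFC", folded)


# Capitalised and decomposed (as at the start of a sentence)
# versus lowercase and precomposed (as saved in the vocab list).
sentence_initial = "A\u0302\u0301m a\u0301p"
saved_term = "\u1ea5m \u00e1p"

print(sentence_initial == saved_term)                          # False
print(lookup_key(sentence_initial) == lookup_key(saved_term))  # True
```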

It also looks like words in titles are treated differently from those in the main text (see example below):

And in some cases, the same word just isn’t recognized consistently, I’m not sure why… (see example below):

I’ve complained about Arabic script in iOS (btw, it applies to all languages with this script, like Persian and Urdu) for years. Good luck with getting their attention.

The team is clearly more interested in fashionable AI features than in fixing bugs.

Thanks for your feedback everyone, we will look into this.

G’day Zoran,

No progress seen on any of these issues yet, would appreciate an update.

@Trafs We are unable to reproduce this on our end at the moment, everything seems sorted out. Can you please provide more examples of the mentioned issue?

Hey Zoran, sure:

On the first page of my Vietnamese vocab list:

Two entries for ấm áp, with entirely different definitions and LingQ levels. This is not an isolated incident; it is happening a lot. Going by how they are sorted in the list, I would say the encoding point I raised above would be the place to look.
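In case it helps with reproducing this, here is a rough Python sketch of how duplicates of this kind could be detected in a list of terms. The function name `find_encoding_duplicates` and the sample `vocab` list are made up for illustration; I’m assuming the entries differ only in Unicode form, which I can’t verify from my side.

```python
import unicodedata
from collections import defaultdict


def find_encoding_duplicates(terms):
    """Group terms by NFC form and keep groups with more than one raw spelling.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(set)
    for term in terms:
        groups[unicodedata.normalize("NFC", term)].add(term)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# Made-up sample: the same word in NFC and NFD, plus an unrelated term.
vocab = [
    "\u1ea5m \u00e1p",                # precomposed (NFC)
    "a\u0302\u0301m a\u0301p",        # decomposed (NFD)
    "chi\u1ebfn l\u01b0\u1ee3c",      # only one spelling present
]

duplicates = find_encoding_duplicates(vocab)
for key, variants in duplicates.items():
    # Show the code points so the difference is visible.
    for variant in variants:
        print(key, [hex(ord(ch)) for ch in variant])
```

Any term that shows up here with two different code point sequences would be a candidate for the duplicate entries I’m seeing.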

Arabic sentence mode is also having the issue above:

You can see the sentence mode shows the sentence ending at مطلع but it goes on until الماضي.

The second issue from my original post, the Edit Lesson/Course page, now seems to be working.

Another example in a new lesson:

I have already added the LingQ “chiến lược quân sự” - military strategy. In the immediate next sentence, even after refreshing, the identical phrase is highlighted blue, not recognised as matching.

Please let me know if you need additional examples.

A final illustrative one:

From my vocab, the terms for WW1 and WW2 are clearly already present / added:

But then in a new article:

One is not recognised at all and is split into separate words; the second is parsed as a whole but shows as unrecognised / new.

Thanks, I will forward this to our team.