When using the LingQ reader for Cantonese, many words are missing romanizations, or have the wrong romanization (using Jyutping here, since that is the standard).
This is from the 1A introductory “Mini Story”. Above, you can see that the name 阿文 aa3 man4 has no romanization listed, nor does 一間 jat1 gaan1. In these cases, it seems like the romanization is missing because the characters are grouped together as one word. A simple solution would be to just pull the romanization for each character separately (not always accurate, but at least it’s there).
Here’s another example from the same lesson, where the romanization is wrong:
佢哋 keoi5 dei6 is incorrectly listed as “keoi5 dei2”.
This is an issue with pretty much EVERY lesson on the platform. As a result, it is very hard to recommend LingQ as a resource to my beginner friends. Is there anything we can do to improve these readings?
I would say the “normal” or expected way to use LingQ with Chinese languages is to save the correct transliteration when you create a LingQ, either by selecting an existing hint from the list or by copying it from a dictionary you trust. This is how it used to work. At one point LingQ added an algorithm to guess the transliteration, but as you can see, it’s flawed. Imho it’s unlikely to ever be good enough, so I suggest you ignore it and disable transliteration in the settings (there are three different settings). Additionally, Cantonese is one of the smallest languages on LingQ, with only a handful of users, which in turn means that it’s the least likely to receive developer attention.
It will never be perfect, yes, but why is it missing readings for very basic words? It’s not just names like “阿文” and “阿玲”; very basic phrases like “可唔可以 ho2 m4 ho2 ji5”, “好快 hou2 faai3”, and “好開心 hou2 hoi1 sam1” are also missing readings.
I don’t think this issue is due to the technical difficulty of obtaining readings - since the system gets more obscure readings right, it seems like it is due to a bug or an oversight. I expect it could be fixed by just parsing the word as separate characters if it doesn’t have a reading.
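To make the idea concrete, here is a minimal sketch of that fallback. I don’t know how LingQ’s parser actually works, so everything here is an assumption: the `WORD_READINGS` and `CHAR_READINGS` tables and the `romanize` function are hypothetical stand-ins for whatever dictionary and lookup the real system uses.

```python
# Hypothetical word-level table: readings the system already knows for whole words.
WORD_READINGS = {
    "佢哋": "keoi5 dei6",
}

# Hypothetical character-level table: one default reading per character.
CHAR_READINGS = {
    "阿": "aa3", "文": "man4",
    "一": "jat1", "間": "gaan1",
    "好": "hou2", "快": "faai3",
}


def romanize(word: str) -> str:
    """Return a Jyutping reading, falling back to per-character lookup."""
    if word in WORD_READINGS:
        return WORD_READINGS[word]
    # Fallback: join each character's default reading.
    # "?" marks a character that has no entry at all.
    return " ".join(CHAR_READINGS.get(ch, "?") for ch in word)


print(romanize("阿文"))
print(romanize("一間"))
```

The per-character guess will sometimes pick the wrong reading for characters with several pronunciations, but a word never ends up with no reading at all.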
The 佢哋 issue I mentioned above is a weird outlier. Maybe it’s thinking of 哋 as being used in 麻麻地 maa4 maa2 dei2 or something. I feel like 佢哋 is common enough that it deserves a direct override. Some other examples that I see from the first few beginner lessons are “時間”, listed as si4 gaan1 (should be si4 gaan3), and “廚師”, listed as ceoi4 si1 (should be cyu4 si1).
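By “direct override” I mean something like the sketch below: a small hand-maintained table that wins over whatever the algorithm guesses. The `OVERRIDES` table and `apply_overrides` function are hypothetical names, not anything from LingQ, and the entries are only the corrections mentioned in this thread.

```python
# Hypothetical hand-curated corrections for common words the guesser gets wrong.
OVERRIDES = {
    "佢哋": "keoi5 dei6",
    "時間": "si4 gaan3",
    "廚師": "cyu4 si1",
}


def apply_overrides(word: str, guessed: str) -> str:
    """Prefer a curated reading when one exists; otherwise keep the guess."""
    return OVERRIDES.get(word, guessed)


print(apply_overrides("佢哋", "keoi5 dei2"))  # curated reading replaces the guess
print(apply_overrides("好快", "hou2 faai3"))  # no override, guess is kept
```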
These are just outliers - overall, most of the readings currently given are acceptable. The primary issue is the missing readings, which I was hoping could be resolved with minimal effort. I know parsers are hard and I appreciate all the work being done here. I just want to be able to recommend LingQ to my friends who are starting to learn Cantonese.