[Request] Diacritization for Arabic vocab

alexandregallot · July 9, 2025, 7:36pm

Hello team,

One of the most frustrating things about using LingQ for Arabic is that when the reading of a word is contextually or lexically defined (as opposed to grammatically deducible by parsing its PoS and deriving its reading from the regular patterns), I can only rely on the automatic TTS. I find the TTS very annoying to use (it interrupts my workflow and sometimes I literally cannot listen to it), and it regularly makes mistakes. This is especially true when a given spelling has several possible readings.

Ex: جمل , as a noun, can be vocalized “jamal” (a camel) or “jumal” (sentences, the plural of “jumla”).
شغل , as a noun, can be vocalized “shughul” (work, occupation) or “shaghl” (distraction).

Basically, whenever I stumble upon a simple three-letter word, I have to guess how to read it, and the TTS function might not be helpful because it can only enunciate one possibility.

What I imagine would be nice is if when clicking on the word in a Lesson, one would get, in the case of شغل , two entries:
شَغْل - distraction
شُغُل - work, occupation

This would spare the learner from guessing the reading and would make the ambiguity visible.

I suppose that the automatic AI feature used when searching vocabulary is based on a retrieval-augmented (RAG) LLM. Perhaps it could be tweaked to:

PoS-tag the word (including stripping it of enclitics);
Then look up all of the word’s possible readings with full or partial diacritization;
Then provide the learner with the corresponding diacritized lexical items and their translations.
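
To make the three steps above concrete, here is a minimal Python sketch of the lookup part. Everything in it is an assumption for illustration: the `LEXICON` dictionary, the `ENCLITICS` list, and the function names are hypothetical, and nothing here reflects how LingQ actually works. PoS tagging is not implemented; the PoS is simply passed in as an optional filter, since a real system would need a proper morphological analyzer for that step.

```python
import unicodedata

# Hypothetical toy lexicon: undiacritized spelling -> (diacritized form, PoS, gloss).
# A real system would back this with a full dictionary or morphological analyzer.
LEXICON = {
    "شغل": [("شَغْل", "noun", "distraction"), ("شُغُل", "noun", "work, occupation")],
    "جمل": [("جَمَل", "noun", "camel"), ("جُمَل", "noun", "sentences")],
}

# Illustrative, incomplete list of pronoun enclitics, longest first.
ENCLITICS = ("هما", "كما", "هم", "هن", "كم", "نا", "ها", "ه", "ك", "ي")


def strip_diacritics(word):
    """Remove combining marks (harakat, shadda, sukun, tanwin) from a word."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def candidate_stems(word):
    """Yield the word itself, then the word with each matching enclitic removed."""
    yield word
    for suffix in ENCLITICS:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            yield word[: -len(suffix)]


def lookup_readings(token, pos=None):
    """Return all known diacritized readings of a token, optionally filtered by PoS."""
    bare = strip_diacritics(token)
    for stem in candidate_stems(bare):
        entries = LEXICON.get(stem)
        if entries:
            return [e for e in entries if pos is None or e[1] == pos]
    return []


# Show every reading of the ambiguous spelling, as in the mock-up above.
for form, pos, gloss in lookup_readings("شغل"):
    print(f"{form} - {gloss}")
```

The naive suffix stripping will over-strip words whose final root letter happens to look like an enclitic, which is one reason a real implementation would rely on a morphological analyzer rather than a fixed suffix list.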

istrauss6 · July 9, 2025, 8:16pm

Same issue in Hebrew for the same reason.

zoran · July 9, 2025, 9:46pm

Thanks, we will see what can be done.