Custom Slot for Historic, Niche and Constructed Languages. How might this work?

I have seen other posts on this forum asking how to learn Ancient Greek using LingQ, since there is no slot for it and it would be confusing to mix Ancient and Modern Greek. There is also a litany of other languages (historic, constructed and niche) which will certainly never get a dedicated language slot on LingQ (e.g. Klingon, Cornish, Low German, Elvish).

With modern AI, I am wondering if it would be possible for LingQ to offer users a more bare-bones, custom language slot. This would essentially allow you to pre-programme the AI that assists in translation/LingQ creation for any language you want: you could enter the language, a more detailed prompt/description of what the language is, and perhaps some sort of basic reference like a dictionary or a file containing grammatical rules. Some functions on LingQ might have to be switched off, like sharing created LingQs between users, social functions and auto-translate, but the core framework and activity tracking of LingQ would remain in a limited form for any language of your choosing.

It is worth mentioning that the benefits would probably be very limited for dead, pre-modern languages with very little available media or which never had formalised spelling/grammar rules. Nevertheless, I think this could create a new niche appeal for LingQ, provided expectations were realistic.

This is not a serious feedback/suggestion post. I am simply intrigued at the idea and wonder which languages you would be interested in learning if this feature existed. I would probably use it to learn (the Mercian dialect of) Middle English.

The way AI has been created, it doesn’t work well at all on minority languages and there is no real way to teach it so that it functions properly. This includes languages for which there really are quite a lot of resources if you gather everything up, such as Inuktitut/Greenlandic. What happens when you try is the following:

  1. For texts it can find in bilingual form online, AI will translate them very well… from whatever the other language is. For example, if a news article is in both Greenlandic and Danish, AI will “skip” over the Greenlandic version, translate to English from the matching Danish version, then claim it translated from the Greenlandic instead. It doesn’t really learn from this either, so it can’t “deduce” a new sentence based on something it previously had a translation for.

  2. With minority languages, AI completely lies (“hallucinates”) about a word, morpheme or sentence meaning. Every so often it will get something right but then you ask it the same thing later and it comes up with something completely different which is wrong.

  3. Even if you do your best to give it enough data to work with, AI eventually forgets all your instructions regarding the meaning of words and how words work, or it doesn’t seem to access even half the data you gave it.

LingQ has AI translation for Esperanto, which is a language with more speakers than Icelandic (LingQ also has Icelandic). But its AI Esperanto translations are frequently messed up, despite the fact that there are far more lessons and dictionaries out there, in various different languages, for Esperanto than there are for Icelandic.

Being able to easily train AI to learn, and help you learn, a minority language is a big dream of mine, but commercial AI just isn’t there yet. Maybe in a few years.

Thank you for your reply!

I mostly agree. I think AI has a lot of potential: intuitively, if an AI is given a long list of grammatical rules and a dictionary, then it should eventually be of some use for a niche language. The risk of hallucination, however, is too great: for a constructed/fictional language, like the Black Speech from LOTR, I am sure that the AI would start inventing new words to fill gaps and fulfil the command it was given.

An AI would also struggle to respect the confines of a specific language or dialect if there was a very similar dominant, adjacent language. For example, if you wanted an AI to teach you Low German or Franconian it would likely start morphing standard German words into the dialect.

I think AI also requires a massive amount of authentic content in the language, which simply doesn’t exist for many niche languages.

Overall, I agree that a comprehensive AI tool isn’t really viable for niche languages. Even if there was no risk of hallucination, I think most niche languages lack the scale of media and the massive amounts of training data that are needed to construct such a model in the first place. However, I think some very basic AI functions would be possible and that the overall LingQ framework could be useful for learning a niche language, at least for the sake of tracking input and perhaps output activities in the target language.

For very niche languages like Elvish, Cornish and Toki Pona, i.e. languages with fewer than 50,000 native speakers, I don’t think we should be aiming for an AI model that extends beyond output such as:

“the sentence you have written appears to violate this specific grammatical rule”

“the word for ‘apple’ in language xyz is…”

“here is an example of this word being used from this book:…”

In other words, very simple, straightforward commands: checking whether a grammatical rule from a pre-defined list has been violated, referring to a dictionary (not independent translation) and searching a media library for uses of a certain word (not making up or improvising an example).
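To make that a bit more concrete, here is a rough Python sketch of those three commands with no language model involved at all, so nothing can be improvised. Everything in it is my own assumption for illustration: the names `GLOSSARY`, `LIBRARY`, `RULES`, `lookup`, `find_examples` and `check_rules` are hypothetical, the data is a tiny hand-made Toki Pona sample, and none of it reflects how LingQ actually works.

```python
import re

# Hypothetical user-supplied data for a custom slot (illustrative only).
GLOSSARY = {"water": "telo", "person": "jan", "fruit": "kili"}
LIBRARY = {
    "lesson-1": "jan li moku e kili. mi moku e telo.",
}
# Each rule is (description, regex that matches a violation).
RULES = [
    ("'li' is dropped after a bare 'mi' or 'sina' subject",
     re.compile(r"^(mi|sina)\s+li\b", re.IGNORECASE)),
]

def lookup(word):
    """Return the glossary entry, or None if missing. Never guesses."""
    return GLOSSARY.get(word.lower())

def find_examples(word):
    """Return (source, sentence) pairs from the library that contain the word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for source, text in LIBRARY.items():
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                hits.append((source, sentence))
    return hits

def check_rules(sentence):
    """Return descriptions of every pre-defined rule the sentence violates."""
    return [desc for desc, pat in RULES if pat.search(sentence.strip())]
```

So `lookup("apple")` simply returns `None` rather than inventing a word, and `find_examples("telo")` can only quote sentences that are really in the library. The obvious limit is that regex rules like this would only catch very shallow mistakes, so this is a floor for what such a slot could do, not a substitute for real grammar checking.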

Fingers crossed, but I think large language models are fundamentally flawed in some ways and shouldn’t be compared with a truly all-knowing intelligent system that understands everything it’s doing.