STOP sharing lessons with AI generated texts (transcripts) in Icelandic!

Anyone who is using Whisper or some other kind of AI to automatically generate transcriptions (texts) for Icelandic material, whether it’s from YouTube videos or mp3 files, should not share the lessons in the library. The AI is not nearly advanced enough and makes a whole bunch of errors. It even writes words that don’t exist in Icelandic, or spellings that don’t exist in Icelandic for any form of the word.

Anyone who is using AI to auto-generate texts for small languages like Icelandic needs to first check with a native speaker whether the transcriptions are accurate enough. Trying to learn a language from text that is messed up like this will teach you very, very broken language, misspellings and nonsense. I am sure the AI will be good enough to do this properly sometime in the future, but right now it’s way too far off.

I’d much rather spend my time creating good material for Icelandic or getting permissions from others to use theirs, as opposed to having to set a bunch of lessons like this to “private” in the library.

9 Likes

I thought that wasn't even possible. I just tried to add audio to a text I already had in the recently released Hindi and it wasn't allowed. The text wasn't artificially generated, just the audio, which was generated by LingQ. It allows creating the lesson for private use but not sharing it, so I thought that was an automated feature in the settings. It didn't allow me to share only the text, first giving an error message that it should have audio and then, after I generated the audio, giving another error message saying I couldn't share it. So that doesn't happen in Icelandic?

@Atlan, rokkvi is referring to the TEXT being generated from the audio, i.e. the audio is native, but the transcript is AI generated. Apparently such lessons may be shared, but as he points out, AI generated transcripts may not work well beyond a handful of languages.

In your example you’re talking about artificially created audio, and I know LingQ does not want audio that isn’t from an authentic native speaker.

Oh, I see, I got my wires crossed when I quickly skimmed it :sweat_smile: thanks
It would be useful to be able to share authentic text though… even without the audio.

1 Like

That’s a problem for every language, and something that probably can’t be controlled so easily.
You can say that now, but not everyone reads the forum, plus new users will have forgotten it in a few weeks, and so on. And the technology will advance, creating new possibilities.

Anyone who sees an opportunity for monetisation/reward will jump in sooner or later, creating material for several languages without knowing anything about them.

I would definitely prefer to have some sort of mechanism that could hide all material created this way, or at least exclude it from the search (both audio and text generation). Or some sort of native-speaker verification or quality process that could give this material a green light.
Imho.

2 Likes

My comments aren’t really about AI generated transcriptions. They are more about my ignorance of the LingQ process.

My question is, does a lesson posted in the Library and set for Public use get checked for accuracy by a native Icelandic speaker such as yourself? Even texts generated by LingQ?

Trusting that what I’m seeing in the lessons is accurate isn’t an issue now because I’m using the stories and other courses that are obviously created by an Icelandic person, probably yourself, or a teaching organization.

I’m loving the Icelandic course and continue to chug along. Thanks for everything you do.

I am not a LingQ employee but I’ll tell you what I know and what I assume.

As far as I know, LingQ has no formal process for verifying content for accuracy. I think the idea at LingQ is that although there may be inaccuracies in content that people make and share publicly, it is still worth it. LingQ would probably become prohibitively expensive if it were to have the same amount and variety of content it has in the libraries now, but all of it had to be uploaded by LingQ employees.

Also, I think most LingQ content is not really made by the users, but found by the users on various sites like LibriVox, Project Gutenberg, Project Runeberg, sites of language bloggers etc. - material that is not likely to contain errors - and posted on LingQ. Thus LingQ is partially based on the principle of a participating community.

I think I am the only person right now who is monitoring the Icelandic content to some extent. I have had to put some lessons back to private because of gross inaccuracies or for copyright reasons. Almost every lesson in the Icelandic library at the time of writing has been added by me. Most of them come from language learning material, books, children’s material and so on, that I got permissions for. A fair amount of the material I have uploaded was created by myself, and I am probably decent at spelling, but not great. There may be the occasional spelling error in my “Einföld íslenska”, “Einfalt eintal”, “Flæðiskerið”, “Einfaldir íslenskir textar með Rökkva” etc., but I think they will still serve learners well. If other Icelanders later find spelling mistakes or typos in my texts and let me know, I will try to correct them as quickly as possible.

4 Likes