Issues with DeepL Automatic Translations in LingQ

Since the switch to DeepL for automatic translations, I’ve noticed that many translations contain serious errors. Sometimes, the translations create imaginary text (for example, inventing parts of conversations), and other times, they show content that has nothing to do with the original text. Interestingly, when I tested directly with DeepL, the translations seemed much better than the ones in LingQ.

For instance, today, with the Japanese sentence:

わざわざ来てくれなくても、呼んでくれればいつもみたいに家まで遊びに行ったのに」「いいのいいの、今日はお姉ちゃんとゆっくり話したかったから…お邪魔じゃなかった?

LingQ translated it as: “I’m sure you’ll be happy to know that I’m not a big fan of your work, and I’m sure you’ll be happy to know that I’m not a big fan of your work, either.”

The translation makes no sense (I don’t understand where it came from). However, when I translated the same sentence with DeepL (web version), the result was:
“I would have gone over to your house like I usually do if you had called me. It’s okay, I wanted to spend time with my sister today. … I hope I’m not interrupting anything.”

This translation is much closer to the correct meaning.

Google Translate also provided a better translation:
“Even if you don’t come all the way, if you call me, I’ll come home as usual. I went out to play with you.” “It’s okay, I wanted to talk to my sister today… Did you bother me?”

And, of course, the translation with ChatGPT was the closest and most natural:
“Even if you didn’t come all the way, if you had called, we could have gone to your house as usual to hang out.” “It’s okay, it’s okay. I wanted to have a leisurely chat with my older sister today, so… I wasn’t a bother, was I?”

Please, could you review the implementation of DeepL translations in LingQ? There might be some API parameterization causing these issues or inaccuracies in the translations. Also, I’d like to suggest adding an option to customize the translation engine, allowing users to choose between Google (previously used) and DeepL translations. This could offer more flexibility and potentially improve the results based on user preferences.

Lastly, could you consider using GPT-3 for translations? Translations generated by GPT-3 generally tend to be better, and I believe the cost of the GPT-3 API has been decreasing, making it a potentially good alternative. Translations with GPT-4 would be a dream :wink:

Thank you!

GPT for translations is a worthy discussion, and literal translations from GPT-3 are performing significantly better than DeepL. Across a sample of languages and texts, anyway.

  • GPT-3.5 Turbo 16K context: Input cost $0.09, Output cost $0.11, Total cost $0.20
  • GPT-4 32K context: Input cost $1.72, Output cost $3.44, Total cost $5.16

(This is an estimate to translate a set of mini stories into a particular language via the API.)

This also assumes that GPT has a zero error rate and doesn’t require additional calls to filter out bad results. Bad results being English responses when you don’t want them (GPT is notorious for this) and hallucinations.

GPT is not really built as a translator; although it performs better than dedicated translators in most cases, it still cannot be trusted for bulk unchecked translations.

Thanks for reporting, we will investigate this and see what we can do.

Thank you @zoran ! If you need any lessons where I’ve detected the issue, let me know.

@roosterburton Regarding translations with GPT-3, API calls can be made by passing JSON. I’ve conducted some tests and it works quite well; it’s not perfect, but the current system is not perfect either :wink:

For translating one sentence, you don’t need 16K of context (4K is sufficient), so you can use the cheaper model. Without being an expert, I believe it could be less than $0.01. According to OpenAI’s website, 1,500 words ~= 2,048 tokens, so for a lesson of 1,500 words we would need to account for 1,500 words + 1,500 words of translation (approx.) + some extras, totaling around 5,000 tokens. The 4K version costs $0.0015 / 1K tokens, so translating a 1,500-word LingQ lesson would cost approximately $0.0015 * 5 = $0.0075. I’m not sure about the price of the DeepL API, but it might be worth experimenting with the change.
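
As a quick back-of-envelope check in code (a sketch only; the ~20% overhead factor is my own stand-in for the “some extras”, and I assume a flat rate for input and output tokens):

# Rough cost of translating one 1,500-word lesson at the 4K gpt-3.5-turbo rate
WORDS_PER_2048_TOKENS = 1500        # OpenAI's rule of thumb: 1,500 words ~= 2,048 tokens
PRICE_PER_1K_TOKENS = 0.0015        # USD; flat rate assumed for input and output

def lesson_cost(words):
    tokens_in = words / WORDS_PER_2048_TOKENS * 2048    # source text
    tokens_out = tokens_in                              # translation of roughly equal length
    total_tokens = (tokens_in + tokens_out) * 1.2       # ~20% extra for prompt overhead
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

print(round(lesson_cost(1500), 4))  # -> 0.0074, in line with the ~$0.0075 estimate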

You are correct here, sorry about the oversight. In my program I was taking the lessons one at a time to evaluate each mini story, and some lessons naturally exceeded the 4K token limit (2K input / 2K output), so the 16K version was used for ALL stories. Just some lazy programming.

Arabic (ar): 21,483 words

  • GPT-3.5 Turbo 4K context: Input cost $0.04, Output cost $0.06, Total cost $0.10

This would be to translate the entire Arabic mini story range into one language using a sentence-by-sentence 4K approach.

import openai

def translate_text(text, input_language, target_language, translation_type, model='gpt-3.5-turbo', temperature=0.5, max_tokens=2000):
    # Forming the translation prompt (literal vs. idiomatic rendering)
    if translation_type == 'Literal':
        prompt = f'Translate the following {input_language} text to {target_language} literally: "{text}"'
    else:  # 'Derived', i.e. idiomatic
        prompt = f'Translate the following {input_language} text to {target_language} idiomatically: "{text}"'
    # Send the prompt as a single chat message and return the reply text
    response = openai.ChatCompletion.create(model=model, temperature=temperature, max_tokens=max_tokens,
                                            messages=[{'role': 'user', 'content': prompt}])
    return response['choices'][0]['message']['content'].strip()

Something like this has worked for me at scale; you need additional error checking for English, such as the langdetect module, and other checks to make sure the result is as wanted.
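
For illustration, a minimal sketch of that kind of language check with langdetect (the helper name and the “flag for review” policy are my own assumptions):

from langdetect import detect, LangDetectException

def looks_like_wrong_language(translation, expected_lang_code):
    # True when the detected language differs from the one requested,
    # e.g. GPT answering in English ('en') when 'ja' was asked for.
    try:
        return detect(translation) != expected_lang_code
    except LangDetectException:
        # Very short or symbol-only strings can't be classified; flag for review.
        return True

# e.g. retry the API call when looks_like_wrong_language(result, 'ja') is True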

What prompts have you been using to generate good results??

@roosterburton maybe you can try with something like:

My prompt:

{
  "Text": "Hello, how are you?",
  "Source Lang": "English",
  "Target Lang": "Spanish"
}

Your response:

{
  "Translation": "Hola, ¿cómo estás?"
}

First prompt:

{
  "Text": "English responses when you don't want it (GPT is notorious for this) and hallucinations.",
  "Source Lang": "English",
  "Target Lang": "Japanese"
}

In chat mode it is quite consistent, although sometimes it returns text in addition to JSON. The idea would be to always fetch the JSON result and ignore the rest. Let us know if you make progress with your tests :slight_smile:
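
A minimal sketch of that “grab the JSON, ignore the rest” step (the function name is mine, and the regex assumes a single flat JSON object in the reply):

import json
import re

def extract_json(response_text):
    # Pull the first {...} block out of the model's reply and parse it,
    # discarding any extra chatter around the JSON.
    match = re.search(r'\{.*?\}', response_text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# extract_json('Sure! {"Translation": "Hola, ¿cómo estás?"}')
# -> {'Translation': 'Hola, ¿cómo estás?'}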

That is interesting, to just send JSON. I’ve also heard you can manipulate the response by setting the username of the GPT account to the language you want to translate to.

Translation accuracy seems to be one for debate too. The idiomatic returns are a representation of the text in the target language, while the literal ones are a word-for-word extract; the latter probably won’t make grammatical sense in the target language.

Idiomatic translations would probably work better for someone who is casually getting a translation of some content, while a literal translation would benefit someone who wants to learn that language.

All too often when I click a blue word the translation is just: “The” if someone else hasn’t updated a translation already. It’s quite annoying.

@zoran have you been able to review the bug?

It’s quite annoying since you can’t rely on the translations. Many times, it doesn’t translate the entire text, and almost always, I have to re-check in external systems to confirm that the translation is correct.

An example from today:

Original Text:
ただ、ラベンダーのにおいが、やわらかく和子のからだをとりまく時、かの女はいつもこう思うのだ。

Translation on LingQ:

But when the smell of lavender softly surrounds Kazuko’s body, the woman always thinks, ‘I’m not sure what I’m supposed to be doing here.’

Translation on DeepL (website):

But when the smell of lavender softly surrounds Kazuko’s body, the woman always thinks.

Google Translate’s website:

However, when the scent of lavender softly envelops Kazuko’s body, that woman always thinks:

ChatGPT’s translation:

When the scent of lavender gently envelops Kazuko’s body, the woman always thinks this.

The translation done in LingQ makes no sense and is far behind the rest, at least in Japanese to English translations. I’m not sure if it could be influenced by the Japanese text being spaced in LingQ to separate words, or if it’s a configuration of how LingQ calls the DeepL API, but the quality right now is very poor and it’s definitely not a good user experience.

Are you saying that for the LingQ translation, it’s adding that extra sentence? Is this content you’ve imported, or was it already there? It looks to me like it’s sending the next sentence in the story, in addition to the correct one, to the translator.

Also, if it’s content that already existed (like the mini stories), I’ve seen a lot of the content having the wrong sentence STORED as the translation. Like it’s off by a sentence or more.

@ericb100 Yes, LingQ’s automatic translation is adding that extra sentence. The content is content I imported myself (so there are no already stored translations), and no, it’s not related to the next sentence; the extra sentence has nothing to do with the content of the lesson.

In addition to making up sentences, it sometimes only translates part of a sentence (leaving out perhaps more than half of the text), or the translation doesn’t make any sense. All of this behavior has been happening to me very frequently since the recent change in the translation system.

For instance, this morning:

  1. A very bad translation:

Original text:

だから、あなたが、あいかわらず小さいのを見ると、何かおかしいの」「さあ、ぼくのヘやへ行こうよ」文ーがうながした。

LingQ translation:

So, when I see that you are still small, there is something strange about you.

ChatGPT translation, closer to what the original sentence actually says:

“So, when I see you still being small as ever, is there something strange?” urged Bunー. “Come on, let’s go to my place.”

  2. A nonsensical translation, also from today:

Original text:

「ええ」文一について行こうとすると、文一の母がうしろから声をかけた。

LingQ translation:

The first time I saw her, I was in the middle of a conversation with her, and I was very surprised to see her.

ChatGPT translation:

“Sure,” as I tried to follow Bunichi, Bunichi’s mother called out from behind.

@zoran These examples that I have given are from this private lesson: Conéctate - LingQ

Please, if you guys need more examples, let me know. Please consider reverting to the previous translation system until this issue is resolved.

Yes, I noticed some of these issues yesterday, but before that, I had zero issues with extra sentences, parts of translations being ignored, or meanings that don’t quite make sense, as you mentioned.

It just happened yesterday for me, so could it be something with this recent update and not the change to the translation system?

Is it possible for a change that happened a while ago to just hit now? Or maybe I don’t notice as much because I am a beginner learner? (That would be a fair point.)

I tend to use human translations that I transcribe into the translation section. Would this be why I notice it less?

Sorry to bother you, but I want to be on the lookout for issues like this that can cause me to learn something incorrectly.

Thanks, we are looking into it.

Most of the content offered by LingQ has already been translated beforehand (automated translations only occur the first time someone requests a translation of a sentence in a lesson), so it’s very possible that the content for beginners is already fully translated, either by humans or by an automated system (Google Translate or DeepL). Regarding your concern about learning incorrect things, don’t worry; learning a language is a long-term goal, so encountering occasional inaccuracies here and there won’t harm your learning, especially if you’re consuming native content. My complaint about the translation system is that there are significant flaws (translations that have absolutely nothing to do with the content) that require me to turn to external systems for a translation. I’m sure it will be resolved quickly.

@zoran If it helps the team, I’ve noticed that when a sentence to be translated contains Japanese quotation marks (「」), it often fails (not always, but many times), especially if there are two consecutive sentences (e.g., “「First sentence」「Second sentence」” or variations like “First sentence」「Second sentence」”). This doesn’t happen with DeepL when using the web version.

Hello @zoran , have you been able to make progress with this bug?

I think the issue lies with the Japanese quotation marks (「」). For example, in the lesson I’m currently reading there is much less dialogue, and the translations were working so well that I thought you had fixed it, until I reached the first dialogue and it was translated incorrectly.

From today.

Original text:

「顔色が、よくないね」「ううん、なんでもないの」そう答えてから、暢子はふと、史郎がさっき、わき腹をけられたことを思い出して、ちょっと心配になった。

LingQ translation (not a good translation):

The first time she saw him, she thought of Shiro’s injury in his side, and she was a little worried.

Google Translate web translation (a more decent translation):

“Your complexion doesn’t look good.” “No, it’s nothing.” Nobuko suddenly remembered that Shiro had been kicked in the side earlier, and became a little worried.

If it helps, the translations in this case were “okay” using both Google and ChatGPT, but in the web version of DeepL they were worse. I changed the Japanese quotation marks (「」) to regular quotation marks (“”) and DeepL’s web version translated the text correctly.

Perhaps you could try changing the Japanese quotation marks (「」) to regular quotation marks (“”) before sending the text for translation?
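
Something like this minimal pre-processing step is what I mean (just a sketch; whether it fits how LingQ actually calls the API is for the team to judge):

def normalize_japanese_quotes(text):
    # Swap Japanese corner brackets for regular quotation marks
    # before sending the sentence to the DeepL API.
    return text.replace('「', '"').replace('」', '"')

# normalize_japanese_quotes('「ええ」文一について行こうとすると、')
# -> '"ええ"文一について行こうとすると、'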

It’s reported to our team. I’ll check what the status is and get back to you.

Thank you @zoran !

In the past few days I’ve noticed an improvement in the translations of texts with Japanese quotation marks (「」). The vast majority of them are now translated correctly, but the ones that still tend to fail (sometimes, not always) are those sentences that the LingQ importer has cut in a way that begins with a closing quotation mark (“」”; in a Japanese dialogue, someone speaks before that line).

For example, today, it translated this:

Text to translate:

」 二美子は思わず頭をあげて数に聞き返したが、数は少しほほえんでうなずいただけだった。

LingQ’s translation (the first part of the sentence is not in the original text):

he first time I saw him, I thought he was a good guy," she said, raising her head to ask Kazu.

DeepL’s translation (web version):

The number only nodded with a slight smile.

Google Translate’s translation (much better than DeepL):

Fumiko involuntarily raised her head and asked Kazu, but Kazu only smiled and nodded

I believe these errors come from DeepL and that the translations we had with Google Translate were better.

I think that in a few months we will have much better translators on the market, so perhaps you guys can consider conducting some tests to implement some kind of translation system of your own based on SeamlessM4T, which Meta released a few days ago.

In any case, I confirm that at least the vast majority of errors that I reported in the thread are no longer occurring.

@zoran , an example I’ve come across that illustrates the behavior of the bug well:

Original sentence (starts with “」”):

」と、流の言葉をさえぎり、コーヒー代をカウンターの上に残して

LingQ translation:

He even said, "I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry, I’m sorry.

DeepL website translation:

He interrupted Nagare and fled, leaving the coffee money on the counter.

I wonder if those repeated “I’m sorry”s are user-generated?

@hiptothehop No, my examples are from private content I imported myself (so there are no already stored translations).