As we head into the holiday season, a quick update on what we’ve been up to on the development team since our last update in the summer.
For the most part, we have been occupied with optimizing and improving the performance of our AI features and bringing the feature set up to date across all apps, along with the usual assortment of bug fixes.
We are still in the midst of some relatively major backend improvements that, while not providing immediately obvious benefits to you, will do so over time. Hopefully, our pace of front-end improvements will pick up once we are done with our current phase of backend upgrades, sometime in Q1 2025.
The major advance in this changeset is the lesson pre-processing that now happens for all imports. Full-text AI translations, word translations, and Japanese word splitting now happen on import, and relatively quickly; there should no longer be lessons stuck in the Processing tab. Doing this work up front means that not only is importing faster, but sentence and word translations should be noticeably quicker and available offline as well.
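For readers curious what "pre-processing on import" might look like in practice, here is a minimal sketch. All names and functions below are hypothetical illustrations, not LingQ's actual code: the idea is simply that the expensive AI steps run once at import time and the results are cached with the lesson, so later reads are fast and work offline.

```python
from dataclasses import dataclass, field

@dataclass
class Lesson:
    text: str
    # Results cached at import time, so they are available offline later.
    sentence_translations: dict = field(default_factory=dict)
    word_translations: dict = field(default_factory=dict)
    word_splits: list = field(default_factory=list)

def split_words(text: str) -> list:
    # Placeholder: a real system would use an AI or dictionary-based
    # segmenter for languages like Japanese that are written without spaces.
    return text.split()

def translate(fragment: str) -> str:
    # Placeholder for an AI translation call.
    return f"<translation of {fragment!r}>"

def preprocess_on_import(text: str) -> Lesson:
    """Run all expensive steps once, at import time, and cache the results."""
    lesson = Lesson(text=text)
    lesson.word_splits = split_words(text)
    for sentence in text.split("."):
        sentence = sentence.strip()
        if sentence:
            lesson.sentence_translations[sentence] = translate(sentence)
    for word in lesson.word_splits:
        lesson.word_translations[word] = translate(word)
    return lesson
```

Once a lesson object like this is stored, opening the lesson never has to wait on a translation call, which is why imports no longer sit in a Processing tab.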
Keep in mind that this processing has not yet been applied to Library lessons, but over time we should be doing that as well.
Let us know how we’re doing, of course, and hopefully you’re taking advantage of the shorter days (those of you in the Northern Hemisphere!) to spend more time LingQing and pumping up your stats before your LingQ Annual Review arrives!
All Apps
Lesson preprocessing for sentence translations, context word translations, word splits in Asian languages
Optimized performance and workflow for lesson pre-processing
Updated challenges list
New book challenge
Automatic (AI) reformatting of unformatted imports
YouTube import bug fixes
Web
In-app page titles reflect page content
Fixed text to speech audio playback speed selector in page view to match sentence view
Show merged hints in vocabulary lists
Improvements to highlight colours in dark mode and other alternate themes
Fixes to m4a audio import
Bugfixes for Whisper audio transcription
Fix to Korean TTS
iOS
Speaking activity in sentence mode review registers for Speaking statistic
1.75x audio playback speed added
Reading speed added to Stats
Keyboard shortcuts enabled
Show merged hints in vocabulary lists
Fix sentence mode vocabulary setting
Fix double TTS when tapping quickly
Android
Redesigned Stats page
Reading speed added to Stats
Show merged hints in vocabulary lists
1.75x audio playback speed added
Simplify Lesson feature added
Setting to stop lesson audio when tapping on word
Enable import of mp3 files for Whisper transcription
At least with Japanese, there is a huge issue with word splitting that is inflating the word count. This has been going on for a few weeks now. The fix was to click "regenerate with AI", but as of right now that no longer works; it keeps failing. This really interferes with the whole process of LingQ. I just wish something could finally be done to fix it, because I really enjoy LingQ, but lately not so much.
It’s doing it with all imported YT lessons and AI-generated audio text. That is what I use the most.
People have been talking about this in the forums for a month now. I don’t know if it affects all languages, but at least for Japanese it’s been pretty bad. Lessons that would normally have only 4% or 5% unknown words now have around 20%-30%, because the splitting tacks random bits of grammar from the previous/following words onto a word and registers it as a new one. For some strange reason, even words that I have already LingQ’d or learned in the past will sometimes show up blue again, as if they were new words.
It was never this bad before these recent updates. For reference, I have been using LingQ since 2019. Here are some screenshots to show what I mean.
Here is a link to another thread that has been discussing this for the past 20 days.
Re-splitting fixes the issue, but it’s very bothersome to have to do an extra step to fix lessons that worked just fine before these recent updates. Last night the re-splitting with AI was failing on me as well, not to mention that the process takes time, when before you could jump straight into a lesson.
With Korean, I am certain that some words I marked as known in the past have become ignored, as if the trash icon had been clicked on them. Thankfully it doesn’t affect the overall word count, but I thought I would mention it.
We have pushed some fixes to Japanese importing. Can you all try importing again and let us know if things are improved? They should be. Let us know of any issues, and also let us know how the splitting could be improved. Using AI, we should be able to make it work better than it ever has.
Great to hear that the AI issues are being tackled. For Chinese, the AI model has some quite basic problems: characters with several meanings are parsed incorrectly, and the model can choose the less common reading even in very basic words. For example, 行 in one of the most frequent words in Chinese, 行了 (xíngle, “OK”), gets parsed as the word for “bank” (háng); the verb suffix 着 (zhe) is parsed as zhāo (“touch, come in contact with”); and the very common word for soup, 汤 (tāng), gets parsed as shāng, an unusual reading of the character that basically only occurs in formal literary style, meaning “torrential”.
Yes, the most frequent problems I encounter in Chinese parsing are when these common single-character words are mistakenly combined with something else into a multi-character word, and when non-Chinese names are split up weirdly.
E.g. when reading Harry Potter, the AI often parses “Ron said” or “towards Ron” (or any other name) as a single word. Or, sticking with Harry Potter because I’m reading it: “Gryffindor” is transliterated as 格兰芬多, where the last character (duō) also means “many/much”, so the AI often parses e.g. “Gryffindor team” as “Gryffin” + “many teams”. It’s easy to ignore and/or correct once you’ve understood what’s happening, but it’s still an annoyance.
I wonder if there is a way to let the AI algorithm “know” that if a “word” translates to more than one word (“X said”, “many Y”, “towards Z”, etc.) there may be a mistake to be rectified…
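One way the suggestion above could be implemented is a gloss-based sanity check after segmentation. This is a hypothetical sketch (the pattern list and function names are my own, not anything LingQ has described): if a single segmented token’s English gloss looks like a multi-word phrase such as “X said”, “towards X”, or “many Y”, flag that token for re-splitting.

```python
import re

# Gloss patterns suggesting a name or suffix was wrongly fused into one token.
# These are illustrative; a real list would be built from observed errors.
SUSPICIOUS_GLOSS_PATTERNS = [
    re.compile(r"^\w+ said$"),      # "Ron said" fused into one word
    re.compile(r"^towards \w+$"),   # "towards Ron"
    re.compile(r"^many \w+$"),      # "Gryffin + many teams" style errors
]

def flag_suspicious_tokens(tokens_with_glosses):
    """Return tokens whose English gloss looks like more than one word,
    which often indicates a segmentation mistake worth re-splitting."""
    flagged = []
    for token, gloss in tokens_with_glosses:
        gloss = gloss.strip().lower()
        if any(p.match(gloss) for p in SUSPICIOUS_GLOSS_PATTERNS):
            flagged.append(token)
    return flagged
```

Flagged tokens could then be sent back to the segmenter, or surfaced to the user as “possibly mis-split”, rather than silently counted as new words.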
Historically (going by multilingual indices), we’ve found gpt-4o-mini on the whole does the best. This is generally backed up by the research comparing different models in their multilingual performance.
Balancing overall API cost, speed, and accuracy, we find gpt-4o-mini holds up quite well. This makes sense given the corpus GPT-4o was trained on.
An ongoing challenge is the lack of financial incentive for big tech to train truly multilingual models, the biggest limitation being that far more of the available internet data is in English. There is 8 to 20 times more English training data available than Spanish, Portuguese, Italian, French, Japanese, Russian, or Chinese.
The Aya models (such as Aya 101 and the 32B variant) were developed specifically to be “massively multilingual”, but they are outperformed by more generic models like Gemini 1.5 Flash, GPT-4o-mini, and (more recently) DeepSeek v3, which all perform relatively similarly.
This will definitely not always be the case: the model landscape will continue to evolve rapidly over the coming years, so we can expect higher and higher quality with time. For now, we do see that some models are better than others for certain languages (DeepSeek chat models are, as you would expect, better at Mandarin and Cantonese than OpenAI/Anthropic models). Language-specific model pairing is something we are currently adopting behind the scenes.
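“Language-specific model pairing” can be pictured as a simple routing table with a generic fallback. The sketch below is hypothetical: the mapping is invented from the examples in this thread and is not LingQ’s actual configuration.

```python
# Hypothetical per-language model routing, with a generic fallback.
MODEL_BY_LANGUAGE = {
    "zh": "deepseek-chat",   # DeepSeek models reportedly stronger for Mandarin
    "yue": "deepseek-chat",  # ...and Cantonese
}

DEFAULT_MODEL = "gpt-4o-mini"  # good overall balance of cost, speed, accuracy

def pick_model(language_code: str) -> str:
    """Choose a model for a given language, falling back to the default."""
    return MODEL_BY_LANGUAGE.get(language_code, DEFAULT_MODEL)
```

Keeping the pairing in one table makes it cheap to swap models per language as new benchmarks come in, without touching the rest of the pipeline.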
Please consider the Gemini 2.0 Flash model.
Its pricing is two-thirds that of GPT-4o-mini, and its performance is much better (including multilingual reasoning). I’ve been using it for a few months, and the performance gap is clear. My reasoning is as follows: I tried the same job with GPT for a long time, iterating on the prompt, and it always failed; with Gemini, it works perfectly with a relatively short, compact prompt.
LingQ’s default lesson pre-processor almost always omits or transforms some part of the transcript, and sometimes it fails entirely, so the lesson becomes a pile of short, incomplete sentences.
However, when I use Gemini 2.0 Flash with my own custom prompt, it works perfectly. You might get the same result with the error cases that users have reported. Please consider using Gemini at least for English; it would improve the learning experience so much.
I’m a native Korean speaker, and in a Korean LLM community it is said that Gemini performs better than GPT when chatting in Korean; I agree with that.
It would be nice if you could give us a roadmap for 2025.
Hi! Thank you for encouraging us to take a closer look at Gemini 2.0 Flash for Korean and English - and I’m sorry the existing preprocessor is not working optimally. I’ll do further testing behind the scenes, but Gemini 2.0 Flash looks like a great option.
In terms of further updates, we’ll be continuing to upgrade AI functionality behind the scenes. AI providers continue to release models quickly - including faster distilled models like Phi-4 mini and Deepseek-r1 7b distills - so we’ll continue testing and upgrading models for translating and importing. The biggest update will be our language tutor chatbot, which is coming soon!
Let’s hope it’s not buggy like the YouTube import and Whisper transcription functions, which never seem to work properly, or it will end up monopolising developers’ time to the detriment of other things that need improving (the library and the reader).
Quick follow-up on the December changelog: you mentioned the backend upgrades would wrap up in Q1 and that we could expect an uptick in frontend work afterward. Now that we’re in mid-April, some of us (myself included) are wondering what’s coming next, especially around the tutor-chatbot and any UI/UX improvements.
Any chance you could share an updated timeline or roadmap? Even a general sense of what’s coming would help us manage expectations.
Thanks as always for all the work you and your team do behind the scenes!