Metrics for gauging the difficulty of a lesson

I’ve been a software engineer for decades, and can’t help but muse about possible designs. So this is that kind of musing…

When we look at lessons, we see a percentage of new words compared to our known vocabulary. That’s a really good metric. I love that the site tracks all the words you know and uses that to estimate difficulty.

But there is still a lot of variation in difficulty that isn’t captured by that number. I know because I often experience a completely different learning difficulty, much easier or harder, than what the number would imply.

I wonder what else might indicate difficulty?

For example: the length of sentences. The use of words that are very rare in the language. The use of colloquialisms. Or the thing that kills me in Spanish, I don’t know the name of it… when a ton of common words mean different things in different combinations. (Think of how English phrasal verbs combine words like “get”, “take”, “lay”, “off”, “out”, etc.)

Maybe, too, there’s a machine learning approach that would capture all factors like this without having to explicitly design around them. Or even just capture the statistics of how long learners at different levels spend on a lesson, and use that to predict how difficult the currently logged-in user will find it.
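To make the second idea concrete, here’s a minimal sketch of a crowd-timing predictor. Everything here is hypothetical: the record format, the function name, and the sample numbers are mine, not anything LingQ actually exposes.

```python
from collections import defaultdict
from statistics import median

def predict_time(records, lesson_id, user_level):
    """Predict time-on-lesson for a user at a given level.

    records: list of (lesson_id, learner_level, seconds_spent) tuples,
    i.e. observed completion times from other learners.
    """
    times_by_level = defaultdict(list)
    for lid, level, seconds in records:
        if lid == lesson_id:
            times_by_level[level].append(seconds)
    if not times_by_level:
        return None  # no data for this lesson yet
    # Fall back to the nearest learner level that has data.
    nearest = min(times_by_level, key=lambda lv: abs(lv - user_level))
    return median(times_by_level[nearest])

# Made-up observations: two level-2 learners and one level-5 learner.
records = [
    ("lesson-1", 2, 600), ("lesson-1", 2, 720), ("lesson-1", 5, 300),
    ("lesson-2", 2, 200), ("lesson-2", 5, 150),
]
print(predict_time(records, "lesson-1", 2))  # 660.0 (median of 600, 720)
```

A real version would want smoothing across levels and a minimum sample size, but the shape of the idea fits in a dozen lines.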

Inviting others to muse with me. There is no practical thing I’m trying to achieve here.



Interesting to find so many software engineers here!

I’ve found the estimated level of difficulty of texts on LingQ to be debatable. Not that I blame LingQ. As you point out, there are more than a few dimensions to the problem.

Of course, it would be even worse if the level of difficulty were also calibrated to that of an individual at some moment in time.

Anecdote: I struggled through the first Harry Potter in French for months. Then I switched to Hemingway’s “The Sun Also Rises.” I expected a literary writer who wrote for adults to be more difficult than a popular writer who wrote for children. But, at least for me, Hemingway was much easier.

Short vs. long sentences was a wash between the two writers. The difference, it seemed to me, was that Hemingway just had a cleaner, more regular style, while Rowling’s style sprawled over the page with clauses and sub-clauses coming at the reader every which way.

Rowling is a great storyteller, but not exactly a prose stylist…


Ha! I remember picking up Prisoner of Azkaban thinking that it would be good for a beginner. I couldn’t make it past the first page.

That was more my fault than Rowling’s. But it’s funny we both tried something similar.


Average lingq/unknown count on a lesson should suffice.

If many users have a very high average lingq count on a particular lesson, then you can conclude that most people would have a harder time reading through that lesson.

So here’s an extreme example to illustrate my point.

Let’s say across all users one particular lesson has a very high number of lingqs and unknown words. Now compare this to a lesson that has a very low number of lingqs and unknown words. Clearly the former will be much more difficult for most users than the latter.


The way I see it, how that number is calculated is one of the main problems. This is not a complaint about LingQ, as I also know that all simple ways of calculating it will have their flaws. The problem is that many people put too much weight on that number, and that’s why a more accurate metric would be helpful.

The problem with LingQ’s calculation method comes from repetitions: you can repeat the same words ad infinitum without changing the percentage. That’s why tackling repetitions is, in my opinion, the way to go. It should even be easy, though I don’t know anything about programming; the maths, at least, is very basic.

Take the average repetitions of unknown words divided by the average repetitions of known words, and use that ratio to multiply the percentage LingQ shows now. When known words are repeated more, relative to unknown words, the multiplier would be lower, and vice versa, eliminating the “dilution” caused by repetitions. So a repetition ratio of 2/5 with a LingQ percentage of 10% in one text, and 2/10 with 20% in another, would give the same difficulty.

Someone with programming experience could say better how easily this is applicable, but the way I see it, it’s a simple calculation that shouldn’t be too hard to implement. The only “problem” is that this metric is really only valid with one’s own LingQ data and isn’t easily comparable between languages or users, but that’s not really any different from the current situation. There would also need to be some max and min ranges for the multiplier to remove possible outliers.

As for other complications of language, I don’t see any easy way to calculate them. You would have to analyze individual words or language structures instead of making more generalized calculations.


It would seem so, but the way that number is calculated is highly inaccurate. Because the percentage is calculated from unique words, a word can repeat any number of times without changing the percentage. So you could take a text and add sentences built from the already-used known words, without adding any unknown words: the percentage would stay the same, but the text would obviously become easier. A similar thing happens naturally across different styles of writing; some use more of the basic words, which means fewer unique known words, making the percentage higher than in a text with more varied expression.
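A toy demonstration of that dilution effect (the word lists and the stand-in unknown word are invented for illustration; this is not LingQ’s actual tokenizer):

```python
def new_word_pct(text, known):
    """Percentage of unique words not in the known set."""
    unique = set(text.lower().split())
    return 100 * len(unique - known) / len(unique)

known = {"the", "cat", "sat", "on", "mat"}
short = "the cat sat on the mat perambulated"
# Pad the text with more sentences made only of known words:
padded = short + " the cat sat on the mat the cat sat on the mat"

# Both print the same percentage, even though the padded text is easier.
print(new_word_pct(short, known))
print(new_word_pct(padded, known))
```

Because the metric is computed over the *set* of words, the extra repetitions of known words never reach the formula.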


Let me clarify.

Let’s compare two lessons.

Hard lesson: 100 words (98 lingqs, 1 unknown, 1 known)
Easy lesson: 100 words (1 lingq, 1 unknown, 99 known)

If you had to guess which of these two lessons was harder for me, it would be the first, as it has a far higher ratio of unknown and lingq’d words.

You can extrapolate this across many users to create a kind of loose average. Let’s look at a possible case using the same lessons above:

(on average across all relevant users)
Hard lesson: 100 words (50 lingqs, 49 unknown, 1 known)
Easy lesson: 100 words (10 lingqs, 1 unknown, 89 known)

Now we can create some kind of weighted scale. Let’s say that the lingq’d count contributes more heavily toward difficulty than unknown words (since many “unknown” words are actually known).

Hard lesson’s difficulty: (50 lingqs × 0.8 + 49 unknown × 0.2) / 100 words = 0.498
Easy lesson’s difficulty: (10 lingqs × 0.8 + 1 unknown × 0.2) / 100 words = 0.082

So according to this, the Hard lesson’s “difficulty” is about 6x greater
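The weighted score above, written as a function (the 0.8/0.2 weights are the post’s illustrative choices, not anything LingQ actually uses):

```python
def difficulty(total_words, lingqs, unknown, w_lingq=0.8, w_unknown=0.2):
    """Weighted difficulty: lingq'd words count more than unknown ones."""
    return (lingqs * w_lingq + unknown * w_unknown) / total_words

print(difficulty(100, 50, 49))  # ≈ 0.498, the "hard" lesson
print(difficulty(100, 10, 1))   # ≈ 0.082, the "easy" lesson
```

Dividing the two confirms the roughly 6x gap claimed above.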

The main problem with this is that the rating will be much less accurate for newer users than for veteran users, since pretty much all of the words will be ‘unknown’ to them. To counteract this, you can adjust the formula by further weighting for people’s experience, perhaps using words-read count: people with much higher words-read counts should affect the average more than people with fewer words read.

Additionally, we should only count people who have read a certain amount of words before including them in the calculations. People with only 100 words read TOTAL on their account should have no sway over these calculations.


Again, the problem is repetitions. Your examples don’t account for them, nor compare how many repetitions each lesson has. Whether you calculate with them or not, the results will be skewed one way or the other.

Currently LingQ calculates the percentage only from unique words, which means you can dilute the results ad infinitum. One simple example: look at books, or collections of lessons, on LingQ that you have not started. The whole book will have a higher percentage than the average of the lessons it’s comprised of. You could also import a lesson into LingQ, first as a whole and then the same lesson in two or more parts; the parts would have a lower percentage on average. Do they become easier because you have the lesson/book in chapters? No, it’s a flaw in LingQ’s calculation method. A similar flaw would appear in your method when used in real life.
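A toy illustration of the splitting flaw. The intuition: common known words recur in *every* chapter and get counted once per chapter, while each chapter’s unknown words tend to be specific to it, so splitting dilutes the per-chapter percentage. The vocabulary below is invented for the example.

```python
def new_word_pct(words, known):
    """Percentage of unique words not in the known set."""
    unique = set(words)
    return 100 * len(unique - known) / len(unique)

known = {"the", "a", "is"}
# Each chapter reuses the common known words plus one chapter-specific
# unknown word.
chapter1 = ["the", "a", "is", "quixotic"]
chapter2 = ["the", "a", "is", "ephemeral"]
book = chapter1 + chapter2

print(new_word_pct(book, known))      # 40.0 for the whole book
print(new_word_pct(chapter1, known))  # 25.0 per chapter
```

Nothing about the text changed; only the unit of calculation did, yet the book looks noticeably harder than its own chapters.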


Now that I’ve calculated some examples, I notice my method doesn’t work evenly, though it would still make the estimation of difficulty more accurate. In the example I used, the LingQ percentage was 5 times higher between two same-difficulty lessons, and my method lowered that to under 2 times. Part of the reason it failed might be that the examples were too extreme; 5 times higher on LingQ seems quite unlikely. More likely would be 2–3 times higher, which could be reduced to a fraction of that. Maybe someone has the interest and fresher skills to do the maths on how the method behaves over the whole range of possibilities. It’s been too long since I’ve done maths like that, and I don’t want to waste my time when many could do a better job.
