# Search by % of Unknown words

Anyone else think a search by unknown words would be more useful than a search by unseen words?

I mean… if a lesson has 1000 words, and 10 are blue and 990 are yellow, it shows up in the “new word” search as 1% new. That’s pretty much useless to me.


Before anyone responds with “you just don’t need that. listen and read a lot”… think about this: if you had this search feature / stat for each lesson, you could easily and gradually move to harder and harder lessons. In fact, this is my single biggest complaint about LingQ. If you’re going to follow Krashen, follow Krashen completely with his “i+1” theory.

With a bit of clever statistics… you could create a ranking for each lesson per user.

i: lessons exactly at your level
i - 1: a lesson one level below your level
i - 2: a lesson 2 levels below your level
i + 1: what you should be reading
i + 5: I call this pretty much the useless level

“with a bit of clever statistics”

What kind of clever statistics do you think could be done?

Very clever. Probably a whole graduate thesis or 2 or 3… Maybe whip out the Natural Language Toolkit in Python and analyze the sentence tree variations to give you a difficulty of the text. Even if you could get this step done correctly… you would still need to figure out the user’s level somehow. I can think of a few ways… but really, that’s not happening.

Find a clever method… and a method that can be generalized across all languages. Pay people to think about this.

So, how about an easier approach and start there:

Unknown word = A word that is not known (shocking) or is “blue” or is “yellow”

1. Forget about the “i-n” classifications. Anything less than “i”, just mark as i-1.
2. Define “i”. This could be any text with 5% unknown words.
3. Define “i+1” as any text with 5-10% unknown words.
4. Define “i+2” as any text with 10-20% unknown words…
etc.
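The scheme above can be sketched in a few lines of Python. This is only an illustration, not how LingQ works internally: the status labels, the `unknown_percent` and `band` names, and the treatment of everything at or under 5% as “i” and everything over 20% as “i+3 or harder” are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical status labels: "blue" = never seen, "yellow" = saved but not yet known.
UNKNOWN_STATUSES = {"blue", "yellow"}

# Upper bound (percent unknown) for each band, using the cutoffs from the post.
# Treating everything at or under 5% as "i" is an assumption.
BANDS = [(5.0, "i"), (10.0, "i+1"), (20.0, "i+2")]


def unknown_percent(statuses):
    """Return the percentage of words that are blue or yellow."""
    if not statuses:
        return 0.0
    counts = Counter(statuses)
    unknown = sum(counts[s] for s in UNKNOWN_STATUSES)
    return 100.0 * unknown / len(statuses)


def band(percent):
    """Map a percentage of unknown words to a rough difficulty band."""
    for upper, label in BANDS:
        if percent <= upper:
            return label
    return "i+3 or harder"  # assumed label for anything above 20%


# The lesson from the first post: 10 blue words and 990 yellow words.
lesson = ["blue"] * 10 + ["yellow"] * 990
print(unknown_percent(lesson), band(unknown_percent(lesson)))
```

For the 1000-word lesson from the first post, a “new word” count only sees the 10 blue words, while this count treats all 1000 words as unknown.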

Okay, with this scheme you might as well throw out the idea of calling it “i+1”. Hell I don’t even care if you call it i+1, i+2. Just let me search by number of unknown words and then sort.

You might want to check out this thread before you get your hopes up:

Oh yeah, now I remember this thread. But really… adding an unknown word count really should be like a 2 hour task and I think there’s a lot of benefit to it.

@spatterson
Yep, I’ve yearned for this feature for a long time. Sometimes it is not only about the level, but also about the time and patience you have (or, at least, I have :-)). For instance, I have no problem taking a German article with 50 - 60 unknown words and parsing it with ease. However, when the number of unknown words gets higher, say 100, I start to lose patience and feel uncomfortable. Unfortunately, it is somewhat difficult to find suitable articles this way in LingQ.

But I would suggest that a search-by-unknown-words feature should come with some tolerance, say plus/minus 5 words, in the search algorithm.
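A tolerance search like that could look roughly like the sketch below. The lesson dictionaries, their `title` and `unknown` fields, and the function name `search_by_unknown_count` are made up for illustration and are not part of any LingQ API.

```python
def search_by_unknown_count(lessons, target, tolerance=5):
    """Return lessons within target +/- tolerance unknown words, closest first."""
    hits = [
        lesson for lesson in lessons
        if abs(lesson["unknown"] - target) <= tolerance
    ]
    # Sort by distance from the target so the best matches come first.
    return sorted(hits, key=lambda lesson: abs(lesson["unknown"] - target))


# Hypothetical library entries.
library = [
    {"title": "A", "unknown": 48},
    {"title": "B", "unknown": 56},
    {"title": "C", "unknown": 100},
    {"title": "D", "unknown": 55},
]

for lesson in search_by_unknown_count(library, target=55):
    print(lesson["title"], lesson["unknown"])
```

Sorting by distance from the target is a design choice for the example; sorting the matches by title or by length would work just as well.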

FYI - there is already a sort by unknown words. Normally I load everything in the library less than 25% unknown, then sort by unknown words. I then choose lessons with less than 10 unknown words per minute. If there are no lessons in this category, I choose the least number of words per minute available.
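For anyone who wants to see that routine spelled out, here is a rough sketch. It assumes “per minute” means minutes of lesson audio, and the `Lesson` class, its fields, and `pick_lessons` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    # Hypothetical fields; total_words and minutes are assumed to be positive.
    title: str
    total_words: int
    unknown_words: int
    minutes: float  # assumed to be the audio length

    @property
    def unknown_percent(self):
        return 100.0 * self.unknown_words / self.total_words

    @property
    def unknown_per_minute(self):
        return self.unknown_words / self.minutes


def pick_lessons(library, max_percent=25.0, max_per_minute=10.0):
    """Filter by percent unknown, sort by unknown words, then apply the per-minute cutoff."""
    candidates = [l for l in library if l.unknown_percent < max_percent]
    candidates.sort(key=lambda l: l.unknown_words)
    easy = [l for l in candidates if l.unknown_per_minute < max_per_minute]
    if easy:
        return easy
    if not candidates:
        return []
    # Fallback: nothing is under the cutoff, so take the lowest rate available.
    return [min(candidates, key=lambda l: l.unknown_per_minute)]


library = [
    Lesson("A", total_words=1000, unknown_words=100, minutes=5.0),
    Lesson("B", total_words=500, unknown_words=30, minutes=4.0),
    Lesson("C", total_words=400, unknown_words=200, minutes=3.0),
]
print([lesson.title for lesson in pick_lessons(library)])
```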

Yes, I do the same. Not optimal, but at least it is a solution.