When you’re reading something, roughly what percentage of unknown words have you found most helpful for acquiring the language? Not so difficult that you’re seeing so many new words that it’s hard to remember them all, but also not so easy that you barely see any new words. For me I think the sweet spot is somewhere between 10% and 20% unknown words. But I’d love to hear what you’ve all found works for you!
I have used about the same range. Just keep in mind that the percentage varies a lot depending on the length of the text and how much repetition it contains, not to mention the writing style. Starting from the lowest % is a good strategy, but at times you will hit something that is a bit too hard and you have to struggle through. On the other hand, some texts may turn out a little too easy, at least in the sense that they would have been more appropriate before some of the ones you struggled through. Podcasts and other content closer to spoken language tend to be easier than stories, probably because the sentences containing the unknown words are simpler. Possibly also, in languages like Spanish, podcasts will have more conjugations of words you already know, while stories will have more genuinely new words. It is better to have a mix of styles to choose from, depending on how difficult the content feels.
I seem to remember reading that the common wisdom is that 10% is best, but I agree with you that it can be pushed a bit. But I think there’s a limit, and I guess it would have to be different for different people, depending on their memory abilities.
I find that if there are more than 20% new words, reading becomes a chore, because it’s just too hard to figure them out by the context, and I spend more time looking up words than I spend reading.
If I’m interested, I don’t mind up to 40% New Words. If I’m not particularly interested, 10-20% is a good bracket.
According to LingQ I have reached an Advanced level of vocabulary, which I don’t mention to brag, but to say I’ve been at this a while, in terms other LingQers can understand.
At this point I don’t worry about New Words %. There’s not much over 20% I want to read. I can now pick up a lot of words from context. My main challenges are secondary/tertiary definitions plus French grammar and expressions foreign to English.
I don’t mind reading at less than 10%, if the content is interesting. Since I’m reading faster these days, I can read enough to pick up as much vocabulary per day as before.
As I’ve mentioned elsewhere, I jumped up to the first Harry Potter in French, when it was 40% or more New Words. Perhaps that was not the most efficient approach – I don’t know that I would recommend it for everyone – but I got through it and I’m still going.
I’m pretty sure I read somewhere that the ideal from a Krashen viewpoint, is about 5% unknown. However, I think you can get a lot of benefit with much more unknown content, 20% for sure. But there’s not just unknown words, there’s also combinations of words with a specific meaning, such as les services de renseignement, or la benne à ordures where you might know the individual words, but not the composite meaning. And of course you might think you know a word, but it’s being used in a novel way.
I don’t think there is an easy answer to this one. I have a range of exercises. Each day I spend an hour listening to some content, and stop to check unknown words. If I stop too often, it spoils the flow, and I don’t gain much. So 5% unknown might be okay. I also spend an hour listening to content outside LingQ without checking unknown words, I just ignore them, or guess. So in that case unknown words are not an issue, unless they make it hard to understand the content.
And there is one case where a large percentage of unknown words is fine, and that’s when reading a transcript while listening to the audio, in order to train your brain to recognise the words i.e. pattern training. Thus in this exercise the meaning is secondary.
Don’t forget that even when hearing known words, you might be hearing them in new contexts, or in a metaphorical sense. Thus a plane might literally fly, and buns might metaphorically fly off the shelves.
So maybe it’s best to have a decent amount of input with ~5% unknown words, but you can include content with up to 20% unknown words if you have no choice.
Interesting. I appreciate your comments. You are really banging away at listening comprehension, which I find the most mysterious aspect of language learning. One can’t decide to listen better next time. It comes as it will and all you can do is keep listening.
I listen/repeat/shadow for about an hour/day on a Sentence View basis. I try to listen until I can hear most of the words in the sentence. I find hearing the words comes slowly even if I know the words.
Yes, Krashen is definitely an i+1 guy. But he was working on Comprehensible Input back in the early 80s. Has there been any research on i+n, particularly with tools like LingQ, which make it far more doable to “punch above one’s weight” in language learning?
I definitely couldn’t have gotten through Harry Potter with only a few months of French under my belt without LingQ. Whether that was an optimal choice at that point is another matter. However, over the course of a year, my i+n approach has worked well enough for me.
Now that I’m reading at an intermediate level, I’m taking another page from Krashen’s notebook. He noted a study which concluded that students who continued with the language they studied in school were those who reached the level of pleasure reading in their target language. So I’ve got a lineup of my favorite novels translated into French queued up.
Krashen probably uses a different method of counting: total unknown words / total words, whereas LingQ uses total unique unknown words / total unique words. In general those give completely different results. It seems that the LingQ method usually gives a percentage about 3 times higher. Both methods have their pros and cons. Krashen’s becomes more accurate the longer the sample is, which is logical for a scientific method but not very useful for short texts. LingQ’s, on the other hand, is more accurate when comparing shortish texts of about the same length, but for long texts it gives figures that are more discouraging than they should be. Whichever way you count, the results will be variable.
What is the difference between an unknown word, and a unique unknown word?
Nothing if every word appears only once. But once there are repetitions, the unique word count doesn’t increase, as each distinct word is only counted once. That’s why you get a higher percentage: the more common words repeat more often but are only counted once.
Beware that 50% unknown words for a 50-minute video can be easier than 30% unknown words for a 10-minute video.
Let’s take text:
I eat an apple.
Let’s say “I”, “eat” and “an” are known. “Apple” is unknown.
25% unknown words.
Now you have a new text with the words apple and orange as unknown words.
The text is:
I eat an apple. I eat an orange.
5 unique words, 2 of them unknown, so 40% unknown words.
Strictly speaking, the two texts have the same difficulty: one unknown word in every 4 words.
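The two ways of counting in this example can be checked with a short sketch. It is only an illustration: the function names and the naive tokenizer are assumptions made here, and this is not necessarily how LingQ computes its figure.

```python
import re

def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def unique_unknown_pct(text, known):
    """Unique unknown words / unique words, as a percentage."""
    unique = set(tokenize(text))
    return 100 * len(unique - known) / len(unique)

def running_unknown_pct(text, known):
    """Unknown occurrences / all occurrences, as a percentage."""
    tokens = tokenize(text)
    return 100 * sum(t not in known for t in tokens) / len(tokens)

known = {"i", "eat", "an"}
short_text = "I eat an apple."
long_text = "I eat an apple. I eat an orange."

print(unique_unknown_pct(short_text, known))   # 25.0
print(unique_unknown_pct(long_text, known))    # 40.0
print(running_unknown_pct(short_text, known))  # 25.0
print(running_unknown_pct(long_text, known))   # 25.0
```

Counted per running word, both texts come out at 25%, which matches the "one unknown word in every 4 words" intuition; only the unique-word count moves from 25% to 40%.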
You can see this when you import a book.
I have a book with 78% unknown words.
This book has been divided by LingQ into 99 parts.
Each part has something like 40-50% unknown words.
I would really like LingQ to offer a measure of unknown words that allows comparing texts of different sizes.
Something like the average number of unique unknown words per 1000 words.
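A rough sketch of what such a measure could look like is below. Everything in it is hypothetical: the function name, the `window` parameter, the simple tokenizer, and the choice to drop a short trailing chunk are assumptions for illustration, not an existing LingQ feature.

```python
import re

def avg_unique_unknown_per_window(text, known, window=1000):
    """Average number of distinct unknown words per `window` running words."""
    # Assumed tokenizer: lowercase runs of letters (splits on apostrophes).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    # Drop a short trailing chunk unless it is the only chunk.
    if len(chunks) > 1 and len(chunks[-1]) < window:
        chunks.pop()
    # A lone short chunk is scaled up to the window size.
    counts = [
        len({t for t in chunk if t not in known}) * window / len(chunk)
        for chunk in chunks
    ]
    return sum(counts) / len(counts)

known = {"i", "eat", "an"}
# A window of 4 words is used only so the tiny example texts fill a window.
print(avg_unique_unknown_per_window("I eat an apple.", known, window=4))
print(avg_unique_unknown_per_window("I eat an apple. I eat an orange.", known, window=4))
```

With a 4-word window both example texts score 1.0, so the short and the long text come out as equally difficult, unlike the 25% versus 40% figures.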
I still have no idea what the difference is between an unknown word, and a unique unknown word.
Let’s take text:
—bla bli blo bla.—
bli and blo are known.
You have 2 unknown words: bla (first occurrence) and bla (second occurrence).
You have one unique unknown word: bla.
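For anyone who prefers to see the distinction as a count, here is a minimal sketch of the same example; the variable names are made up for illustration.

```python
from collections import Counter

words = "bla bli blo bla".split()
known = {"bli", "blo"}

# Tally every occurrence of a word that is not known.
unknown_counts = Counter(w for w in words if w not in known)

unknown_occurrences = sum(unknown_counts.values())  # each "bla" counts
unique_unknown = len(unknown_counts)                # "bla" counts once

print(unknown_occurrences)  # 2
print(unique_unknown)       # 1
```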
In German language schools, teachers teach a “laddering up” approach, i.e. do not touch novels until you have studied all the grammar rules of the language (beyond B2 there are no more grammar rules taught). Hence, they say to pick materials based on your current level and increase your vocabulary and grammar knowledge gradually. If you are at B1 level, pick up Perfekt Deutsch magazine (which offers a glossary), and also graded readers. In other words, do not pick up novels/books aimed at adults too early.
On LingQ 4 there was a “full translation” option where you could copy and paste a translation of the chapter so that you could do parallel reading. This translation took away the difficulty of the text without disturbing the flow of the reading. I read a few books in German this way right from the get-go. Unfortunately this option is no longer there.
Even with this approach I had the urge to read easy materials and graded readers to pick up on grammar points by osmosis. If sentences are short, our subconscious mind absorbs them like a sponge.
In conclusion, both approaches will work in the long run. In other words, you have to read both easy and difficult reading materials.
With tools like LingQ I do not have to wait to tackle difficult books; I can start right from the get-go, which is a huge advantage. However, at some point, I need to read easy materials as well to gauge how far my understanding of the new language has come along. Whatever your decision is, you have no option but to combine both easy and difficult reading materials.
In a few videos Steve Kaufmann pointed me at an old-school polyglot, Kató Lomb, who based her approach on interest:
…her favourite method was to obtain an original novel in a language completely unknown to her, whose topic she personally found interesting (a detective story, a love story, or even a technical description would do), and that was how she deciphered, unravelled the basics of the language: the essence of the grammar and the most important words. She didn’t let herself be set back by rare or complicated expressions: she skipped them, saying: what is important will sooner or later emerge again and will explain itself if necessary. (“It’s much more of a problem if the book becomes flavourless in our hands due to the many interruptions than not learning if the inspector watches the murderer from behind a blackthorn or a hawthorn.”)
So we don’t really need to look up each and every word in the dictionary: it only spoils our mood from the joy of reading and discovering the texts. In any case, what we can remember is what we have figured out ourselves.
Works for me.
I think for every language learner the problem is finding input that is comprehensible at your level. That is much easier if you’re already C1: you can simply read or listen and pause for (or skip) the occasional unknown word. But the lower your level, the harder it is. If the target language isn’t similar to your own, it might take a lot of work just to reach the level where you can identify the set of sounds that are spoken.
The other problem is that most A1 or A2 level content is very boring. That is what “comprehensible input” seeks to avoid, right? I use English subtitles to make things comprehensible before they actually are.
@gaoli
Yes, the “non-compelling” aspect of A1/2 content is a problem.
However, you can use genAIs for simplifying more advanced content.
See this concurrent thread: How do you get good at writing German Language?
I’d say the future is highly personalized content using AIs.
I think you’re absolutely right. Right now, with the right prompts, an AI can probably write stories using only the most common 2,000 words, or 5,000 words, and some AI systems can already talk to us in whatever language we desire. And AI is only in its infancy. In ten years, who knows how far this technology might advance.
It changes how the percentage is calculated. For example, take a 30-word text with 9 unique known words appearing 3 times each (27 in total) and 3 unique unknown words appearing once each. With Krashen’s method you would calculate 3/30 = 0.10 = 10%, and with LingQ’s 3/12 = 0.25 = 25%. On the other hand, if those 3 unknown words were just one unique unknown word repeated 3 times, Krashen’s method would still give the same result, while LingQ’s would be 1/10 = 0.10 = 10%, since there are now only 10 unique words. These are just simplified examples. In reality the results might be skewed in many ways between methods and also within a method. With LingQ you can basically repeat words without affecting the percentage. You could have a text that is 1,000 words long and a text that is a million words long, both with a total of 100 unique unknown words, and as long as they also have the same number of unique words overall, both would show the same percentage with LingQ.
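The first calculation, and the point about repetition not moving the unique-word percentage, can be reproduced with a small sketch. The word lists are made-up placeholders (`k1`..`k9` for known words, `u1`..`u3` for unknown ones) and the function names are assumptions for illustration.

```python
def token_pct(words, known):
    """Unknown occurrences / total words, as a percentage."""
    return 100 * sum(w not in known for w in words) / len(words)

def unique_pct(words, known):
    """Unique unknown words / unique words, as a percentage."""
    unique = set(words)
    return 100 * len(unique - known) / len(unique)

known = {f"k{i}" for i in range(1, 10)}  # 9 placeholder known words
unknown = ["u1", "u2", "u3"]             # 3 placeholder unknown words

# 9 known words x 3 repetitions + 3 unknown words = 30 words.
text = sorted(known) * 3 + unknown
print(len(text))                 # 30
print(token_pct(text, known))    # 10.0
print(unique_pct(text, known))   # 25.0

# Same vocabulary, known words repeated 33 times instead: 300 words.
longer = sorted(known) * 33 + unknown
print(token_pct(longer, known))  # 1.0
print(unique_pct(longer, known)) # 25.0
```

Making the text ten times longer by repeating known words drops the per-word figure from 10% to 1%, while the unique-word figure stays at 25%.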
Thank you. I now understand. It seems a bit odd; for me that sentence only has one unknown word, bla, but yes, I can see that one could count that word once, or multiple times.
I already said: nothing, when talking about single words. It is only one way of categorizing. It means that words are only counted once no matter how many times they are repeated, which affects how the percentage is calculated.