Here's a tip to save hours of reading texts

Hi all
I have just discovered Wiki word frequency lists. I don’t think I am allowed to post the external link here, but you can easily find them.
It takes hundreds of hours to go through texts to tick off words you know or to encounter words you do not know. This is great when you need to see a word in context though.
However, if you already know many of the words without needing context, you could do what I am now doing.
I download the wiki word frequency list, e.g. French 5000-10000. Then I use Calibre to remove DRM.
Then I import it.
Then I zip through and tap K on every word I know.
LingQ automatically recognises all the words I have met in other exposure, so this is a very time-saving approach when you just want to get to a number of known words.
If you need to learn any new word, then of course you can also see the word in other contexts later.
I have ADHD and I get very de-motivated when a task is not quick.
So the next thing I do is tap Vocabulary at the top, then Filters, then select New and the name of the course I imported (French Wiktionary 10000, for example). Then I can quickly see how many of the top 10,000 French words I do not yet know or have not yet confirmed that I know. I can see the list of unknowns and zip down through it, ticking words off or moving them to a 2, 3, or Known.
Maybe I am the last person to realise this approach is even an option, but if this suggestion helps anyone else to speed things up, then great. I am bored to death with French. I have been at it for years, and until I found LingQ I could never really find any interesting resources. LingQ has preserved my interest in getting French to 50,000, but I just want to be done with looking words up in everything I read. I want to get through the very tedious acquisition part and just pick up almost any book or magazine.

6 Likes

This is a cool approach. Thank you for sharing. I am tempted to try it.

But, what about just reading whatever you want to read, keeping “paging moves to known” turned on, looking up only the words you don’t know, and … well … having a good time doing that?

You write, “I just want to be done with looking words up in everything I read”, but there is no reason you need to do that, right?

You write that you want to “just pick up almost any book or magazine”, but what is stopping you from doing exactly that?

It looks like you are trying to get your known words to 50,000 and so if that is helping motivate you and using these frequency lists help you get there, then more power to you!

Mainly I am replying because I used to think that “paging moves to known” was a silly option, until I shifted from studying a language I barely knew to one I knew decently well … and then I immediately saw how this setting let me “just read”. In this way, LingQ can be used basically as an e-reader with the lookup feature there only if you need it. You can even change the blue highlighting style to something less obtrusive if it bothers you. Your known word count will tick up without you having to think about it. I wanted to suggest you try these things if you had not already.

Really though, I support whatever gives you motivation, and thank you again for sharing this!

7 Likes

Yes, I know I can do that (read long texts),
but that is very time-consuming when I only want to learn as much vocab as possible as fast as possible.
I find stopping every page or two to pick up a new word makes me forget what I was actually reading.
I want to get the words in, so I can then read and not have to keep stopping every page or two.
For example, I am at about 47,000 words in French according to LingQ. I have now downloaded the 10,000 most frequent words and was surprised that, although I have 47,000, there are around 500 in the top 10,000 that I have never yet encountered (within my LingQ reading, that is; I do know these words from other reading or speech).
So, I think that knowing the top 10,000 will be immediately very useful to me, whereas spending 6 months reading a novel will be more frustrating and time-consuming, and may still not expose me to everything in the top 10000.
I have read several novels using LingQ, and it has been a superb tool. My post is for those who, like me, want to mix it up a bit when they are at the intermediate plateau stage, as I think I am, and can’t face another novel. I don’t remember anything about characters and plots in novels.
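For anyone who keeps their known words in a plain list outside LingQ, the gap check described in this post (which top-10,000 words are not yet marked known) can be sketched in a few lines of Python. This is only an illustration: the function name and the sample words are made up, and it assumes both lists are simple sequences of strings rather than anything exported by LingQ itself.

```python
def missing_from_known(frequency_list, known_words):
    """Return frequency-list words that are not in known_words.

    Comparison is case-insensitive; the result keeps the order of the
    frequency list and drops duplicates.
    """
    known = {w.strip().casefold() for w in known_words}
    seen = set()
    missing = []
    for word in frequency_list:
        key = word.strip().casefold()
        if key and key not in known and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Hypothetical sample data, not taken from any real list.
top_words = ["le", "de", "être", "néanmoins", "Le", "davantage"]
known = ["le", "de", "être"]
print(missing_from_known(top_words, known))
```

The shorter the resulting list, the fewer high-frequency words are still waiting to be ticked off.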

2 Likes

Sounds interesting. I think once I mark all known words, sorting my course by “New Words %” will be more accurate.

Which list did you use for French? I got these from search. https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/French

1 Like

I used the same link.
I added 500 words in less than 2 hours. (Obviously I already knew them). But it would have taken many days or weeks to recognise the 500 if reading a novel as the novel might not have contained them.
I’ve now done it for other languages and it’s really helping to accelerate things.
There’s a great feature in the phone app to do flipcards. And there’s the vocabulary testing feature in the web app. Try those too for spaced repetition.

1 Like

I am always curious to understand what people think about Lingq and also their aim to learn languages. I think often people have a rather different concept and perspective, but here is my point of view -

This is not a magic bullet and may not deliver so much. But still everyone should do something like this.

It is very interesting in the top 10,000 words you found 500 or so “new words” despite having 47,000 known words.
This does indicate it was a valuable exercise. Having said that -
Knowing all the “most common words” is not necessarily the best metric for assessing someone’s ability in a language. This point is not directed so much at you as at anyone who reads this idea and thinks “great, now I have a massive advantage.” Personally, I think there are more “foundational” ways to learn a language from the ground up.

My main doubt is not that this isn’t somewhat helpful, but that it might not be that helpful. Native speakers do not learn their language in order of “most common words”, so knowing those words doesn’t really go together with being able to use the language properly. Think about this: “Our Father, who art in Heaven, hallowed be thy name…”, “dummy” for pacifier, and “doggy woggy” are words that almost all native speakers of English know, but very few second language learners are aware of them, and the native speakers hardly ever use them. But when they do, there is a problem with understanding. It often isn’t the “high frequency” words that block understanding or communication.

These “word lists” are OK, but they really offer no ability in the language. What helps, and is the most streamlined method, is a total-spectrum approach to the language. I have spoken Swedish quite well since 2009, but even today I learn lots of words that EVERY Swedish person knows but that are hidden from so-called “high frequency” lists. I have no doubt heard them before but didn’t register them; had I known them, I would have had far better and deeper communication with Swedes. In reality they were often simplifying their language for me, and even when they tried to explain what I was doing wrong, they didn’t know how to explain it or understand what I didn’t understand.

Here are a few issues that are relevant to the known words -
Yes, you definitely should do this or something similar.
However, it doesn’t solve some of the problems that it might be tempting to think it does.

It doesn’t give you a solid foundation in the language because it simply allows you to get to the point in a conversation/text/show where you don’t understand anything anymore. It misses a lot of language that might “feel” like common words. For example, known words in LingQ are essentially like what linguists call “tokens”, i.e. spellings of words. But that means “walk” the verb and “walk” the noun are counted as one token when in fact they are two words. A fixed phrase like “going for a walk” is not represented as a token at all, yet it is a “bit” of information just as “walk” the verb and “walk” the noun are. What this means is that a “high frequency” word count will often miss other meanings of those same words, as well as all the fixed phrases and idiomatic expressions they appear in. However, using tools like ChatGPT and Youglish.com would help reveal the ways they are often used, as would native speakers who are also good at English.

What this means is that in a language like French, where there are so many conjugations, a verb like “ordonner” has 45 forms and would be represented in LingQ as 45 words if completed. So for the verbs among your 10,000 high frequency words, you would need to multiply the number of verbs by 45 to master them. This means that if you want your count to reflect “mastered” French, and we say that 5,000 of the high frequency words are verbs, you will need a known-word count of 225,000 just in verbs. In Swedish it is a lot easier, as each verb has only five spellings, BUT Swedish has so many phrasal verbs, collocations, idioms etc. that don’t even get counted as “known words”, yet every Swede is able to use them in communication.
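The arithmetic above can be written out as a tiny Python sketch. The figures of 5,000 verbs, 45 forms per French verb and five per Swedish verb are the rough numbers from this post, not measured values, and applying the same 5,000-verb figure to Swedish is an assumption made purely for comparison.

```python
def surface_forms_needed(num_verbs, forms_per_verb):
    """Known-word count needed if every form of every verb counts as one word."""
    return num_verbs * forms_per_verb


# Rough figures from the post above; the Swedish verb count is assumed.
french_verb_forms = surface_forms_needed(5000, 45)
swedish_verb_forms = surface_forms_needed(5000, 5)

print(french_verb_forms)   # 225000
print(swedish_verb_forms)  # 25000
```

Under these assumptions the same number of verbs produces a known-word count nine times higher in French than in Swedish, which is why raw counts are hard to compare across languages.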

So what is really the best way to get good at a language, using Lingq?

In my opinion, the best, most streamlined method is usually to have a mixture of materials: along with things you’re interested in, a varied array of graded readers and materials aimed at different ages. On top of that, if you really want to be good at the language, you should follow a CLIL (content and language integrated learning) concept, where you go through the educational materials in that language and get an education from their textbooks etc. In my opinion, time-wise it isn’t actually that hard, as it would only take a few days to get through the first years of school materials when you’re already at a decent level, but it will make your foundation ten times stronger.

I have 76,000 known words or so in Slovak, and I really suck at it. There are loads of reasons why, but the primary reason is that I have not (until recently) found the right on-ramp. I would be about ten times better in Slovak now, with a much lower word count, if I had just spent the last few years reading young people’s literature with a teacher/helper. Now I will do that anyway, and it will be useful to have all the known words, but I feel far from competent in Slovak despite no doubt having most of the “high frequency words”. As stated, “high frequency words” are not actually a reflection of the words known by most people in a language, especially not in the order they learn them. For example, in Slovak if you ask anyone they will tell you “jest” = to eat. However, it is far more common to say other things. As Slovaks are terrible at explaining their language (many issues), it took me literally years to know that those other words even existed. Some of them don’t even appear in the most popular online dictionary for Slovak, yet Slovaks know them like tying their shoelaces.

For me, the concept of understanding the language “like tying your own shoelaces” is really not touched upon by high frequency word lists. If you don’t believe me, try having a conversation with an old person or a person under 12 in the target language: they often look completely confused or use words you have no idea about. LingQ’s advantage is definitely in the fact that you can create an environment where the language is all around you. Ten books aimed at 7-13 year olds, read multiple times, would give someone far more practical ability in a language than a word list. It is still necessary to know all the high frequency words, as kids in that language would also learn them, but they have a much stronger foundation from which to do so.

Other issues high frequency words don’t touch upon:
the baggage of the words (connotations)
the register of the words (informal, neutral, formal)
the ability to rhyme
the ability to form and understand questions
the ability to use those words correctly

French is of course one of the most interesting languages to exemplify this: a guy memorised the French dictionary and won a Scrabble tournament whilst being unable to communicate in French.

3 Likes

What is the primary driver for you to learn French? i.e. What do you want to do with it? You mention wanting to be able to pick up any book or magazine and read it, but in this 2nd post you mention not wanting to face another novel, so I’m a little confused.

At 47,000 words I would think you could read just about anything. Not perfectly of course and not without some ambiguity on occasion…but as someone mentioned you could read outside of LingQ now. 3,000 words isn’t going to be a substantial improvement from your current position. Of course, if 50,000 is a goal to hit to “graduate” then I understand your reasoning.

I am currently also going through a word frequency list focusing on words that I don’t know yet. I’m doing a different approach that focuses on that. I’m not worried about words I know already that LingQ isn’t aware that I know. I’m sure there’s quite a few, but I don’t really care if they are ever marked as I don’t have a set “known words” goal.

One thing about the word lists is that you also need to be aware of what the sources for them are. Is a 10,000 word frequency list that focuses on online newspapers or novels more valuable than one that focuses on subtitles? Hard to say. Many of these words you might never use in conversation (or even hear in conversation)…so to some extent they can be a bit misleading. There have definitely been some words I’ve come across in frequency lists that are never used anymore (like they may have been pulled from a 1920 novel or something). Anyway, not saying it’s a bad idea of course (as I’m using them as well), but just pointing that out.

I am curious what wordlists you are using? I would say for example, the academic word list is a really great idea to translate into a second language.

One way I would really encourage people to use a word list is alongside a thesaurus, so they can use the common words to find other ways of saying the same thing, and to do paraphrasing exercises with ChatGPT.

I am not sure about French, because its vocabulary is so similar to English, but when I learn Swedish there are often “international words”, for example component/komponent, and then you get alternatives which are completely Swedish, like beståndsdel, grundläggande del, del av helhet etc. I assume it is somewhat like this for French, as there are probably some words that didn’t make it across from Latin/French and that we can’t recognise from English.

I just went through the Wiktionary top 5000 parole words, and to be honest a lot of it was not really content that would help people learn a language better. These are the new words I had:
frekvensordlista ,parole, getrud, kelderek, frölunda, utvändigt, viveca, dl, emily, huddinge, kulturrådet, shardik, andro, upploppet, förbundskapten, poirot, gwenda, hammou, vägverket, blomqvist, msk, vänersborg,
kommunalrådet, boel, centerns, förbundskaptenen, nygren, farquar, varberg, schyman, chirac, modo, kommunstyrelsen, onsdags, sehnsucht, valutaunionen, arrangörerna, färjestad, kimberly, barnomsorg, halvleken, hulth, leissner, bekla, nordbanken, scaith

A large number of these are company names, people’s names and place names, which are necessary, but imagine if that is all you had! It even has the German word sehnsucht (longing), which may well have appeared in the Swedish corpus but really doesn’t help you learn Swedish. As a strategy, I would say this is really poor.

The fastest way I have been able to add known words is very simple -
I read a biography/book and when something in it comes up, go to Wikipedia, read about that thing, import it and add the new words. It works fantastically and helps understand the book better. In the long run though, there can be still a lot of comprehension issues even in books with very few new words. I am reading a book by a child psychologist “Kompetenta Barn” and it has very few new words, but the pattern grammar, idioms, jargon etc make it a really tough read.

1 Like

Yes — this is exactly right Hsingh.
Once I tick off all the words I already know, quickly, in a matter of a few hours rather than a few months, I can then home in much more quickly on the courses and texts that present significant numbers of words I really do NOT know. Up until now, I had been reading texts and having to cover hundreds of pages just to come across words I already knew. I have found that the Filter in the Vocabulary lists does not always pick up all the theoretically unknown words, so I have had to go through many lessons just to encounter more.
So, yes, you are 100% right - the % of unknown words is much more quickly arrived at and more accurate by ensuring one has quickly seen and marked known all the words in for example, the top 10,000 words. Almost any version of this list will do. They all have variations but are basically still the top 9 to 11 thousand most frequent words.

Yes the academic word list is very good.
I am doing this not mainly to learn words, but to convert words I already know into known words, so that I can then select content whose percentages of unknown words are more accurate. That should cut the time it takes to get to 50,000, which is roughly the point where one can read most general texts with relative ease and without having to stop 3 or 4 times a page. I hate stopping, as I completely lose all interest in the book. I just want to get the acquiring part over as fast as possible, and to do that I will have to read very extensively. Extensive reading will always be necessary, but with this method I will be able to find texts with a much more accurate % of genuinely unknown words rather than wading through materials that contain so many words I already know. Hope this explains.

With a more accurate % of unknown words showing in any text after completing, for example, the lists of the top 10k, I can still read novels and articles, but I will be able to select ones much closer to my reading level. Up until now I have always found it very difficult to find my level, because so many words in the top 10k had not yet been encountered, and so the % of unknowns I was seeing in novels I imported was quite a long way off my experience when I read the books. The % shown does not tell you how many of those unknowns are HF words and how many are very rare words, so the % is not in itself an entirely useful measure. It is the % of HF words that is perhaps helpful in terms of becoming more able to access texts or speech.
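To illustrate the split I mean, here is a rough Python sketch that breaks a text’s unknown-word percentage into high-frequency and rare parts. It is not how LingQ calculates its figure; the function name is made up, the percentages are taken over unique words in the text (an assumption), and words are split with a simple letters-only pattern.

```python
import re


def unknown_breakdown(text, known_words, high_frequency_words):
    """Split the share of unknown unique words into high-frequency and rare."""
    # Runs of letters only; digits, punctuation and underscores are separators.
    unique = set(re.findall(r"[^\W\d_]+", text.casefold()))
    if not unique:
        return {"unknown_pct": 0.0, "unknown_hf_pct": 0.0, "unknown_rare_pct": 0.0}
    known = {w.casefold() for w in known_words}
    hf = {w.casefold() for w in high_frequency_words}
    unknown = unique - known
    return {
        "unknown_pct": 100 * len(unknown) / len(unique),
        "unknown_hf_pct": 100 * len(unknown & hf) / len(unique),
        "unknown_rare_pct": 100 * len(unknown - hf) / len(unique),
    }


# Hypothetical sample data for illustration only.
result = unknown_breakdown(
    "Le chat mange la souris",
    known_words=["le", "la"],
    high_frequency_words=["le", "la", "chat", "mange"],
)
print(result)
```

Two texts with the same overall unknown % can look very different once split this way: one may be full of unmarked everyday words, the other full of genuinely rare ones.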

1 Like

Thanks for the tip. I’m going to try this for German. One question I have always had is how large my German vocabulary is. Since Lingq counts every form of a word as unique, it’s hard to know the true size of one’s vocabulary. These frequency lists should help answer that question.

If the lists you are referring to are actually on a page of wiktionary.org or a related wiki, they should have a Creative Commons license, which would allow you to post a link to them here. Indeed, it would be useful if you would do so.

I find it rather strange that LingQ doesn’t include word frequency in the popup dictionary. It would be easy to fit in there and super useful for deciding whether or not to actually learn any given word. This is something Migaku does very well.

1 Like

Hi
I actually tried to upload them to LingQ but I couldn’t make it work. Maybe if you are more technically minded than me, you could have a go and add them to the public resources for everyone to use?
When I was feeling despondent and looking at weeks more reading to get a few more hundred words ticked off, I did the HF list and it just gave me the sense of rapid achievement I was needing.

I think it is a somewhat interesting idea, but in general it is quite easy to work out roughly what the percentage of new words will be using much more reliable methods. You simply look for novels matched to reading age/CEFR level. There are still a surprisingly large number of new words there which are very useful and which all native speakers are familiar with. I have seen how CLIL education works in places like the Netherlands and I really encourage anyone looking for a way to streamline their study to look at this method. In every country they arrange their school syllabi according to reading age, which maps quite closely to A1 (3-6), A2 (6-9), B1 (9-12), B2 (12-15), C1 (15+), C2 (further education).

For example, there are very famous syllabi like the IB. You can just look at the syllabus for something like this.

In a school with CLIL, they will be doing something like Sciences, Geography, Maths, Art etc in the chosen language. The exposure time is massive and allows students to learn all the words a native speaker knows in a relatively short time with logical and progressive difficulty.

From all the methods I have seen, this is the most effective and anyone can use it for self-study.

From the wordlists I have seen, there is an awful lot of junk in them like place names, brands, etc. Whilst it sounds interesting to say “Wouldn’t it be great to know some of the most high frequency words used in a language?”, it actually isn’t very useful in practice, because the lists are NOT actually the most known words in a language. It is like putting the mid-section of a building in thin air and expecting it to float on something.

In English, one of the most common words is “the”, yet a vast number of people using English don’t even know how to use “the” properly by the time they are doing C2 exams. A far better exercise, in my opinion, is to go to ChatGPT, for example, and say
“Please give me a list of words on X subject that all native speakers probably know, but foreign speakers usually miss.”
I would then also read kids’ encyclopedias like the DK series in the target language, with the thought, “a smart 12 year old in this language 100% knows all these words.”

Fair enough, use wordlists, but be aware that they are really not very helpful when imported into LingQ out of context; it is too abstract. There might be some use in using them as part of a fill-the-gaps exercise or in using ChatGPT to make sentences with them.

Lingq has produced (I have no idea where to find it) a very nice chart of all their languages and all the numbers for Lingqs that are needed to reach (in very rough terms) the various CEFR levels. Maybe if anyone reading this knows where that is, they could post a link to it for us.
I don’t remember exactly, but I think around 50,000 meant that you could understand pretty much most things at C2 level (obviously only in principle, and maybe only in writing rather than rapid speech etc). But anyway, it’s important to me to have some kind of a rough goal to get to, otherwise I just wander off, and that’s why I decided on 50. I might do more but want to get Fr to 50 and then Ge to 50 before deciding whether to progress those further still, or whether to get Portuguese to 50 next. There are no musts.

I think that’s the same link as me. But I also found a few others and have done them too. There’s an Academic Word List somewhere you could find helpful.

I can understand your point of view, I used to think like this, but actually it is based on a few faulty premises. Here is what I have realised and a few ways to address it:

1 Most people thinking about wordlists think “Ah, that is going to be the words I need to know” - in fact this is completely wrong.
Why? Firstly, you’re going to be learning lots of words which are used in the media/books, and you’re going to then equate that with being capable in a language, when actually that is a false premise. It does not help your word order, it doesn’t reveal hidden meanings in the words, and it does not reveal how people actually speak about these things in practice (maybe how they write about them). For example, in English you can say “rent”. A person goes to a car rental firm and says “I want to rent a car”, they say, “great, so you’re going to hire it out for how long?” and the second language speaker is now having an aneurysm because they don’t understand what this means. They have a meeting, they want to postpone it or delay it, and the native says “OK, so we’ll push it back.” They have never found this in their word lists. Great.

I study German, and as a side note, one of the most frustrating parts of German in the beginning was that when you translate the word you want from English to German, it almost never gives you a good word-for-word match. You get some esoteric word, and in reality Germans say something very different day to day.

What does actually get you the desired result (i.e. understanding what people generally say, and being able to capably use that language) -
Reading in a structured manner by level/age rating
Reading diary form novels and keeping your own diary to describe your day
Reading material that leads you through the life of a person in the language so you know everything most humans who speak that language know

2 Different languages have shadow vocab
German is a classic example of a language with shadow vocabulary. Word count is a very poor way of judging whether you know German or not. That is because a lot of meaning in German is generated through pattern grammar like word order, idioms, phrasal verbs and collocations. Here is an interesting fact: I have the same novels in German as I do in quite a few other languages. German often has fewer “unique” words in its word count than the other languages, yet it still communicates everything that the other languages do. Why? You have, for example, steigen. It goes with ein, aus, um, auf, ab, hoch, herab, hinab, an. In other languages each of these will often be a completely different word. Yet in German they show up as just the steigen conjugations plus a bunch of random prepositions.

3 What should you aim for with Lingq to represent a decent vocabulary?
In reality, through reading genuine texts, when you get to about 200,000 known words, you’re representing a pretty decent student of a language. If you took all the words an educated native reader knows you’ll be in the millions, literally, because they know loads of place names. If you restrict it to just verbs and nouns for “objects” and devices, still, millions, as there are millions of things.
A better way of looking at it is with a mixture of words read and words known and, ideally, listening exposure and words written compared to words known. A native speaker who has a university-level education, reads ten books a year and reads the news daily has probably read the equivalent of over 50 million words and written at least a million, and the distinct words they know are only a tiny fraction of those totals. With LingQ, you can really use it to record your attempts to close that gap.

Have a look at the guy who is the top of the German chart for words known. He has about 300,000. I wrote “Wow, you must be good at German now!” he wrote back “well, at reading at least.”

1 Like

And even within the separable prefix forms there can be many varied meanings. ausmachen for example has many meanings.

One thing I’ve been doing is taking the word frequency lists, going through them to find words that I don’t know well and that are useful, and asking ChatGPT to give me example sentences that illustrate the various commonly used meanings of each word. I’ve been keeping this mostly in a spreadsheet as I’m using these in a few different ways, but I do also occasionally throw the sentences into LingQ.

A bit tedious and a bit of work, but helps give context as you suggest above.

3 Likes

“Hatten wir nicht ausgemacht, dass du das Licht nicht ausmachen sollst?” (“Hadn’t we agreed that you shouldn’t turn off the light?”)
“Wieso, macht es dir etwas aus?” (“Why, does it bother you?”) :rofl:

@TOPIC:
In German we have a saying: Langsam ist sicher, sicher ist schnell. (Slow is safe, safe is fast.) I always advocate trying to boost efficiency. However, in my experience, if one tries to take shortcuts, one usually needs much longer.

On a side note: Some of the 10000 most frequent words in German based on a list by the University of Leipzig are Mustafa, Prenzlauer Berg and Angela Merkel. :upside_down_face:

2 Likes