Analysing my LingQ Statistics: Vocab vs Words Read

I took the data that LingQ provides on vocabulary and words read and made it into this graph. It shows how many words I have encountered and what level I have them labeled (1, 2, 3, Known) for each language I have studied here (I had already reached an advanced level in Spanish before using LingQ). It’s interesting to see some of the drastic differences between languages, some more surprising than others.

I put the languages in the order I started them, which is also the order of my abilities in each of them. Here are my thoughts on each language:

Portuguese :brazil:: Already knowing Spanish was a huge help. Of these 5 languages, Portuguese is the one that I have spent the most time with outside of LingQ (speaking to tutors, listening to music, and watching YouTube/TikTok). Even though I’ve read more words in 3 other languages and know more words in one, it is certainly the one I am the most comfortable understanding and speaking.

Italian :it:: Italian was harder for me to pick up on than Portuguese, and this makes some sense as it is less lexically similar to Spanish. This is shown in the data by having a much larger unknown-to-known ratio, and by more words read per word known. Overall, I am not satisfied with my level in Italian.

French :fr:: With so much Romance-language experience, I dove straight into the fifth Harry Potter book (250,000 words). No mini-stories, podcasts, video lessons, or anything else. Just the book and audiobook. I did this for 30 minutes a day and finished in 5 months.
It was a moderate success. By the end of the book, I could listen at full speed as I read. I had 8500 known words, and 530,000 words read (I did a lot of rereading at the start). All of this in 6 months with little effort put in. The issue is that by the time I had 530,000 words of reading in Portuguese, I had 16,000 known words, and was for all intents and purposes fluent in the language.
My important takeaway is to have a diversity of material. That being said, this is also a difference stemming from reading more extensively. In contrast, in Portuguese, I probably spent twice as much time reading the same amount of words and studied sentences more intensively.

Dutch :netherlands:: I started Dutch right after I started French. I had 4 weeks of summer left before the start of classes, so I went in headfirst. I assumed that since it’s so similar to English (right??) I would be conversational after about 500,000 words of reading and would know around 15,000 words. I did the mini-stories for a week, then started reading Harry Potter books. I was averaging 18k words of reading per day, which, at the time, meant about 5 hours of reading per day. By the end of the summer, I had reached my goal of 500,000 words read but… I had fewer than 2,000 known words! I still find this confusingly low. You can see on the graph how few words are known relative to the words encountered.
I’ve learned my lesson and have started diversifying my study material, and it has been helping, but it is still the slowest language except for…

Arabic :egypt:: Arabic is going very very slow and taking lots of energy. I looked at Steve’s Arabic stats which he showed in a video, and I just don’t understand how he added so many known words so quickly. I have spent so much time going through the mini-stories and other short stories. But, as you can see, this has resulted in very little. I’ll keep going though, and slowly but surely I will keep learning words.
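For anyone who wants to compare these figures side by side, here is a minimal Python sketch that turns the numbers quoted above into "words read per known word". The dictionary layout and function name are made up for illustration; the Dutch known-word count is entered as 2,000 even though the post says it was below that, so the real ratio is somewhat higher.

```python
# Figures quoted in the post. The Dutch known-word count is an upper bound
# ("fewer than 2,000"), so the true Dutch ratio is higher than computed here.
stats = {
    "Portuguese": {"words_read": 530_000, "known_words": 16_000},
    "French": {"words_read": 530_000, "known_words": 8_500},
    "Dutch": {"words_read": 500_000, "known_words": 2_000},
}


def words_read_per_known_word(words_read: int, known_words: int) -> float:
    """Return how many words were read for each word marked Known."""
    return words_read / known_words


for language, s in stats.items():
    ratio = words_read_per_known_word(s["words_read"], s["known_words"])
    print(f"{language}: about {ratio:.0f} words read per known word")

# Pace check for the Dutch summer: 18,000 words/day at about 5 hours/day.
days_needed = 500_000 / 18_000   # days to reach 500,000 words
words_per_hour = 18_000 / 5      # implied reading speed
print(f"Dutch: about {days_needed:.0f} days at {words_per_hour:.0f} words/hour")
```

The pace check lines up with the four weeks of summer mentioned for Dutch, and the ratios make the gap between Portuguese, French, and Dutch easier to see than the raw counts do.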

I hope you found some of this data interesting or can relate to it. I for one find it fascinating and would love to see what other people’s statistics look like!

7 Likes

When did you start Portuguese on LingQ? It would be instructive to see how long it took to accomplish this much.

Here are my known words.


And my words of reading

They line up pretty directly. I remember I was trying to do 10hrs/week for June and July of 22. Nowadays I read around 6000 words per hour while listening at the same time. That’s the same method I was using back then, but apparently going much slower.

1 Like

Compare your Italian stats with mine, someone who learnt Italian as their first Romance language on LingQ. At about the same words read, my stats were:

words read: 623k
Known Words: 4,117
unlearnt lingQs (i.e. 1-3): 15,863
hours listened: 200

As we can see, your already knowing Spanish resulted in you having 4x as many Known Words as I had! (Even though you had significantly fewer hours listened - not sure if your stats of 13 hours are correct?)

Learning languages with lots of cognates is definitely a fast way to get lots of ‘almost-free’ Known Words.

3 Likes

Super interesting! I would estimate my hours listened as around 150. I listen to audiobooks on YouTube or Spotify since it’s too much of a pain to upload them and sync them up with e-books on LingQ. I also listen to podcasts and stuff off platform. I’ve always wanted to know what my Spanish stats would have looked like if I’d learned it on LingQ, especially to compare Spanish and Dutch which are supposedly equal difficulty for English monolinguals.

1 Like

Thanks, Keith: great post, great insights (maybe I just love numbers / stats :slight_smile: )!

That said, I’m curious to know:

  • How did you test your “fluency / conversational level” in your L2s? Usually, (online) tutors / teachers, for ex., aren’t a good benchmark for this…

“I would be conversational after about 500,000 words of reading…”

  • 500k words (read / listened to) are way (!) too low to reach a conversational level in any L2.
    I’ll cross this threshold in Dutch in a few days and I’m nowhere near “fluency” even though listening to “Easy Dutch” vids, for ex., is quite easy nowadays and Dutch is a low hanging fruit L2 for me bc. it feels like a dialect of my L1, which is German…

  • Regarding Dutch: How is it possible to read ca. 500k words in Dutch and have only 2k known words? I know that HP is famous for its simple vocabulary, but still… even wizards should use more than 2k single words, right? :slight_smile:

4 Likes

Unfortunately, only the recognition part (what folks like to call “passive” vocab) is faster, the “use” part is still challenging (esp. because of the interferences, for ex. in “Spanish - European / Br. Port.” or in “Dutch / German”).

So even in a low hanging fruit L2, which is definitely Dutch for me, learners still have to put in the listening / reading and speaking hours to reach “everyday fluency” (from my exp. in various closer L2s: at least 500-1000 h).

The main advantage is probably psychological: Learning L2s with many cognates requires less mental energy, so it’s less draining and feels like a pleasant SLA cruise :slight_smile:

3 Likes

@PeterBormann

For recognition, I wouldn’t really call it “not challenging,” but rather just faster than if you didn’t have such a large amount of cognates. The 17,518 Known Words for Italian by @KeithenC, already knowing Spanish and being familiar with some Portuguese, compared to my 4,117 Known Words with the same amount of words read and similar hours listened (my 200 vs. their 150) illustrates this. That is, 4x the amount of Known Words with the same amount of input.
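As a quick check of the "4x" figure, here is a small Python sketch using only the numbers quoted in this thread. The variable and function names are illustrative, and the per-1,000-words figure applies only to the 623k words read reported for the learner without prior Romance languages.

```python
# Known Words quoted above, at roughly the same number of words read.
known_words_with_spanish = 17_518    # learner who already knew Spanish
known_words_first_romance = 4_117    # Italian as a first Romance language


def known_word_ratio(a: int, b: int) -> float:
    """Return how many times larger a is than b."""
    return a / b


ratio = known_word_ratio(known_words_with_spanish, known_words_first_romance)
print(f"Known Words ratio: {ratio:.2f}x")

# Known Words per 1,000 words read for the 623k-word data point.
words_read = 623_000
known_per_thousand = known_words_first_romance / words_read * 1_000
print(f"Known Words per 1,000 words read: {known_per_thousand:.1f}")
```

The ratio comes out a little above 4, so "4x" is a fair rounding of the quoted stats.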

Unfortunately, we can’t easily compare production / usage with our stats. As you are saying with your experience with Dutch, it’s not super easy (i.e. it requires effort). The question is: will the speed at which you ‘activate this passive vocabulary’ be the same, slower, or faster than in another language where it took you longer to learn that recognition vocabulary?

4 Likes

I was actually thinking the same thing; however, I thought he meant that he could start speaking something in that language. Conversational is a very vague concept, but good for him if he’s already capable of doing so.

@KeithenC very interesting stats btw, I like the chart you made. The challenge is to keep increasing the level of knowledge in each language and not just stop there. It will be more problematic to keep the same ratio as you increase the number of known words.

Here are my rubbish statistics in German, where I made so many mistakes I can’t count them anymore.

Words Read: 2,882,446 (definitely less because there was a glitch in the middle)
Known Words: 45K
LingQs Created: 63K
Listening Hrs: 127

Now that I’ve changed my goals I use what I have to maintain and reinforce. The good news is that I have a big backlog of LingQs created that I can use to repeat lessons over and over and convert some of them.
At this pace, in 10 years I will probably know something. :rofl:

2 Likes

Only 500,000 words of reading
After that many words read in Portuguese, I can read novels and watch shows made for native speakers without LingQ, and I can hold my own in conversations about most things I would talk about day to day, so I’d put myself at a B2 level. Olly Richards had a similar experience when he did his Italian in three months challenge. It would help to know that my Spanish level is C1; I lived in Ecuador for a year, going to school in Spanish, and living with a host family after studying Spanish for 5 years starting when I was 13.
My mistake was assuming that Dutch and English were even close to as similar as Portuguese and Spanish. When I said conversational, I meant that I thought I would be able to have a simple conversation and listen to podcasts for learners without LingQ. Unfortunately, I’ve realized I’m going to have to commit way more time to Dutch than I thought, when I started it as a summer project.

How am I able to read HP with only 2k words?
Well, I actually started when I had 300. In French I started when I had zero. The key is not that the vocabulary is simple (it’s surprisingly wordy). The key is that I’ve read and listened to the series a hundred times, starting when I was 7, so the books are pretty much ingrained in me. If you start playing the audiobook in Russian or something, I’ll be able to figure out where I am in the series by just hearing the proper nouns. Also, I just click on the words, or go into sentence mode, and voilà, there’s the English translation. The beauty of LingQ for me is that I can read interesting stuff even though it’s above my level.

3 Likes

My thesis on “low-hanging languages” is:

On the one hand, you get a lot of words for free because “guessing from context” is extremely easy. It feels more like recognizing variations of the dominant language (i.e. the L1 or a strong L2) and less like experiencing the L2 itself.

On the other hand, it’s a non-stop battle against interference and the dominant language that tends to take over production (speaking and writing) at all times.
In a sense, it’s probably more a case of creating an intermediate language that isn’t identical to the low-hanging fruit language itself, but rather to the dominant L1 plus elements of the low-hanging fruit language (pronunciation, collocations, grammatical structures, etc.).

And bringing this intermediate language more and more into line with the low-hanging fruit language in terms of language production is still a lengthy process in which I don’t see any shortcuts (especially not achieving true everyday fluency in a few hours).

In a nutshell, the hours gained by L2 learners with low-hanging fruit L2s in recognition operations seem to be lost again in a constant battle against all kinds of interference. So it’s a kind of SLA zero-sum game :slight_smile:

Note:
“Untrue” everyday fluency would then be for ex.

  • When Germans speak German mixed with Dutch elements (collocations, pronunciation, etc.) - or vice versa.
  • It’s the same with Portuñol (Portunhol) for Portuguese native speakers speaking Spanish or vice versa.

2 Likes

This is what I don’t get in your narrative (as someone who has a degree in Romance languages, studied / worked in France, lived in Spain, has worked as a language teacher for many years, etc.):

  1. 500k words read are “not many” words to get familiar with any L2 - not even with low hanging fruit languages.

And just reading / listening to 500k words in an L2 normally doesn’t make you automatically fluent. This stimulus is simply insufficient to facilitate language production. In short, you need to digest many more words (in my language experience and that of other power users, for ex. on LingQ: my guess is ca. 2-3 million words if you want to reach a “solid” B2 level or be on the threshold to an advanced = B2-C1 language level).

  2. 16k words “known” on LingQ means for Romance or Germanic languages that you can divide them by ca. four (because of all the verb conjugations, tenses, singular / plural forms, etc.) so that you “know” only ca. 4k words.

  3. “Knowing” words on LingQ refers to listening / reading operations where you can mainly “recognize” words, but not necessarily “use” (produce) them in collocations / whole sentences / idioms in speaking and writing operations.
    Based on my experience with various L2s, I’d guess that 16k “known” LingQ words correspond to about 4k single “recognized” words (for Romance/Germanic languages and based on native speakers of closely related L1s), which corresponds to about <= 1k individual words that can be “produced” correctly when speaking/writing.

I don’t see that “true” everyday fluency (in the sense of a “solid” B2 level) can be reached with such a small amount of words in recognition / use operations in any L2.

  4. In addition, audiobooks performed by professional narrators with their more or less slow, very clear, etc. pronunciation don’t contribute much to the understanding of native speakers, whose pronunciation is often sloppy/unclear, who are extremely fast, use many contractions, slang and dialectal expressions, etc. in everyday contexts.

Of course, digesting (audio) books one already knows is a good strategy because it facilitates SLA. However, (audio) books à la HP (in Port.) with single narrators don’t prepare us for something like “Sintonia” in Brazilian Portuguese (https://www.youtube.com/watch?v=xsODpM3Rwdg).

  5. Interference: You don’t mention it at all. However, all native or at least advanced speakers that I know of mention interferences as a constant struggle when trying to learn very close languages such as Port-Spanish, Dutch-German, etc. - and reaching true everyday fluency (and I’m still talking about a solid B2 / B2-C1 level here) isn’t something which can be achieved by only digesting 500k words.

Based on my experience with Portuguese and Dutch (altogether ca. 2000 h) I’d say it’s still necessary to invest about 1000-1500 hours in one of those low hanging fruit languages just to reduce the amount of interference to an acceptable (= non-interfering) level in everyday communication (see my comment to @nfera reg. “inter-languages” above).

  6. And finally, the Portuguese pronunciation system, esp. in the European variant, is, AFAIK, the most complicated and sophisticated of all Romance languages. It’s also something that one can’t master very quickly in everyday communication (even if learners know a lot of other Romance languages, as we both do) - and this is also part of the non-stop struggle with interferences.

In short:
I like your quantitative approach to SLA, but I think your numbers are way too low for achieving everyday fluency (in the sense of a “solid” B2 level).

Based on my years of experience in both learning and teaching various L2s, learners need to digest ca. 5-6x as many words as the 500k you mention to reach a B2/B2-C1 level, using a wide range of materials.
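The rules of thumb in this post can be written out as a short Python sketch. Everything here is a rough estimate taken from the post itself (the divisor of about four, the "<= 1k" producible words, the 5-6x multiplier), not a measured result, and the names are only illustrative.

```python
# Rule-of-thumb divisor from the post: roughly four inflected forms per word
# for Romance / Germanic languages. This is an estimate, not a LingQ feature.
FORMS_PER_WORD = 4


def recognized_words(lingq_known_words: int, forms_per_word: int = FORMS_PER_WORD) -> float:
    """Estimate single 'recognized' words from a LingQ Known Words count."""
    return lingq_known_words / forms_per_word


def target_words_read(baseline: int, low_factor: int = 5, high_factor: int = 6) -> tuple[int, int]:
    """Return the (low, high) range of words to read for the 5-6x estimate."""
    return baseline * low_factor, baseline * high_factor


lingq_known = 16_000
print(f"Recognized words (estimate): {recognized_words(lingq_known):.0f}")

# The post puts producible words at "<= 1k"; that figure is stated, not derived.
producible_upper_bound = 1_000
print(f"Producible words (upper bound from the post): {producible_upper_bound}")

low, high = target_words_read(500_000)
print(f"Words to read for B2/B2-C1 (estimate): {low:,} to {high:,}")
```

The 5-6x range works out to 2.5 to 3 million words, which sits at the upper end of the "ca. 2-3 million words" guess given earlier in the post.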

Even then, we still need to spend a few hundred hours speaking in our L2 to hone our oral skills to be able to talk about real everyday fluency in this context.

And can we then read “every” novel in our target language? No, we can’t…

5 Likes

Hi Davide,

I don’t know what your goals are in German.
But why don’t you skip reading at this stage of your SLA journey,
just watch / listen to what you like and use the LLMs to improve
your speaking / writing skills?

What I want to say is: LLMs are fun - and Gemini is coming :slight_smile:

1 Like

@PeterBormann It’s alright, it’s not a problem really, thanks for the tips. I focus on English now as you know from my previous threads, especially writing. All my resources will go there in 2024. I will maintain French and Spanish with LingQ, and I will maintain what I have with German by progressing slowly. This alone will cost me almost 1 hour daily, depending on my accuracy.

I need to manage my time and resources very well, and there is no space/energy for everything. I need to cut a lot of stuff. With German, I focus on reading+listening now, especially repeating short listening lessons that I have already previously read.
As things have changed, there is no project to use the spoken and written language in the near future like I had in 2019/2020. My timeframe to live in Germany was from 2020 to 2022. However, I will keep the language alive by increasing my reading and listening capabilities for the next years, then who knows! :slight_smile:

4 Likes

I never claimed I could read every level of book or understand every word of a fast-spoken whispered conversation between natives without subtitles, and yes my Portuguese has a fair amount of Spanish interference, but I’m not here to prove my language skills to you, so let’s move away from that. My main point was to compare the ‘difficulty’ of these five languages for someone who speaks English and Spanish, and to consider how various study methods could be affecting progress.
I think it’s interesting that your Dutch stats are similar to my Portuguese stats, which makes sense for a German speaker. How do you feel speaking Dutch?
Also, what are your go-to materials to study on LingQ as someone who clearly spends so much time on here? I find it difficult to branch out in content, and just end up sticking to 2 YouTubers and a few book series for the majority of my study time.

2 Likes

So linguistically close languages are faster to gain proficiency in recognition (listening and reading) due to a lot of cognates, but it takes a long time to gain proficiency in production (speaking and writing), due to constant interference of the dominant language/s. It’s faster than a far away language to be able to produce a comprehensible interlanguage, but to truly speak the L2 requires the same amount of time or more (or never if fossilisation occurs).

As you mentioned, this quicker proficiency in recognition is very likely to have psychological benefits, as the learner is more likely to continue to learn the L2, as they get the positive feedback of progress.

2 Likes

Sorry, I didn’t want to be negative, I just wanted to understand what’s going on here.

The main point is that you’re happy with your progress in Portuguese - but, of course,
there’s always room for improvement (for all of us) :slight_smile:

“How do you feel speaking Dutch?”
Weird, to be honest, because there’s a huge gap between my reading / listening comprehension and speaking skills (my writing in Dutch is even worse).

To be more precise:
I can understand simple non-fiction texts such as Harari’s “Sapiens” or videos on Memrise or YT (esp. the Easy Dutch series) with relative ease now, but speaking in Dutch is quite difficult because the first things that usually pop up in my mind are German expressions. I just tend to give them a “Dutch spin” (exception: simple sentences in Dutch that I already know by heart - thanks to Anki / Memrise).

At the moment, I can have simple conversations with the Memrise Bot, which is based on the GPT-3 model, but I’m still at the parrot stage where I can only answer the usual tourist questions:

  • What’s your name?
  • Where do you come from?
  • What are your interests?
  • Do you like your coffee / tea with or without sugar / milk?
  • Where’s the train / bus station?
    etc.

I could probably have the same results in less time by just memorizing some sections of a Lonely Planet phrasebook for Dutch.

And don’t ask me about my writing in Dutch: it’s just terrible… or to put it differently: the Dutch writing system seems to be completely “confusing” to my German eyes (e.g., many double vowels such as ee, aa, etc. or strange word forms such as “geïnteresseerd” = “interessiert” in German = “interested” in English).

As I wrote to nfera: You get a lot of vocab for free, but it also takes a lot of time to minimize all the thousands of interferences in the oral / written dimensions. It’s similar in Portuguese for me…

The only positive thing with the low hanging fruit L2s is, IMO, that it feels like an SLA pleasure cruise compared to distant L2s with completely different sound and writing systems (e.g. Japanese with its pitch accents, Mandarin, Vietnamese, etc. with their tones and thousands of characters).

For ex., even after ca. 1000 h of Japanese under my belt, it often still feels like a slog.
However, I expected that - so no surprise here :slight_smile:

Have a nice Sunday,
Peter

1 Like

Yes, that’s at least my thesis reg. the “interlanguage” for low-hanging fruit L2s at the moment.
And you’re also right about the positive feedback cycle.

On the other hand, I’m not particularly “proud” of my progress in Portuguese or Dutch because I feel it’s too easy when it comes to recognition operations.

In contrast, I’m very happy when I now understand a few “Easy Japanese” dialogues on YT that were gibberish a few months ago.

The dopamine kick is therefore higher with the larger L2 hurdles, at least for me :slight_smile:

1 Like

I think my problem is that I’m not particularly interested in
everyday stuff, which I find rather boring in all languages, e.g.: how’s the weather?,
where’s the toilet?, how do you like it here? what do you do in your free time?,
do you want to talk about your work? - of course not :-), etc.

I’m more interested in the humanities (esp. history), the social and computer
sciences (right now: mainly AI). And there’s tons of material out there.

But of course, that doesn’t exactly help improve my everyday communication skills.
So there are some things I just have to endure, like “dating shows” in Br. Portuguese on Netflix (“O crush perfeito”, anyone?) :slight_smile:

So, my “go-to” resources in all L2s in combo with LingQ / ReadLang are:

  • Podcasts / YT vids reg. everyday topics (say ter.a.pia on YT for Br. Port.).
  • Contemporary crime / mystery / thriller shows on Netflix, but I tend to avoid period dramas, fantasy or sci-fi stuff for language learning at an intermediate level even though I like these genres as well.
  • Non-fiction books about all kinds of topics: learning, self-optimization, politics, popular science, tech, whatever.
  • Blogs / Wikipedia articles.
  • News sites.
  • Popular fiction texts with audiobooks (crime / mystery / horror or thriller novels with a lot of everyday vocabulary / slang, but again: no fantasy or Sci-Fi) that were published in the last 50 years (say between 1970-2023).
  • Talk radio

After that (i.e., from a B2-C1 level upwards), I usually want to read / listen to some of the “classics” (pre-1970) and all the genres I’ve neglected so far (fantasy, sci-fi, historical drama, and poetry).

Last, but not least I use online learning platforms in my L2s. But I haven’t done this in Portuguese, Japanese or Dutch yet because my language level in these L2s is simply too low at the moment.

In the future, I want to use generative AIs more and more for creating personalized content in combo with (Audio)Reader SW à la LingQ, ReadLang, etc. as well…

3 Likes