1000 ebooks in Russian, 1000 ebooks in Mandarin

I need a ridiculously large number of ebooks in those languages because I’m conducting an experiment in language acquisition. The concept is simple: you take frequency word lists, extract random phrases for each word you want to learn, then learn the words simply by reading real phrases in context. I wrote a simple program that does this, I have already tried it for a year, and I think it works fine for any language.

Does anyone know of resources where I can download packs of txt/epub files in those languages? I should say it’s harder for me in Mandarin. The reason I need so many books is that once you go beyond the first 5000 words in Mandarin, the next words are hard to find. That’s to be expected: you would not expect verbs like ‘jog’, ‘explode’, or ‘translate’ to appear in every random novel, but that does not mean you’re not interested in learning them. It took me 20 epubs in Mandarin to get 4997 distinct words, which is not enough. Same problem with Russian. So basically, I don’t need exactly 1000 ebooks, but as many as I can get.
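A minimal sketch of the pipeline described above, assuming the books are already decoded to plain-text strings and the frequency list is a Python list of words. All names here are hypothetical, not the poster’s actual program. The sentence splitter is crude (it cuts at . ! ? and the full-width 。！？), and matching is on exact word forms. For Russian, inflected forms will not match a lemma-based frequency list without a lemmatizer such as pymorphy2. For Mandarin, `\w+` swallows whole runs of characters, so you would pass your own `tokenize` function, for example one built on jieba’s `lcut`.

```python
import random
import re
from collections import defaultdict

# Crude splitter: a run of non-terminal characters plus any closing
# punctuation (ASCII . ! ? and the full-width Chinese 。！？).
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text):
    """Break raw book text into candidate phrases."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def word_tokens(sentence):
    """Lower-cased word forms; only suits space-separated text such as Russian."""
    return set(re.findall(r"\w+", sentence.lower()))


def build_index(texts, tokenize=word_tokens):
    """Map every token to the list of sentences it occurs in."""
    index = defaultdict(list)
    for text in texts:
        for sentence in split_sentences(text):
            for token in tokenize(sentence):
                index[token].append(sentence)
    return index


def pick_phrases(index, frequency_list, seed=0):
    """Choose one random sentence for each listed word the corpus contains."""
    rng = random.Random(seed)
    picked = {}
    for word in frequency_list:
        candidates = index.get(word)  # .get() does not add missing keys
        if candidates:
            picked[word] = rng.choice(candidates)
    return picked


books = ["Он бежал домой. Она спала!", "Муж громко плакал."]
picked = pick_phrases(build_index(books), ["муж", "спала", "прыгать"])
print(f"{len(picked)} of 3 target words found")  # prints: 2 of 3 target words found
```

Comparing `len(picked)` with the length of the frequency list gives the coverage figure the post is worried about: how many distinct target words the corpus can actually supply a phrase for.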

Thank you in advance for any help!

Can you explain the part about reading the phrases ‘in context’? Did you mean you learn the words in the context of the phrases? Or do you also extract the text surrounding each phrase, in addition to the phrases themselves?

I meant reading the word in the context of a phrase, rather than reading just the verb.



Her husband was crying loudly

Maybe the expression “in context” is misleading, because I don’t need real context, like a story. I just need a very short phrase; that does the trick.

Why don’t you try dictionaries?

I use target-to-known-language bilingual dictionaries for a similar purpose. I browse through a Jp-En dictionary until I recognize a Jp headword, then read the example sentences until I find another word I am 25% to 75% familiar with, then look up that word, and so on. Pretty soon I find I have read much of the dictionary that way.

In case you don’t know, there is Linguee.com, which parses all the multilingual UN documentation in a similar way. You might also try Duolingo, but I don’t know much about it.

You can use the Russian National Corpus for this purpose (the Russian version has more functionality).

It’s an idea, but I would need an electronic dictionary so I could extract the phrases automatically, because I don’t want to read less important words. Also, I only extract simple phrases (ones that contain no rare words at all). I haven’t found a Russian dictionary that lets me do that.
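One way to implement the “simple phrase” filter, sketched under the assumption that a word counts as rare when it sits beyond a chosen rank in the same frequency list (or is missing from it). The function names and the `max_rank` and `max_chars` knobs are hypothetical. Tokenization is the same naive `\w+` split, so it suits space-separated text like Russian, and the target word should be passed in lower case.

```python
import re


def rank_table(frequency_list):
    """Map each word to its 1-based position in the frequency list."""
    return {word: rank for rank, word in enumerate(frequency_list, start=1)}


def is_simple(sentence, target, ranks, max_rank):
    """True if every word other than the target is among the top `max_rank` words."""
    tokens = re.findall(r"\w+", sentence.lower())
    # Words missing from the list count as rare (rank max_rank + 1).
    return all(t == target or ranks.get(t, max_rank + 1) <= max_rank for t in tokens)


def simplest_phrase(candidates, target, ranks, max_rank=3000, max_chars=60):
    """Shortest candidate that passes the filter, or None if none does."""
    simple = [
        s for s in candidates
        if len(s) <= max_chars and is_simple(s, target, ranks, max_rank)
    ]
    return min(simple, key=len) if simple else None


ranks = rank_table(["её", "муж", "громко", "кот"])
print(simplest_phrase(
    ["Её муж громко плакал.", "Её муж громко и отчаянно плакал."],
    "плакал", ranks, max_rank=4,
))  # prints: Её муж громко плакал.
```

The target itself is exempt from the rank check because it is, by design, the rare word being studied. Setting `max_rank` to roughly your current vocabulary size keeps every other word in the phrase familiar.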

Linguee.com is interesting. I wonder if I could write an app that queries the site with a query string (e.g. the page “walk - Russian translation – Linguee”) and parses the result. I would have to try it; maybe the site will block me if I make too many requests.
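A sketch of the polling side that keeps requests slow to reduce the chance of being blocked. The URL template is an assumption (check the real query string in your browser’s address bar, and the site’s terms of use, before scraping), and HTML parsing is left out because the page structure isn’t known here. `PoliteFetcher` and `query_url` are hypothetical names; `clock` and `sleep` are injectable only so the throttle can be tested without waiting.

```python
import time
import urllib.parse
import urllib.request

# ASSUMPTION: hypothetical URL template, verify it against the live site.
URL_TEMPLATE = "https://www.linguee.com/english-russian/search?{}"


def query_url(word):
    """Build the search URL for one word (percent-encodes non-ASCII)."""
    return URL_TEMPLATE.format(urllib.parse.urlencode({"query": word}))


class PoliteFetcher:
    """Fetch pages no faster than one request every `min_interval` seconds."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait_turn(self):
        """Sleep just long enough to respect the minimum interval."""
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()

    def get(self, url):
        """Download one page as text after waiting our turn."""
        self.wait_turn()
        request = urllib.request.Request(url, headers={"User-Agent": "vocab-study-script"})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")


# Usage (network access required):
# fetcher = PoliteFetcher(min_interval=5.0)
# html = fetcher.get(query_url("walk"))
```

Saving each downloaded page to disk before parsing would also avoid re-requesting words if the parser needs several attempts to get right.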

I will try it, very promising! It looks exactly like what I’m looking for. I wish I could find something like that for Mandarin.

I should mention that my final study material must be paper (or epub, but I prefer paper), so I cannot use a website to browse a page for each word I need to read. I need to extract everything and print it. I end up with pages of 20 words each, one phrase per word. Very fast and pleasant to read. Still, websites can be good sources, because I can extract the data I need with a program.
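For the printed output, a small sketch that groups (word, phrase) pairs into pages of 20, as described above. The function names, the `word: phrase` line format, and the form-feed page separator are assumptions. A form feed (`\f`) is the ASCII page-break character that many plain-text printing tools honor; an epub or word-processor export would need its own layout.

```python
def paginate(entries, per_page=20):
    """Group (word, phrase) pairs into text pages of `per_page` lines each."""
    pages = []
    for start in range(0, len(entries), per_page):
        chunk = entries[start:start + per_page]
        pages.append("\n".join(f"{word}: {phrase}" for word, phrase in chunk))
    return pages


def write_printable(entries, path, per_page=20):
    """Write all pages to one UTF-8 file, separated by form feeds."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n\f".join(paginate(entries, per_page)))
        out.write("\n")
```

With 45 pairs, `paginate` returns three pages of 20, 20 and 5 lines.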

Here’s a collection of Russian prose (19th to early 20th century): Computer Fund of the Russian Language (Машинный фонд русского языка), Русская проза 19-20 вв. (Russian prose of the 19th-20th centuries). I doubt it will add up to 1000 books, though.
The 90-volume collection of Tolstoy’s works: http://tolstoy.ru/creativity/90-volume-collection-of-the-works/. That’s a lot of data to build your own corpus from, and all the texts are proofread.
Also, here’s another corpus you can download: OpenCorpora: открытый корпус русского языка (an open corpus of the Russian language).

Thank you!

Last year I downloaded a pack of Mandarin ebooks in txt, a few GB in size, from a Chinese tracker called m-team (I got banned soon after, as it wasn’t possible to maintain a good ratio and I wasn’t interested in donating). I still have it, so I can upload it to my Baidu cloud (they offer 1TB for free, by the way, which is pretty awesome compared to Google Drive or Dropbox) and share the link here. For individual ebooks, cn.epubee.com is great (ebupee.org is another one).

That would be great! I would really appreciate it if you could share this file with me.

Addendum: this is how I make use of my paper dictionaries left over from the ’90s.

By the way, if anyone has a pack in Japanese, I would take that too. I’ve been learning Japanese for 3 years, and I’ve reached the point where I want to actively learn new words strictly by reading phrases. I feel I lack vocabulary.

Not exactly what you want, but Anki has a Japanese example sentences add-on. If I recall correctly, it has about 100,000 EN-JP sentence pairs taken from Jim Breen’s site. Here is the link to the file: http://ftp.monash.edu.au/pub/nihongo/examples.utf.gz
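If you go with that file, here is a sketch for pulling sentence pairs out of it. It assumes the usual Tanaka Corpus layout used by examples.utf: each `A:` line holds the Japanese sentence, a tab, and the English translation followed by `#ID=...`, while each `B:` line is a word index that this sketch skips. Verify that layout against the actual file before relying on it; the function names are hypothetical.

```python
import gzip


def read_example_pairs(lines):
    """Yield (japanese, english) pairs from Tanaka-style 'A:' lines."""
    for line in lines:
        if not line.startswith("A: "):
            continue  # skip 'B:' index lines and anything else
        body = line[3:].rstrip("\n")
        japanese, _, english = body.partition("\t")
        english = english.split("#ID=")[0]  # drop the trailing sentence ID
        yield japanese.strip(), english.strip()


def load_examples(path):
    """Read every pair straight from the gzipped file (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(read_example_pairs(handle))


# Usage, after downloading the file:
# pairs = load_examples("examples.utf.gz")
```

Japanese, like Mandarin, has no spaces between words, so matching these sentences against a frequency list would need a segmenter rather than a simple whitespace or `\w+` split.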

Thank you, this will be helpful for sure!

You could also try going to vk.com, the Russian equivalent of Facebook. They have a section that’s called Documents where you can download a vast array of stuff. Try it out!

