There is clearly a bug in the import system, and based on my experiments, I have to say that book length COULD INDEED be a factor. Of the files I tested, there seems to be a cutoff somewhere between 150,000 and 200,000 words where the import function fails. Books under 150K imported properly. A 400K-word book failed to import, and it failed again when split in half into chunks of about 200K each, BUT I did succeed at importing it when split into thirds, each chunk coming in under the 150K word count.
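In case it helps anyone reproduce the split, here is roughly how it can be done in a quick Python sketch. The 140K limit is just my own safety margin under the apparent ~150K cutoff, and the filenames are placeholders:

```python
MAX_WORDS = 140_000  # safety margin under the apparent ~150K cutoff (my guess)

with open("book.txt", encoding="utf-8") as f:
    paragraphs = f.read().split("\n\n")

# Accumulate whole paragraphs into chunks so no paragraph is cut in half.
chunks, current, count = [], [], 0
for para in paragraphs:
    n = len(para.split())
    if current and count + n > MAX_WORDS:
        chunks.append("\n\n".join(current))
        current, count = [], 0
    current.append(para)
    count += n
if current:
    chunks.append("\n\n".join(current))

for i, chunk in enumerate(chunks, start=1):
    with open(f"book_part{i}.txt", "w", encoding="utf-8") as out:
        out.write(chunk)
```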
Another potential factor may be the unique word count of these books, which correlates loosely with length. The longer a book is, the more unique words it tends to contain, so if unique words are the culprit, longer books would be more affected. (The 400,000-word book I mentioned contains over 26,000 unique words.)
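For anyone who wants to check the unique-word count of their own file, a crude sketch; the tokenization is rough, so treat the numbers as approximate:

```python
import re

with open("book.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Crude tokenization: runs of letters only, so "don't" counts as two tokens.
words = re.findall(r"[a-zà-ÿ]+", text)
print("total words: ", len(words))
print("unique words:", len(set(words)))
```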
Again, these are just my tests as an end user, without familiarity with the underlying code, so my conclusions may be incorrect. (To further test my hypotheses, one could create a 400,000-word document that repeats the same sentence, thereby eliminating the unique-word factor, and a second document that clocks in at just 100,000 total words but contains perhaps 30,000 unique words; admittedly the second would be tougher to create. Attempting to upload these two extremes and seeing whether either fails could pinpoint the problem.)
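A rough sketch of how those two test files could be generated. For the second file, random letter strings stand in for real vocabulary, which is an admitted shortcut:

```python
import random
import string

# File 1: ~400,000 total words but only 8 unique words.
sentence = "the quick brown fox jumps over the lazy dog "  # 9 words, 8 unique
with open("long_repetitive.txt", "w", encoding="utf-8") as f:
    for _ in range(400_000 // 9):
        f.write(sentence)

# File 2: ~100,000 total words drawn from roughly 30,000 distinct strings
# (random letter strings stand in for real vocabulary).
vocab = list({"".join(random.choices(string.ascii_lowercase, k=random.randint(4, 10)))
              for _ in range(30_000)})
with open("short_diverse.txt", "w", encoding="utf-8") as f:
    f.write(" ".join(random.choice(vocab) for _ in range(100_000)))
```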
However, the problem does exist! Some plain text files simply do not import properly, and length could be a factor.
Of course it’s possible that length, word count, or anything else could be a factor. Unfortunately, these things are not easy to test, since it’s normally a combination of factors that causes the issue. The best way for us to fix any issue is for you to send us the files that are not working, along with a description of the problem, so that we can reproduce it and figure out what is going wrong. We appreciate the help on this. I am sure there are many types of files that will cause problems, and the only way for us to make the tool more robust is for you to report them so we can identify and fix the issues.
If you expect your users to do all the testing for you, I can tell you that the Chinese import never splits text into 2,000-word chunks, and therefore reading anything longer than a web article is impossible.
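One possible workaround until this is fixed is to pre-split the text yourself before importing. A rough sketch, where 2,000 characters per chunk is my assumption standing in for LingQ's word-based lesson size, since Chinese has no space-delimited words:

```python
import re

CHUNK_CHARS = 2000  # characters as a rough stand-in for the word-based limit

with open("chinese_book.txt", encoding="utf-8") as f:
    text = f.read()

# Split after sentence-ending punctuation so chunks do not break mid-sentence.
sentences = re.split(r"(?<=[。！？])", text)

chunks, current = [], ""
for s in sentences:
    if current and len(current) + len(s) > CHUNK_CHARS:
        chunks.append(current)
        current = ""
    current += s
if current:
    chunks.append(current)

for i, chunk in enumerate(chunks, start=1):
    with open(f"chinese_part{i}.txt", "w", encoding="utf-8") as out:
        out.write(chunk)
```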
I’ve scanned some text to PDF and tried to import it, but got an error saying I need to remove digital protection. The document is not password protected and not digitally signed, as far as I know. What am I missing?
Follow-up: I opened the PDF in Word and saved it as .DOCX, which imported with no problem.
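For anyone else hitting this, here is one way to check what the PDF itself reports. pypdf is just one library choice, and the error could also be on LingQ's side, e.g. a scanned PDF with no text layer:

```python
from pypdf import PdfReader  # pip install pypdf

reader = PdfReader("scanned.pdf")
print("encrypted:", reader.is_encrypted)

# A scanned PDF often has no text layer at all, which an importer
# might also reject; check whether any text comes out of page one.
text = reader.pages[0].extract_text()
print("text on first page:", bool(text and text.strip()))
```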
I had the same issue with an epub Potter collection I bought. It is not DRM protected, but it gave an error. I split it with Calibre into individual books and was able to upload the first one (I did not try the rest).
It uploaded the metadata (book description) and the list of chapters and things like that as well, so I might as well split it into individual chapters and upload it chapter by chapter, if I can figure out how to combine them into a lesson. Also, for some reason, when I went through the first few pages it marked new words as known, which was pretty annoying. But I am still new to LingQ, so I will have to figure out the proper way to split the audio and the book so I can enjoy it in LingQ without the formatting being all weird.
@Net - I would think trying to upload a collection of books would cause problems. Our importer is not set up to split files into separate books. The formatting of the original should be preserved. Is this not happening for you? As for moving words to known, that happens when you page. Any remaining blue words are assumed to be known. You should create LingQs for these words before paging. Or, if you do page, you can simply click on the words you don’t understand to make them blue again and create LingQs for them.
@arvydassidorenko - I tried importing a Chinese ebook and see there is a problem there. I have forwarded this issue to our tech team. Thanks for letting us know.
Do you have any advice regarding paragraph spacing? Basically, all the ebooks I import (generally .epub) lose the majority of their paragraph spacing, which can make reading dialogue tricky.
@Lblack - Definitely something we have to work on. Right now we do tend to clean out most formatting because it can interfere with our own text manipulations. Preserving more formatting is something we are focused on as we continue to improve the reader.
In the meantime, you may be able to improve the formatting manually by editing the lessons and adding line breaks.
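If you would rather fix a file once before importing instead of editing each lesson, something like this rough sketch may help. It assumes paragraphs are separated by single newlines in your exported text, which may not match every file:

```python
import re

with open("lesson.txt", encoding="utf-8") as f:
    text = f.read()

# Turn lone line breaks into blank-line paragraph breaks.
text = re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)

with open("lesson_spaced.txt", "w", encoding="utf-8") as f:
    f.write(text)
```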