There is clearly a bug in the import system, and based on my experiments, I have to say that book length COULD INDEED be a factor. Of the files I tested, there seems to be a cutoff somewhere between 150,000 and 200,000 words where the import function fails. Books under 150K imported properly. A 400K-word book failed to import, and it failed again when split in half into chunks of about 200K each, BUT I did succeed at importing it when split into thirds, each chunk coming in under the 150K word count.
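In case it helps anyone reproduce the split, here is roughly how it can be done in a quick Python sketch. The 140K limit is just my own safety margin under the apparent ~150K cutoff, and the filenames are placeholders:

```python
MAX_WORDS = 140_000  # safety margin under the apparent ~150K cutoff (my guess)

with open("book.txt", encoding="utf-8") as f:
    paragraphs = f.read().split("\n\n")

# Accumulate whole paragraphs into chunks so no paragraph is cut in half.
chunks, current, count = [], [], 0
for para in paragraphs:
    n = len(para.split())
    if current and count + n > MAX_WORDS:
        chunks.append("\n\n".join(current))
        current, count = [], 0
    current.append(para)
    count += n
if current:
    chunks.append("\n\n".join(current))

for i, chunk in enumerate(chunks, start=1):
    with open(f"book_part{i}.txt", "w", encoding="utf-8") as out:
        out.write(chunk)
```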
Another potential factor may be the unique word count of these books, which correlates loosely with length. The longer a book is, the more unique words it tends to contain, so if unique words are the culprit, longer books would be more affected. (The 400,000-word book I mentioned contains over 26,000 unique words.)
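For anyone who wants to check the unique-word count of their own file, a crude sketch; the tokenization is rough, so treat the numbers as approximate:

```python
import re

with open("book.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Crude tokenization: runs of letters only, so "don't" counts as two tokens.
words = re.findall(r"[a-zà-ÿ]+", text)
print("total words: ", len(words))
print("unique words:", len(set(words)))
```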
Again, these are just my tests as an end user, without familiarity with the underlying code, so my conclusions may be incorrect. (To further test my hypotheses, one could create a 400,000-word document that repeats the same sentence, thereby eliminating the unique-word factor, and a second document that clocks in at just 100,000 total words but contains perhaps 30,000 unique words; admittedly the second would be tougher to create. Attempting to upload these two extremes and seeing whether either fails could pinpoint the problem.)
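A rough sketch of how those two test files could be generated. For the second file, random letter strings stand in for real vocabulary, which is an admitted shortcut:

```python
import random
import string

# File 1: ~400,000 total words but only 8 unique words.
sentence = "the quick brown fox jumps over the lazy dog "  # 9 words, 8 unique
with open("long_repetitive.txt", "w", encoding="utf-8") as f:
    for _ in range(400_000 // 9):
        f.write(sentence)

# File 2: ~100,000 total words drawn from roughly 30,000 distinct strings
# (random letter strings stand in for real vocabulary).
vocab = list({"".join(random.choices(string.ascii_lowercase, k=random.randint(4, 10)))
              for _ in range(30_000)})
with open("short_diverse.txt", "w", encoding="utf-8") as f:
    f.write(" ".join(random.choice(vocab) for _ in range(100_000)))
```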
However, the problem does exist! Some plain text files simply do not import properly, and length could be a factor.
Of course it’s possible that length, word count, or anything else could be a factor. Unfortunately, these things are not easy to test, since it’s normally a combination of factors that causes the issue. The best way for us to fix any issue is for you to send us the files that are not working, along with a description of the problem, so that we can reproduce it and figure out what is going wrong. We appreciate the help on this. I am sure there are many types of files that will cause problems, and the only way for us to make the tool more robust is for you to report them so we can identify and fix the issues.
If you expect your users to do all the testing for you, I can tell you that the Chinese import never splits text into 2,000-word chunks, and therefore reading anything longer than a web article is impossible.
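One possible workaround until this is fixed is to pre-split the text yourself before importing. A rough sketch, where 2,000 characters per chunk is my assumption standing in for LingQ's word-based lesson size, since Chinese has no space-delimited words:

```python
import re

CHUNK_CHARS = 2000  # characters as a rough stand-in for the word-based limit

with open("chinese_book.txt", encoding="utf-8") as f:
    text = f.read()

# Split after sentence-ending punctuation so chunks do not break mid-sentence.
sentences = re.split(r"(?<=[。！？])", text)

chunks, current = [], ""
for s in sentences:
    if current and len(current) + len(s) > CHUNK_CHARS:
        chunks.append(current)
        current = ""
    current += s
if current:
    chunks.append(current)

for i, chunk in enumerate(chunks, start=1):
    with open(f"chinese_part{i}.txt", "w", encoding="utf-8") as out:
        out.write(chunk)
```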
I’ve scanned some text to PDF and tried to import it, but got an error saying I need to remove digital protection. The document is not password protected and not digitally signed, as far as I know. What am I missing?
Follow-up: I opened the PDF in Word and saved it as .DOCX, which imported with no problem.
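For anyone else hitting this, here is one way to check what the PDF itself reports. pypdf is just one library choice, and the error could also be on LingQ's side, e.g. a scanned PDF with no text layer:

```python
from pypdf import PdfReader  # pip install pypdf

reader = PdfReader("scanned.pdf")
print("encrypted:", reader.is_encrypted)

# A scanned PDF often has no text layer at all, which an importer
# might also reject; check whether any text comes out of page one.
text = reader.pages[0].extract_text()
print("text on first page:", bool(text and text.strip()))
```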
I had the same issue with an epub Potter collection I bought. It is not DRM protected, but it gave an error. I split it with Calibre into individual books and was able to upload the first one (I did not try the rest).
It uploaded the metadata (book description) and the list of chapters and things like that as well, so I might as well split it into individual chapters and upload it chapter by chapter, if I can figure out how to combine them into a lesson. Also, for some reason, when I went through the first few pages it marked new words as known, which was pretty annoying. But I am still new to LingQ, so I will have to figure out the proper way to split the audio and the book so I can enjoy it in LingQ without the formatting being all weird.
@Net - I would think trying to upload a collection of books would cause problems. Our importer is not set up to split files into separate books. The formatting of the original should be preserved. Is this not happening for you? As for moving words to known, that happens when you page. Any remaining blue words are assumed to be known. You should create LingQs for these words before paging. Or, if you do page, you can simply click on the words you don’t understand to make them blue again and create LingQs for them.
@arvydassidorenko - I tried importing a Chinese ebook and see there is a problem there. I have forwarded this issue to our tech team. Thanks for letting us know.
Do you have any advice regarding paragraph spacing? Basically, all the ebooks I import (generally .epub) lose the majority of their paragraph spacing, which can make reading dialogue tricky.
@Lblack - Definitely something we have to work on. Right now we do tend to clean out most formatting because it can interfere with our own text manipulations. Preserving more formatting is something we are focused on as we continue to improve the reader.
In the meantime, you may be able to improve the formatting manually by editing the lessons and adding line breaks.
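If you would rather fix a file once before importing instead of editing each lesson, something like this rough sketch may help. It assumes paragraphs are separated by single newlines in your exported text, which may not match every file:

```python
import re

with open("lesson.txt", encoding="utf-8") as f:
    text = f.read()

# Turn lone line breaks into blank-line paragraph breaks.
text = re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)

with open("lesson_spaced.txt", "w", encoding="utf-8") as f:
    f.write(text)
```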