This is a fairly technical question. I hope it will reach the right people. Perhaps a forward to the dev team would not be too much to ask for?
One of the best features of LingQ is the ability to import your own content. However, if you’re like me, and want to read whole books using the site, you’ll run into a problem. Importing an entire book as a single lesson is just not practical. Even a single chapter will overwhelm the browser when you try to open the lesson. Not to mention that such an amount of text is incredibly unwieldy. The only thing left to do is to split the book into many shorter lessons. But importing 100+ lessons manually through the website’s interface is mind-numbingly time-consuming. So time-consuming, in fact, that it’s not really worth it.
Being a software developer, I did not let this deter me. I created a set of PowerShell scripts that chopped up my book into chapters, chopped those chapters into smaller parts still, (named “Chapter 1 - part 1 of 10.txt” and so on), and then imported the file contents to LingQ - properly named, in correct order, in the proper course. I imported my first book using this script. Thus I was able to read a whole book in Japanese - a huge breakthrough for me.
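A minimal sketch of the splitting step, written in Python rather than the PowerShell the original scripts used. It groups whole sentences into parts so that no part is cut mid-sentence, and names the files in the "Chapter 1 - part 1 of 10.txt" style described above. The sentence-ending marks, the 2000-character default and all function names are assumptions made for illustration, not details of the original scripts.

```python
import re
from pathlib import Path

# Assumption: a sentence ends right after one of these marks (Japanese or Western).
SENTENCE_END = re.compile(r"(?<=[。！？.!?])")


def split_into_parts(text: str, max_chars: int = 2000) -> list[str]:
    """Group whole sentences into parts of at most max_chars characters.

    A single sentence longer than max_chars is kept intact in its own part.
    """
    parts, current = [], ""
    for sentence in SENTENCE_END.split(text):
        if not sentence:
            continue
        # Start a new part if adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            parts.append(current)
            current = ""
        current += sentence
    if current.strip():
        parts.append(current)
    return parts


def part_filename(chapter: int, part: int, total: int) -> str:
    """Build a name such as 'Chapter 1 - part 1 of 10.txt'."""
    return f"Chapter {chapter} - part {part} of {total}.txt"


def write_chapter(chapter: int, text: str, out_dir: Path,
                  max_chars: int = 2000) -> list[Path]:
    """Split one chapter and write each part to its own UTF-8 text file."""
    parts = split_into_parts(text, max_chars)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, part in enumerate(parts, start=1):
        path = out_dir / part_filename(chapter, number, len(parts))
        path.write_text(part, encoding="utf-8")
        paths.append(path)
    return paths
```

The limit is counted in characters rather than words because Japanese text has no spaces between words. Note that file names like these do not sort correctly as plain strings ("part 10" comes before "part 2"), so whatever imports them has to sort by the numbers.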
But when I went to import the second book, I discovered my script no longer worked.
(now comes the technical stuff)
I made the script by intercepting the import request in Fiddler, and then mimicking it in PowerShell. It was a simple enough POST request to Login - LingQ. I had to take the authentication cookies from Fiddler and stuff them into the request. It wasn’t pretty, but it worked. Then something changed. Web protocols and authentication methods are really not my strong point, but if I had to guess I’d say you switched from http to https for the entire site. While this is a smart move, it has left me unable to import anything any way other than via the browser. I’ve been trying to reverse-engineer the process again, but so far I’ve been unsuccessful. I guess I’ll figure it out eventually, though it may take days of work and cause some premature graying of my scalp.
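For readers unfamiliar with the technique, replaying a captured form post with a copied cookie looks roughly like the Python sketch below. The thread does not show the real request, so the URL, the cookie value and the form field names used here are placeholders, not LingQ's actual ones.

```python
import urllib.parse
import urllib.request


def build_import_request(url: str, cookie_header: str,
                         fields: dict[str, str]) -> urllib.request.Request:
    """Build a form-encoded POST that carries a cookie copied from the browser.

    url, cookie_header and the keys of fields are whatever the capture tool
    (Fiddler, in the post above) showed for the real request.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Cookie": cookie_header,
            "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        },
        method="POST",
    )


# Sending it would then be:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```

A request built this way only works while the copied cookie is still valid and the server still accepts the same fields, which is why this kind of script tends to break when a site changes its login or form handling.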
But do I really have to go through this? Wouldn’t it be nice to have some sort of API, that would facilitate automated imports? Perhaps a simple web service? All it would need is a course ID, lesson title and lesson contents. Just three parameters. That’s it. Should be really easy to implement too. It would just take the values and send them to the same handler that handles import requests from the browser.
Would that be possible? I’m sure it would help a lot of your customers. I’d even share my import scripts, provided I can make them work again.
I am intrigued by what you have done but I am not at all technical. Leave it with us to see if there is anything that can be done. Thanks.
On a different subject, e-books that I want to buy in different languages are usually in PDF or some format that I can’t import into LingQ. How do you deal with that? I realize that a purchased e-book cannot be shared in our library but I only want to import it for my own use, especially where an audio book is also available for purchase.
There is a Linux utility called pdftotext, used like this:
pdftotext -layout input_doc.pdf output.txt
I’m sure your tech support guys could script this and process a mountain of pdf docs for you in a jiffy.
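As a rough idea of what such a script could look like, the Python sketch below runs the same pdftotext command over every PDF in a folder. It assumes pdftotext is installed and on the PATH; the folder layout and function names are made up for the example.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the command shown above for one PDF, writing <name>.txt into out_dir."""
    return ["pdftotext", "-layout", str(pdf), str(out_dir / (pdf.stem + ".txt"))]


def convert_all(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf in pdf_dir and return the paths of the text files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        command = pdftotext_command(pdf, out_dir)
        # check=True stops the batch with an error if pdftotext fails on a file.
        subprocess.run(command, check=True)
        outputs.append(Path(command[-1]))
    return outputs
```

Calling convert_all(Path("pdfs"), Path("txt")) would leave one text file per PDF in the txt folder. PDFs that contain only scanned page images will produce little or no text this way.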
Thank you. I will follow up on this, although I am not allowed to distract from whatever they are working on just to please me.
Hi Steve, thanks for not hijacking my thread with your question
Unless the book you’re using has DRM, or was made by someone scanning the pages of a book and inserting them in the PDF as pictures, it should be fairly easy to convert between various formats. There are many tools for this out there, and some of them are completely free. I recommend Calibre, an “e-book library manager”. It’s a free e-book reader that lets you convert between formats with many advanced features. It’s what I used to convert my e-book from Epub to txt, while simultaneously removing embedded furigana and formatting chapter titles. It’s easy to use; I was able to figure out how to use it the first time I tried it.
Thanks Steve. I hope your team will consider assigning this high priority. I can’t be the only one trying to get large texts into LingQ and failing. I agree with your assessment that a lot of reading is essential. But a lot of reading means a lot of text.
Just to elaborate on possible solutions:
LingQ already has a developer API: https://www.lingq.com/apidocs/api.html
Unfortunately, the import function seems to be missing (although apparently it allows me to retrieve the privacy status of my imported lessons). Adding a rudimentary import function to this API should be easy. I realize full import functionality would be complicated, what with audio file uploads, but all I really need is lesson title, lesson contents, course id, and a flag that says the lesson is private.
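To make the request concrete, the body of such a call could be as small as the Python sketch below. This is purely an illustration of the four values listed above; the field names are invented here and do not describe any existing LingQ API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LessonImport:
    """The four values a minimal import call would need (hypothetical names)."""
    course_id: int
    title: str
    text: str
    private: bool = True

    def to_json(self) -> str:
        """Serialize the lesson as a JSON request body, rejecting empty input."""
        if not self.title.strip() or not self.text.strip():
            raise ValueError("title and text must not be empty")
        # ensure_ascii=False keeps Japanese text readable in the payload.
        return json.dumps(asdict(self), ensure_ascii=False)
```

A client would then send LessonImport(course_id, title, text).to_json() once per file.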
Alternatively, there’s a function already present on the site that splits large imports into parts. While it’s not as good as what I’ve developed (it gives me no control over lesson size and naming, and splitting based on word count would split my text mid-sentence) it would be a useful substitution if it worked. Unfortunately, for Japanese it doesn’t work. I’ve just tried importing an entire chapter, and the message I got back said “Imported and split into 1 parts”. Maybe this could be fixed?
At the risk of sounding overly dramatic, this issue is a potential show stopper for me. I’ve been a paying customer of LingQ for about 3 months now, and so far it’s been worth every penny. But if I can’t import my own books (which I have also bought specifically for the purpose of importing them into LingQ) I’ll have to reevaluate whether it’s worth paying for the service, and probably downgrade to a free account until the issue is resolved. Thank you for your understanding.
I will be following up on your request on Monday.
Meanwhile, I have downloaded Calibre, even donated $25 to them on the assumption that this works for me, since reading books on LingQ on my iPad, in various languages, would be great for me. I will let you know how I make out. Still quite confused as to how to get books from Kindle over to Calibre.
I hope your trust in the tool won’t be misplaced. I don’t own a Kindle, so I don’t have first hand experience. But a quick google search suggests that you can hook up your Kindle to your PC via a USB cable, browse it as a folder and copy the books to your disk. From there it should be easy enough to add them to Calibre. However, a lot of ebooks that are sold for Kindle have DRM (Digital Rights Management). Although ultimately any DRM can be broken, it would require some hacking, and Calibre isn’t a hacking tool. Assuming the book doesn’t have DRM, Calibre should be able to convert it to pretty much anything.
I was told that getting books in kindle format to your ipad requires a plugin for calibre - and it’s absolutely illegal! Most kindle books have DRM, even those where the publishers have given up on it and use watermarks instead.
If you want to risk it go over to Google and search for “kindle dedrm”. My colleague assures me it’s really very simple.
You can’t transfer the books directly from your kindle though; you’ll need to download them into the kindle app on your computer (there are versions for PC and Mac). That at least works fine, I use it for cookbooks and manuals that really need the big screen of my monitor.
Btw, calibre really is a great tool and makes it so much easier to keep track of which books in which series one owns. It does take a bit of getting used to though, so be prepared to set up your library once or twice.
If you are also looking for an app to simply read on your iPad, take a look at Marvin. It’s very versatile and harmonizes perfectly with calibre.
Thanks. I guess I will have to invest some time on this. I want to import books into LingQ and then study on my iPad. So I need a format that can be imported into LingQ. That is what I expect Calibre to do once I figure it out. I needn’t transfer from Kindle, I can buy new books and download them to my computer and then try to figure out how to get them into LingQ.
I do it all the time. I learned how to at this website:
I did the same thing you described, developed my own hack using curl and bash and all sorts of steps in between. I have used this to import many ebooks. I haven’t tried the scripts recently, since auto-splitting of long lesson text has been built in since November last year.
I don’t mind if the lesson numbers don’t line up with the chapter numbers. The default lesson text length is maybe too long for my liking. But it works great for even huge texts.
Would you mind trying the scripts now and sharing them with me if they still work? Unfortunately, auto-splitting of lessons doesn’t currently work for Japanese.
I’ll test it later tonight and if it works I’ll let you know and tell you how.
The actual splitting of the ebook text into individual files (e.g. based on chapter headings) I did with Calibre.
Thanks. I can split the files myself, that’s not a problem. All I need is something that can import them.
After just spending a lot of time importing an entire book, I will be watching this thread with great interest.
A fellow member, Fernanda, was kind enough to bring this thread to my attention: https://www.lingq.com/forum/2/30915/?jump_to=1#post-174807
It describes a bulk import tool developed by the LingQ member spatterson. It’s written in Python and uses Selenium. I wasn’t able to make the script work, and I didn’t try to debug it since I don’t speak Python. But it inspired me to try and copy the Selenium approach in C#. It was in fact very simple. It only took me about 20 minutes to figure out the import bits, and I had never used Selenium before.
More details on the solution:
Selenium is a “browser automation framework”. What it means is that it opens up a browser window, then uses it to navigate to urls, type in text and push buttons. Just like a human would, but it’s automated. My program uses it to open a browser window, go to the LingQ log-in page, put in the username and password and press “Log in”. Then it goes to the import page of a given course, pastes in the lesson title and content, and presses “Save”, over and over again until all your files are imported.
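The flow described above can be sketched in Python as follows. This is not the C# program from this post nor spatterson's script, just an outline of the same steps. The page URLs are passed in by the caller, and the form field names and button selector are guesses that would have to be checked against the real pages.

```python
import re
from pathlib import Path

# Locator strategies as plain strings. They have the same values as Selenium's
# By.NAME and By.CSS_SELECTOR, so this module does not need to import Selenium.
BY_NAME = "name"
BY_CSS = "css selector"
SUBMIT = "button[type=submit]"  # assumed selector for the log-in and save buttons


def natural_key(path: Path) -> list:
    """Sort key that orders 'part 2' before 'part 10'."""
    tokens = re.split(r"(\d+)", path.name)
    # Odd positions hold the digit runs captured by the group.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(tokens)]


def log_in(driver, login_url: str, username: str, password: str) -> None:
    """Open the log-in page, fill in the credentials and submit."""
    driver.get(login_url)
    driver.find_element(BY_NAME, "username").send_keys(username)  # assumed field name
    driver.find_element(BY_NAME, "password").send_keys(password)  # assumed field name
    driver.find_element(BY_CSS, SUBMIT).click()


def import_lesson(driver, import_url: str, title: str, text: str) -> None:
    """Open the import page, fill in one lesson and save it."""
    driver.get(import_url)
    driver.find_element(BY_NAME, "title").send_keys(title)  # assumed field name
    driver.find_element(BY_NAME, "text").send_keys(text)    # assumed field name
    driver.find_element(BY_CSS, SUBMIT).click()


def import_folder(driver, import_url: str, folder: str) -> list[str]:
    """Import every .txt file in folder, using the file name as lesson title."""
    files = sorted(Path(folder).glob("*.txt"), key=natural_key)
    for file in files:
        import_lesson(driver, import_url, file.stem, file.read_text(encoding="utf-8"))
    return [file.stem for file in files]


# Usage with a real browser (requires the selenium package and a browser driver):
#     from selenium import webdriver
#     driver = webdriver.Firefox()
#     log_in(driver, login_url, "user", "secret")
#     import_folder(driver, import_url, "book")
#     driver.quit()
```

A real run would probably also need explicit waits after each click, so that the next step does not start before the page has finished loading.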
Because this solution uses an actual browser, it will not break down if the LingQ dev team decides to change the site plumbing again (unlike my previous solution). It will break down if they change the user interface in a significant way, but the fix should be simple and straightforward.
It’s much slower than my previous solution. It has to wait for the browser to reload the page every time it makes an import. My previous script didn’t have to do that. The constant refreshing of the browser also takes a toll on the computer’s performance. A direct API call would be far more elegant. Still, it only took a few minutes to upload 169 lessons, and I was free to do my own stuff during that time.
Disadvantages specific to my solution:
Selenium supports a bunch of mainstream programming and scripting languages. Unfortunately, I’m only fluent in the programming ones. That means compiled code, meaning an exe file. Running an exe file on your machine that you got from “some dude on the internet” can be dicey. The script has a clear advantage of being a plain text file that anyone can read and verify that it doesn’t contain any malicious code. Also, anyone with programming skills can easily modify it if they need to.
What I’ll do in the following days:
- I’ll polish up the app I made and release it. It will only handle the import. You’ll have to chop your ebook into files yourself. I might release some scripts for this later if there’s demand.
- I’ll release the source code for the import, so you can do it yourself. But frankly, the code is so simple you can figure it out by yourself if you’re any kind of programmer.
- I’ll try to do a PowerShell version of the solution (using C# snippets for the Selenium calls). It won’t be as user friendly as the app (which has a GUI), but it will be a script file that you can read to see exactly what you’re executing.
- I’ll continue to hope the LingQ guys will add an import function to the API, and make a tool that uses it when they do.
Looking at the thread referenced below, I see there is now an official API call for importing lessons, with a curl example.
I didn’t use this approach; I used a manual form post with a copied authorization cookie. All of this makes my scripts very outdated!
Holy mackerel, did I actually overlook that? In my defense, it doesn’t say it’s an import function anywhere in its description. Now that I’m looking at the method, it does seem to facilitate imports. I’ll just go ahead and look on the bright side here: it’s good to know that there are two separate approaches that work, and I learned how to use Selenium, which is nice.