Copy Text from PDF Without Broken-Up Lines: "StripMail"

Just wanted to share this trick with you: if you have ever tried to copy text from a PDF document and paste it into the text field of a new LingQ lesson, you have surely noticed that the lines are broken up, as if you were reading poetry instead of prose.

I’ve found a simple program (1 KB in size) called “StripMail” that solves this problem. I recommend reading the following step-by-step guide: it not only walks you through the installation, but also shows how to assign a keyboard shortcut to a script that launches StripMail in the background, reformats the text, and closes it again, all in a split second.

http://goo.gl/gP2ML

IMPORTANT: I’m using this program on a 64-bit Windows 7 computer. The files “stripmail.exe” and “stripmail.bat” (the guide above shows you how to create the latter) must be located in “C:\Program Files\StripMail\”, NOT “C:\Program Files (x86)\StripMail\”, for everything to work properly.


TEXT COPIED FROM A PDF FILE

Hans-Christian Ströbele (Bündnis
90/Die Grünen) ist Mitglied des
Deutschen Bundestags. Er hat
2009 für seine Partei das einzige
Direktmandat gewonnen.

SAME TEXT PROCESSED BY STRIPMAIL

Hans-Christian Ströbele (Bündnis 90/Die Grünen) ist Mitglied des Deutschen Bundestags. Er hat 2009 für seine Partei das einzige Direktmandat gewonnen.
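For anyone curious about what this kind of tool does to the text, the basic idea can be sketched in a few lines of Python. This is not StripMail's actual code (its internals aren't described here); it's a minimal sketch assuming that blank lines mark real paragraph breaks and every other line break is just a wrap left over from the PDF layout. The function name `unwrap_lines` is hypothetical.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines back into paragraphs (hypothetical helper).

    Assumption: blank lines (possibly containing only whitespace) separate
    paragraphs; every other line break inside a paragraph becomes one space.
    """
    # Split into paragraphs on one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for para in paragraphs:
        # Trim each line, drop empty ones, and rejoin them with single spaces.
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    # Keep paragraphs separated by a blank line.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    pdf_text = """Hans-Christian Ströbele (Bündnis
90/Die Grünen) ist Mitglied des
Deutschen Bundestags. Er hat
2009 für seine Partei das einzige
Direktmandat gewonnen."""
    print(unwrap_lines(pdf_text))
```

Run on the sample above, this prints the same single line as the StripMail output. Note that the sketch deliberately does not rejoin words hyphenated across a line break, because in German a hyphen at the end of a line may be a real one (as in "Hans-Christian").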

If only there were a way to do this without losing the formatting… (I prefer to see article numbers in legal texts in bold so I can easily tell the articles apart.)

Eugene, there seems to be no way, either automated or manual, to keep text in bold or italics.

I use ABBYY FineReader to convert a PDF (including files that are already text-based and need no OCR) into DOC. The formatted text from the DOC file can then be copied into LingQ easily, but it ends up full of line breaks.

@hape Do you know of a way that wouldn’t ruin the formatting?
There are no line breaks in my Word document, but they appear when I import the text into LingQ :frowning:

This is a Firefox problem that I reported about a month ago. Chrome and Internet Explorer work fine.

@VeraI I used Chrome for my last import of text copied from Word and still got misplaced line breaks all over. As for Firefox, it doesn’t even keep the formatting.