Double counting known words - Annoying bug

I am reviewing a new lesson in LingQ, and it highlights words I have already learnt in blue. This is super annoying. Even simple ones get re-highlighted and shown as unknown words, so I could have been saving the same word twice. You can see [SOME EXAMPLES] in the links below:

I have circled in red the words that I definitely recall having covered before. You can also see the ones that are being double counted in the 2 images. Please fix this as soon as possible. Thanks.

I have never experienced this. It would be interesting to hear if anyone else is experiencing this. It might be specific to something that you are doing.

  1. This only happens with words with accent marks on them.
  2. I tried importing directly from the source PDF on Mac, and it shows 100+ new words.
  3. I re-imported the source PDF text by pasting it into Notepad in Windows 7 [VMware], then pasting it back into my Firefox browser [Mac]. It shows 60+ new words.
  4. I re-imported the source PDF text by pasting it into Notepad in Windows 7 [VMware], then pasting it into IE 9 [VMware]. It shows 38 new words.
  5. I re-imported the source PDF text by pasting it into Google Translate and then back into LingQ [Mac]. It still shows double counting.

Out of the 38 new words, I have already learnt 15, so the count is inaccurate.

My guess is that your system is not converting the input characters into one standard encoding, and is simply taking them in whatever encoding I import the text in.

As far as I can tell, pasting into Notepad should have stripped the original formatting associated with the text.

Okay, I worked something out for you. I copied “é” from another word on this site and used it to replace each é in the word téléphone, which was incorrectly highlighted blue again. After I updated the text and viewed it, the word showed as learnt.
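
One possible explanation, which is only my assumption and not something confirmed about LingQ, is that the two é characters look identical but are stored as different Unicode code point sequences: a single precomposed é (U+00E9) versus a plain "e" followed by a combining acute accent (U+0301). Here is a small Python sketch for checking this. The two sample strings are made up for illustration, and `describe` is just a helper name I picked:

```python
import unicodedata

def describe(text):
    """Return a list of (code point, Unicode name) pairs, one per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Hypothetical samples: the same visible word stored two different ways.
composed = "t\u00e9l\u00e9phone"      # é as one precomposed code point
decomposed = "te\u0301le\u0301phone"  # e followed by a combining acute accent

print(composed == decomposed)          # False: the code point sequences differ
print(len(composed), len(decomposed))  # 9 11
print(describe(decomposed)[1:3])       # the "e" and the combining accent
```

If the word saved earlier and the word in the newly imported text differ in this way, a plain string comparison would treat them as two separate words, which would match what I am seeing.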

So you need to get your coder to convert the accented characters into one standard encoding.
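
To illustrate what I mean, here is a minimal Python sketch that assumes the problem is composed versus decomposed accents. I have no idea how LingQ actually stores words, so `known_words`, `normalize_word` and `is_known` are hypothetical names. The idea is to run every word through the same Unicode normalization form (NFC here) both when it is saved and when it is looked up:

```python
import unicodedata

def normalize_word(word):
    """Convert a word to one canonical Unicode form (NFC) before storing or comparing."""
    return unicodedata.normalize("NFC", word)

# Hypothetical store of already-learnt words, normalized at save time.
known_words = {normalize_word("t\u00e9l\u00e9phone")}

def is_known(word):
    # Normalize the incoming word the same way before the lookup.
    return normalize_word(word) in known_words

# The decomposed spelling (e + combining accent) now matches the saved word.
print(is_known("te\u0301le\u0301phone"))  # True
```

Whether NFC or NFD is used matters less than using the same form everywhere, including for words that are already saved.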

I don’t quite understand points 2-5 in your examples, but I think I understand the gist of the issue. We have found that certain PDFs use unique formatting and can cause problems, so it’s best to run the text through a word processing program first before importing it into LingQ. In the meantime, can you email me the PDF that you’re using so I can try to better understand what might be causing the issue?

Just sent. It’s 15 MB - Assimil Business French.

You may test with any accented character - use Mac and Windows. Thanks.

The PDF file you sent displays properly in Preview and Adobe Reader in OSX 10.6.8, but when I try to copy and paste some of the text to any other program it just shows little boxes (􀀊􀀓􀀒􀀏􀀐􀀌􀀒􀀒). How were you able to get the text to copy & paste properly from the source PDF?
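
For what it's worth, boxes like these usually mean the copied characters are private-use code points that no font knows how to draw. That is an assumption about this PDF, not something I have confirmed. A rough Python sketch for checking pasted text before importing it could look like this (`private_use_ratio` is a made-up helper name, and the bad sample is invented):

```python
import unicodedata

def private_use_ratio(text):
    """Fraction of non-whitespace characters in the Private Use category (Co)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    private = sum(1 for ch in chars if unicodedata.category(ch) == "Co")
    return private / len(chars)

sample_bad = "\U0010000A\U00100013\U00100012"  # made-up private-use sample
sample_ok = "t\u00e9l\u00e9phone"

print(private_use_ratio(sample_bad))  # 1.0
print(private_use_ratio(sample_ok))   # 0.0
```

A ratio close to 1.0 would suggest the copy and paste did not produce real text, so importing it would not be useful.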

Yeah, Adobe does that for me too. Gotta use Preview in Mac to copy and paste.

In Preview I get the following: “␣␣␣␣␣␣␣␣␣ ␣␣␣␣␣␣␣”

:frowning: Works for me in Lion.

Copying and pasting text from PDF files sometimes works perfectly and sometimes doesn’t work at all – it entirely depends on how the file itself is formatted. In this case, it sounds like the issues stem from the original format of the PDF file, which apparently requires a unique setup to even be able to copy and paste the contents of the file.

In this case, you’re best off doing trial and error to see which program works the best to re-encode the text and format it properly. There’s only so much we can do to make this work, and if Adobe Reader can’t handle it properly then we’re not likely to be able to either.