Counting words read outside LingQ to add them later

When the Import Extension cannot be used for some reason, the count of words read gets lost. This means that none of the e-books one reads on a phone or tablet make it into the statistics. Fortunately, it is always possible to copy and paste from any device. Based on that, I came up with the following strategy: I copy the text I have read and paste it into a .txt file on my laptop, then run a Python script of my own making (hopefully available below) that counts the words in roughly the same way LingQ does.
The script disregards numbers, except where they name temperatures, decades or ordinals. I find it pointless to count numbers otherwise: 1984 is spelled out differently from language to language, but counting it as a word is rather redundant, since the way numbers are named is fairly standardised across most languages. It also disregards symbols such as ‘/’ and ‘(’, as well as commas and full stops, and strips as many of them as it can from the ends of each word. Hyphens inside words are kept, as in “Geschwindigkeits-begrenzungsanlage” or “self-centered”.
Naturally, one needs to have Python 3 installed on one's computer in order to run the script, but it is quite stable overall and relies on no third-party code. It prints the words it found and, at the very end, the number of actual words. For convenience, I keep a “file.txt”, where the text gets pasted, and a “count_file.py” on my Desktop. Then it’s a simple matter of running the script from the command line:

cd %userprofile%/desktop
python count_file.py

The script is already configured to read a “file.txt” in the same folder. Once the text has been pasted into it and the file saved, that is all the preparation needed.
Whether or not anyone ends up using it, I thought it was worth sharing.


alph = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q',
        'r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I',
        'J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
        'ö','Ö','á','Á','à','À','ä','Ä','ß','ü','Ü','é','É','è','È','í','Í','ì','Ì',
        'ó','Ó','ò','Ò','ú','Ú','ñ','Ñ','º','ª']

filename = 'file'  # input('Insert filename: ')
filename += '.txt'

txt = []

def removeFirst(word):
    # Strip leading characters that are not in alph, one at a time
    if len(word) < 2 and word not in alph:
        return ''
    if word[0] not in alph:
        word = removeFirst(word[1:])
    return word

def removeLast(word):
    # Strip trailing characters that are not in alph, one at a time
    if len(word) < 2 and word not in alph:
        return ''
    if word[len(word)-1] not in alph:
        word = removeLast(word[:len(word)-1])
    return word

with open(filename, encoding='utf-8') as raw:
    content = raw.read()

temp = content.split(' ')
new_temp = []

# Split each space-separated chunk again on line breaks
for chunk in temp:
    new_temp.append(chunk.split('\n'))

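for i in new_temp:
    for j in i:
        # Trim trailing symbols and digits; a bare number such as 1984 becomes ''
        aux = removeLast(j)
        try:
            int(aux)  # skip tokens that are plain integers
        except ValueError:
            try:
                int(aux[len(aux)-1])  # skip tokens that end in a digit
            except (ValueError, IndexError):
                # IndexError covers the empty string left by a fully stripped token
                txt.append(removeFirst(aux))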

# Tokens made only of symbols or digits (e.g. '/') end up as '', which is still
# appended to txt, so count those empty entries here and subtract them below
void = 0
for i in txt:
    if i == '':
        void += 1

print(txt)  # comment this line out to hide the word list
print()
print(len(txt)-void)
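
For anyone who wants to try the counting rule without preparing a file, here is a condensed sketch of the same idea. It is not a replacement for the script above. The names LETTERS, trim and count_words are made up for this example, and it assumes that splitting on any whitespace is acceptable, whereas the script splits only on single spaces and line breaks.

```python
import string

# Same letters as the script's alph list, stored as a set for fast lookups.
LETTERS = set(string.ascii_letters + "öÖáÁàÀäÄßüÜéÉèÈíÍìÌóÓòÒúÚñÑºª")


def trim(token):
    """Drop characters that are not in LETTERS from both ends of a token."""
    start, end = 0, len(token)
    while start < end and token[start] not in LETTERS:
        start += 1
    while end > start and token[end - 1] not in LETTERS:
        end -= 1
    return token[start:end]


def count_words(text):
    """Return the tokens that survive trimming; their number is the word count."""
    # Assumption: split() with no argument splits on any run of whitespace,
    # which differs slightly from the space/newline splitting in the script.
    return [word for word in (trim(t) for t in text.split()) if word]


sample = "In 1984 (yes!) the self-centered 1st runner hit 20ºC / 68ºF."
words = count_words(sample)
print(words)
print(len(words))
```

With this sample the sketch keeps nine words: 1984 and the lone '/' vanish, '(yes!)' becomes 'yes', '1st' survives as 'st', and '20ºC' as 'ºC'. The hyphen in 'self-centered' is untouched because only the ends of each token are trimmed.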

PS. As usual, this can be adapted to Linux distributions and macOS. It could also be made to run on mobile devices.
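
One possible way to make the file lookup independent of the operating system is sketched below. The function name resolve_input is hypothetical, and the sketch assumes the Desktop folder is literally called “Desktop” and sits in the home directory, which is not true on every system (localized or redirected desktops will need a different path).

```python
import sys
from pathlib import Path


def resolve_input(argv):
    """Pick the text file: first command-line argument, else Desktop/file.txt."""
    if len(argv) > 1:
        return Path(argv[1]).expanduser()
    # Assumption: the desktop folder is named "Desktop" inside the home directory.
    return Path.home() / "Desktop" / "file.txt"


if __name__ == "__main__":
    # Only shows which file would be read; the counting itself is unchanged.
    print(resolve_input(sys.argv))
```

The resulting path could then be passed to open() in place of the fixed filename, so the script no longer depends on the folder it is started from.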