Google Colab notebook to select a language -> course -> lesson and then run spaCy's natural language processing on the text

Cool! My little experiment worked.

I have worked out how to drill down to select a lesson from amongst my courses on LingQ and then run that lesson through spaCy’s excellent natural language processing.

Here is my Google Colab notebook.

And you can see the results here analysing an English text:

And a Polish one:

I have not found another way to lemmatize Polish text correctly, i.e. to find the base form of verbs, nouns and adjectives. spaCy is also able to identify how verbs, nouns, adjectives, etc. have been inflected: what gender they are or are agreeing with, what case they are in, and whether they are singular or plural.
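For anyone who wants to try this outside the notebook, here is a minimal sketch of reading lemmas and inflection features from spaCy. It assumes spaCy v3 and the `pl_core_news_sm` Polish model have been installed (`python -m spacy download pl_core_news_sm`). The helper names `parse_feats`, `token_rows` and `analyse` are my own for illustration and are not necessarily what the notebook uses.

```python
def parse_feats(feats):
    """Turn a feature string such as 'Case=Nom|Number=Sing' into a dict.

    spaCy renders token.morph in this 'Key=Value|Key=Value' form when it is
    converted to a string; an empty string means no features were assigned.
    """
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def token_rows(doc):
    """Yield (text, lemma, part of speech, features) for each word in a spaCy Doc."""
    for token in doc:
        if token.is_punct or token.is_space:
            continue  # skip punctuation and whitespace tokens
        yield token.text, token.lemma_, token.pos_, parse_feats(str(token.morph))


def analyse(text, model="pl_core_news_sm"):
    """Run `text` through a spaCy pipeline and return the rows as a list.

    spaCy is imported here rather than at the top so that the two helpers
    above can be used (and tested) without spaCy installed.
    """
    import spacy

    nlp = spacy.load(model)
    return list(token_rows(nlp(text)))
```

Calling `analyse("Mam dużego psa.")` should then give one row per word, each with its lemma and a dictionary of features such as `Case`, `Gender` and `Number`. The exact tags depend on the model version, so I have not listed expected output here.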

Now that I have worked out how to run my LingQ lessons through spaCy, I will be able to chunk the text into sentences or noun phrases and build up a database of sentences that I can then query. For example, I could get all the sentences containing a masculine noun in the nominative case, or all sentences with plural nominative nouns. The Slavic case system is notoriously difficult, and I want to work out a way to mine my LingQ lessons for examples of the different cases in material I am familiar with. That way I can do some extra work on understanding where each case is used and get familiar with the rather complex way that the endings of adjectives and nouns change depending on case.
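To make the database idea concrete, here is a rough sketch using Python's built-in `sqlite3`. The table layout, the function names `build_db` and `find_sentences`, and the three sample sentences are all hypothetical: the sample rows are hand-written to look like spaCy-style tags, not real spaCy output. In practice the rows would come from running each lesson through spaCy.

```python
import sqlite3

# Hand-written illustrative data: (sentence, [(word, lemma, pos, features), ...]).
SAMPLE = [
    ("Kot śpi.", [
        ("Kot", "kot", "NOUN", {"Case": "Nom", "Gender": "Masc", "Number": "Sing"}),
        ("śpi", "spać", "VERB", {"Number": "Sing"}),
    ]),
    ("Widzę kota.", [
        ("Widzę", "widzieć", "VERB", {"Number": "Sing"}),
        ("kota", "kot", "NOUN", {"Case": "Acc", "Gender": "Masc", "Number": "Sing"}),
    ]),
    ("Koty śpią.", [
        ("Koty", "kot", "NOUN", {"Case": "Nom", "Gender": "Masc", "Number": "Plur"}),
        ("śpią", "spać", "VERB", {"Number": "Plur"}),
    ]),
]


def build_db(sentences):
    """Create an in-memory database with one row per sentence and per word."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sentence (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        CREATE TABLE token (
            sentence_id INTEGER NOT NULL REFERENCES sentence(id),
            text TEXT, lemma TEXT, pos TEXT,
            gram_case TEXT, gender TEXT, number TEXT  -- CASE is an SQL keyword
        );
    """)
    for text, tokens in sentences:
        sentence_id = conn.execute(
            "INSERT INTO sentence (text) VALUES (?)", (text,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO token VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(sentence_id, word, lemma, pos,
              feats.get("Case"), feats.get("Gender"), feats.get("Number"))
             for word, lemma, pos, feats in tokens],
        )
    conn.commit()
    return conn


def find_sentences(conn, pos, case=None, gender=None, number=None):
    """Return sentences containing a word with the given tag and features."""
    sql = ("SELECT DISTINCT s.id, s.text FROM sentence s "
           "JOIN token t ON t.sentence_id = s.id WHERE t.pos = ?")
    params = [pos]
    for column, value in (("gram_case", case), ("gender", gender), ("number", number)):
        if value is not None:
            sql += f" AND t.{column} = ?"
            params.append(value)
    sql += " ORDER BY s.id"
    return [row[1] for row in conn.execute(sql, params)]


conn = build_db(SAMPLE)
# Sentences with a masculine singular noun in the nominative case.
print(find_sentences(conn, "NOUN", case="Nom", gender="Masc", number="Sing"))
# Sentences with a plural nominative noun.
print(find_sentences(conn, "NOUN", case="Nom", number="Plur"))
```

Storing case, gender and number as separate columns keeps the queries simple. A more complete version would probably keep every feature spaCy reports, for instance in a separate key/value table.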


I have split off the fetching of all text content from LingQ into a separate script which anyone can use: