New word count includes numbers, URLs and incomplete word forms

One thing that I don’t like in transcripts is when incompletely pronounced words are written out, followed by the completely pronounced form, just for accuracy’s sake. These incomplete forms are not ‘words’ but still get counted by the system as new words. I would appreciate a way to exclude such forms from the count. The same is true for numbers (figures) and URLs, which are also counted as ‘new words’.
This may not be possible, and it doesn’t bother me too much: transcripts sometimes contain words and even sentences in other languages, and that doesn’t affect the overall statistics much either. I was just wondering because I’ve noticed this frequently in RussianLingQ podcasts.
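
To make the request concrete, here is a rough sketch of the kind of filtering I have in mind. It is not how LingQ actually counts words; the names `countable_words` and `new_words` are made up for illustration, and the regular expressions are my own assumptions about what counts as a URL or a word. It only handles numbers and URLs. Incomplete word forms cannot be detected this way and would still have to be left out or marked by the transcriber.

```python
import re

# Assumption: a URL is anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Assumption: a word is a run of letters (any script), optionally joined
# by hyphens or apostrophes. Digits and underscores are never matched.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def countable_words(text):
    """Return lowercase word tokens, skipping URLs and numbers."""
    without_urls = URL_RE.sub(" ", text)
    return [w.lower() for w in WORD_RE.findall(without_urls)]


def new_words(text, known):
    """Return the distinct words in text that are not in the known set."""
    seen = set()
    result = []
    for word in countable_words(text):
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result


sample = "Visit www.lingq.com in 2009, до следующего раза"
print(new_words(sample, known={"in", "до"}))
```

With a filter like this, the URL and the year in the sample sentence would not show up as ‘new words’, while the Russian and English words still would.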

Hi Reinhard,

As you say, there will be non-words here and there that get counted in your statistics, but the overall statistics are what you should focus on. If you would prefer the transcriptions done a little differently, why don’t you suggest that on the RussianLingQ Forum? I will make sure that Anna sees it there, and I’m sure she will be happy to adjust her transcripts. Also, feel free to give her any other feedback on her podcasts, especially regarding future subjects of conversation.

Perhaps the transcripts could be shown twice? Once colloquial and then if appropriate, as a postscript, with the standard writing, numbers and URLs? (I know that Berta has done something like that with the spoken word: first slow and then recorded at normal speed.) The above may not solve the problem of extra words being counted, but it would ease the flow of text.

Hi Sanne,

We wouldn’t have two transcripts since that would make the word counts really skewed. I do agree with Reinhard that partial words should be avoided or skipped by transcribers.

Hi All, I just tried to make it easier for the listener. I remember how it was for me when I learned English. If the speaker leaves some words incomplete but a listener can still hear them, he/she could be confused when they are missing from the transcript. What do you suggest? I am happy to adjust my transcriptions, and it’s even easier for me to skip incomplete words.

I transcribed some of my own podcasts and also wrote out all the incomplete words. Although I struck them through, they are shown as normal text, not as deleted text :((

It’s not a big issue for me. This was not meant as criticism of transcribers. It’s just that when you update, everything is counted; you can’t update only parts of a text (except for marking blue words as ‘known’). I wouldn’t consider importing the text again myself and making corrections or deletions before updating. It’s just not worth it. I would do that with texts that are half English, half L2, though, because having 50% of the words in another language could seriously distort the statistics (for beginners).
Thank you Anna and Rasana for your good work. I am not usually confused by uncompleted words, although I have tried to find them in online dictionaries occasionally :slight_smile:

Reinhard, perhaps it is better to write whole words, but place them in brackets…
So, it would not be “Ну, до сви до следующего раза”, but “Ну, (до свидания) до следующего раза”

Would it be useful for you if I marked incomplete words in blue? Also, if you have any trouble with my podcasts that you can’t easily solve online, feel free to ask me.

I think it is best not to include partial words at all. Treat them like umms and ahhs. Just ignore them. That way they don’t affect the statistics and people don’t try to look them up. Anyone studying these podcasts will be at a level where they can recognize that these weren’t complete words. Plus, it saves time for the transcriber! :slight_smile: