Update - New Whisper Automatic Transcription, DeepL integration and more

Ooooooohhhh, this is gonna be good!!! 8D

I've just uploaded a podcast I really enjoy and have listened to a few times without a transcript. It will be interesting to see what, if anything, I missed along the way.

UPDATE: It's taking a while to generate the transcript and has already thrown a memory error, so I guess we're hugging it to death, lol. I figure I'll let it burble away back there and see what shows up in the next day or so.

I haven't been able to get it to work on my Mac: converting files to MP3 isn't working, and I'm not sure how to do it.

If you have a rando podcast, you can just save the MP3 and try to load 'er up. But again, I'm getting a bunch of errors, so… I'm going to let it percolate overnight on the assumption that the Whisper server(s) is/are overloaded, bugs need to be combed out, etc. If it's still busted in the morning, I'll delete that attempt and try again.

I’m fine with beta testing given the possibilities once the wrinkles are ironed out!

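The current New Word percentage is affected by the length of a text, because word frequency is non-linear (Zipf's law) and the current metric is Unique New Words / Total Unique Words * 100. A long text keeps adding rare words that appear only once or twice, so the share of unique words that are new keeps climbing, even though those rare words make up only a small fraction of what you actually read. If word frequency were linear, length wouldn't matter.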

For an example of how text length affects the current New Word metric, just look at my statistics for The Adventures of Pinocchio. The same pattern shows up in every long text, though.

The ~40k-word text of The Adventures of Pinocchio has 36% New Words (with the current metric), but its individual chapters, which average ~1.1k words, have between 12% and 21% New Words (with the current metric). No chapter gets anywhere near 36%.

So imagine I import a short text of, say, 1.1k words and then a novella of 40k words. How am I meant to judge their difficulty if one shows 17% New Words and the other 36%? In reality, the 36% New Word novella is not challenging.

When lesson length was held roughly constant (max ~2k words), this was not much of a problem. But now that text lengths vary dramatically (up to entire books of 100k words or more), the New Word metric becomes much less useful.

Consider this metric instead (to one decimal place): Unique New Words / Total Words * 100
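To make the length effect concrete, here is a minimal Python sketch that computes both the current metric and the proposed one on a synthetic text. Everything in it is an assumption for illustration, not LingQ's actual implementation: the function names are hypothetical, the "book" is random words drawn with Zipf-like frequencies (probability proportional to 1/rank over a hypothetical 50,000-word vocabulary), and the learner is assumed to know the 1,000 most frequent words. Splitting the 40k-token "book" into ~1.1k-token "chapters" mimics the Pinocchio comparison above.

```python
import random
from itertools import accumulate


def current_new_word_pct(tokens, known):
    """Current metric as described: Unique New Words / Total Unique Words * 100."""
    unique = set(tokens)
    if not unique:
        return 0.0
    new = unique - known
    return round(100 * len(new) / len(unique), 1)


def proposed_new_word_pct(tokens, known):
    """Proposed metric: Unique New Words / Total Words * 100, to one decimal place."""
    if not tokens:
        return 0.0
    new = set(tokens) - known
    return round(100 * len(new) / len(tokens), 1)


def zipf_text(n_tokens, vocab_size=50_000, seed=0):
    """Hypothetical synthetic text: the word of rank r is drawn with probability ~ 1/r."""
    rng = random.Random(seed)
    words = [f"w{r}" for r in range(1, vocab_size + 1)]
    cum = list(accumulate(1 / r for r in range(1, vocab_size + 1)))
    return rng.choices(words, cum_weights=cum, k=n_tokens)


# Assumption: the learner already knows the 1,000 most frequent words.
KNOWN = {f"w{r}" for r in range(1, 1001)}

book = zipf_text(40_000)  # roughly the length of Pinocchio in the post
chapter_len = 1_100       # roughly the average chapter length in the post
n_chapters = len(book) // chapter_len
chapters = [book[i * chapter_len:(i + 1) * chapter_len] for i in range(n_chapters)]

cur = [current_new_word_pct(c, KNOWN) for c in chapters]
prop = [proposed_new_word_pct(c, KNOWN) for c in chapters]

print(f"whole book, current metric:  {current_new_word_pct(book, KNOWN)}%")
print(f"chapters, current metric:    {min(cur)}% to {max(cur)}%")
print(f"whole book, proposed metric: {proposed_new_word_pct(book, KNOWN)}%")
print(f"chapters, proposed metric:   {min(prop)}% to {max(prop)}%")
```

On this kind of synthetic data, the current metric for the whole book should come out well above any single chapter, mirroring the Pinocchio numbers; comparing the two proposed-metric lines shows how much that metric moves with length on the same samples. Real texts and real known-word lists will of course give different figures.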

@MarkE: you can use an online converter, for example https://cloudconvert.com

Mark,

Is it possible to add an “add it to playlist” option at the end of each lesson? I swear I saw it a while ago and added audio files this way, but somehow it's no longer available, and I don't know what happened to it. It was a time-efficient way for me to add files to my playlist on the go as I finished reading a lesson.

If you have iTunes installed, you can download the podcasts there. The MP3 files end up in the iTunes->iTunes Media->Podcasts folder.

I tried to import a Farsi movie with no subtitles from YouTube and got an error message. How can I import the movie and have it transcribed automatically?

We have so much information at the end of the lesson already that this is unlikely to be added back. All completed lessons are automatically added to your Active Playlist. Otherwise, just close the lesson complete popup, and go to Lesson Info in the lesson menu. From there you can add the lesson to any playlist.

Sorry! Some resource issues to resolve. We are working on it.

The “large” model is used.

I suppose you need to convert the YouTube video to audio only; there are online tools for that. Then upload the audio, as long as it is under 90 minutes.

I agree with nfera: my third Harry Potter book (El prisionero de Azkaban) shows 28% New Words for the whole book, but the highest percentage in any individual chapter is 14%.

Not a biggie for me, just a curiosity.

OK, the podcast I tried to upload yesterday was still completely borked when I looked at it, so I went ahead and deleted it and am now trying again. Fingers crossed!

IT WORKED!!! This is gonna be AMAAAAAZING!

According to OpenAI's Whisper documentation, these are the supported languages, and Serbian is included (not sure if there are other limitations):
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

You've switched from Google Translate to DeepL? DeepL doesn't do Malay! Arrgh! No, they are not the same. :frowning:

Edit: I'd only just started and hit an error immediately. “Lari bersungguh-sungguh” is a Malay expression meaning hard-working and diligent. DeepL translates it literally as “ran in earnest”. Not the same. :frowning:

Thanks, especially to DeepL.

It is exactly what I am doing now and it is exactly what I do not want to have to do anymore!

  • Known Words targets per level adjusted to account for language variances better.

Has anything actually changed? I think Intermediate 2, at least, is unchanged. Or has the change just not rolled out yet?