I can sort of get why for the part where it’s just gameplay video with no words (though showing “no” over and over is kind of odd), but the final part of the video does have a spoken conclusion section, and it isn’t transcribed in the imported lesson at all.
I noticed that when Whisper transcribes the audio, it sometimes repeats words like this during parts where no one is speaking. One time it started adding to the storyline, in context with the transcript, which was weird. Another time it said “Merry Christmas,” completely out of context with the rest of the transcript. I was completely surprised and had a good LOL. I just go into the Edit Lesson and delete those lines, or leave them as is and simply ignore them.
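For anyone transcribing these videos locally instead of through the importer, those repeated words during silence look like the well-known Whisper hallucination on non-speech segments. Here is a minimal sketch of a workaround, assuming the open-source openai-whisper package (which may not be what the site actually uses); the file name and the 0.6 threshold are just placeholders to tune:

```python
# Minimal sketch: drop Whisper segments that are probably hallucinated silence.
# Assumes the open-source openai-whisper package (pip install openai-whisper).
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "lesson_video.mp3",               # hypothetical file name
    condition_on_previous_text=False  # reduces the "adds to the storyline" effect
)

kept_lines = []
for seg in result["segments"]:
    # no_speech_prob is Whisper's own estimate that the segment contains no speech
    if seg["no_speech_prob"] > 0.6:
        continue  # skip likely hallucinations ("no no no", "Merry Christmas", ...)
    kept_lines.append(seg["text"].strip())

print("\n".join(kept_lines))
```

That obviously doesn’t help with the importer itself, but it shows the kind of filtering that could remove most of those junk lines automatically.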
It is really important. Some days I’m waiting an hour for a 10-minute video, and then the output has so many made-up words (not in the dictionary, not on Google, etc.). We need to be able to use the old method for faster learning. I know implementing AI is the trend, but it should enhance the experience for users; I feel it is creating obstacles instead.
Replying again: this is also an issue. I noticed that when I reply to people, even if I see their name above the chat box, it doesn’t show that I replied to them when I hit “reply.” That issue isn’t a huge one, though; the bigger issue is the import time/quality. Even if the old method has the same problem (made-up words), at least we don’t need to plan ahead of time. We can just surf YouTube completely naturally and import whatever catches our eye that very second.