I created a Python script that uses Whisper AI to transcribe audio, for those who are hitting limits

Attached is the script. You have to install openai-whisper (pip install openai-whisper) and ffmpeg (which will need to be on your PATH so the script can find the exe). It will transcribe every .mp3 file in a directory and generate a .txt file with the same name. The first time you run with a new model, it will download that model; the medium model is about 1.5 GB. This can take a while to run. I'm not a Python expert, and I'm sure others here are far better, but hopefully some can find this useful. Sorry, you may have to reformat the code a bit because the lingq forums do all sorts of nasty auto-indentation that you can't seem to undo.
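Since Whisper shells out to ffmpeg to decode the audio, it's worth checking up front that Python can actually find it. A quick sketch (the printed messages are just my suggestion):

```python
import shutil

# Whisper calls ffmpeg to decode audio, so the exe must be on PATH.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg not found -- add its folder to PATH before running the script")
else:
    print(f"Using ffmpeg at: {ffmpeg_path}")
```

If this prints the "not found" line, fix your PATH first; otherwise the transcription will fail with a confusing error.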

import whisper
import subprocess
import math
import os
import time

#tiny (39M parameters)
#base (74M parameters)
#small (244M parameters)
#medium (769M parameters)
#large (1550M parameters)
#large-v2 (an updated version of large)
#large-v3 (the latest version)

#tiny model is very fast, good for testing, but results aren’t great
#large-v3 is good but slow, 1 minute of audio input = ~1-2 minutes to process
#depending on hardware
model = whisper.load_model("medium")

#directory to process
directory = "bookworm2" # Replace with the path to your folder

#loop through all .mp3 files in the directory
for filename in os.listdir(directory):
    if filename.endswith(".mp3"):
        input_path = os.path.join(directory, filename)

        starttime = time.time()
        result = model.transcribe(input_path, language="Japanese")

        base, ext = os.path.splitext(filename)
        new_filename = f"{base}.txt"
        output_path = os.path.join(directory, new_filename)

        punctuated_content = result["text"]

        #Save the new file
        with open(output_path, 'w', encoding='utf-8') as file:
            file.write(punctuated_content)

        endtime = time.time()
        print(f"\n{new_filename} transcribe time : {endtime - starttime}s")
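If you also want timestamps, the dict returned by model.transcribe() additionally has a "segments" list, where each segment carries start/end times in seconds plus its text. A small helper sketch (the format_segments name and the sample dict are mine, shaped like Whisper's output):

```python
def format_segments(result):
    # Each segment has start/end times (seconds) and the transcribed text.
    lines = []
    for seg in result["segments"]:
        lines.append(f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}")
    return "\n".join(lines)

# Example with a fake result dict, shaped like model.transcribe() output:
fake = {"segments": [
    {"start": 0.0, "end": 2.5, "text": " こんにちは"},
    {"start": 2.5, "end": 5.0, "text": " 元気ですか"},
]}
print(format_segments(fake))
```

You could write this out instead of (or alongside) the plain text file if you want to follow along with the audio.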