Chatting with ChatGPT in audio?

JanFinster · January 29, 2023, 11:48am

I wonder whether there is a way to combine TTS and STT so that you can talk with ChatGPT in your target language (i.e. you speak into your mic and ChatGPT’s answer is read out loud).

Unfortunately, I know nothing about Python or programming, so the code that ChatGPT wrote does not help me much. Maybe someone here is more proficient, can make it work, and can then help us dummies out!?

import speech_recognition as sr
import pyttsx3
import requests

def text_to_speech(text):
    # Read the text aloud with the system's TTS voice.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def speech_to_text():
    # Record one utterance from the microphone and return it as text,
    # or None if nothing could be recognized.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Speak:")
        audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("You said: {}".format(text))
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return None

def chat_with_openai(text):
    api_key = "YOUR_OPENAI_API_KEY"
    model = "text-davinci-002"
    response = requests.post(
        "https://api.openai.com/v1/engines/{}/completions".format(model),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer {}".format(api_key),
        },
        json={
            "prompt": text,  # send what the user said, not a fixed placeholder
            "max_tokens": 1024,
            "n": 1,
            "stop": None,
            "temperature": 0.5,
        },
    )
    response.raise_for_status()  # fail loudly on a bad key or other HTTP error
    return response.json()["choices"][0]["text"].strip()

while True:
    user_text = speech_to_text()
    if not user_text:
        continue  # nothing recognized, so listen again
    response = chat_with_openai(user_text)
    text_to_speech(response)

(comments from ChatGPT: Note that you’ll need to sign up for a free API key from OpenAI to use this code and replace “YOUR_OPENAI_API_KEY” with your actual API key. Additionally, you’ll need to install the speech_recognition and pyttsx3 packages and have them available in your Python environment.)
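For the "target language" part of the question, here is a minimal sketch, assuming the same speech_recognition and pyttsx3 packages as the script above. recognize_google accepts a language tag such as "zh-CN", and pyttsx3 can switch to any voice installed on the system. The helper names (pick_voice_id, speak_in_language, listen_in_language) and the keyword matching are illustrative assumptions, not part of either library, and whether a Chinese voice is available at all depends on the operating system.

```python
def pick_voice_id(voices, keywords):
    """Return the id of the first voice whose id or name contains a keyword.

    `voices` is any iterable of objects with `id` and `name` attributes,
    such as the list returned by pyttsx3's engine.getProperty("voices").
    """
    for voice in voices:
        haystack = "{} {}".format(voice.id, voice.name or "").lower()
        if any(keyword.lower() in haystack for keyword in keywords):
            return voice.id
    return None


def speak_in_language(text, keywords=("zh", "chinese")):
    """Read text aloud, preferring an installed voice that matches keywords."""
    import pyttsx3  # imported lazily so the helper above works without it

    engine = pyttsx3.init()
    voice_id = pick_voice_id(engine.getProperty("voices"), keywords)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    # If no matching voice is installed, the default voice is used.
    engine.say(text)
    engine.runAndWait()


def listen_in_language(language="zh-CN"):
    """Record one utterance and transcribe it in the given language."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    # May raise sr.UnknownValueError or sr.RequestError, as in the script above.
    return recognizer.recognize_google(audio, language=language)
```

In the script above, this would mean calling listen_in_language("zh-CN") in place of speech_to_text() and speak_in_language(response) in place of text_to_speech(response).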

S.I · January 29, 2023, 12:07pm

bamboozled · January 29, 2023, 12:36pm

Yes, that’s what I would suggest as well; there seem to be quite a few browser extensions out there.
Regarding the API, I don’t think it will give you exactly the same experience; from what I understand, the models are not identical. You will also have to pay depending on how much you use it (see OpenAI’s pricing page).
[Edit: it seems they are planning to provide a ChatGPT API in the future: https://twitter.com/OpenAI/status/1615160228366147585]

JanFinster · January 29, 2023, 12:41pm

Thanks for that suggestion, but it really does not work properly for Chinese. It does not output in Chinese, and the Mandarin input gets all distorted: it does not recognise less common words and substitutes more common ones that I never said, e.g. 重要 instead of 肿瘤. I guess it is too early for this kind of implementation...

S.I · January 29, 2023, 1:51pm

I’ve just tried this one and a couple of others myself. None of them work properly, even for English.

bamboozled · January 29, 2023, 3:17pm

A less integrated solution might be to just use what the operating system provides.

Try using the built-in dictation function of your operating system. You would just have to find out how to trigger it, and then it should let you dictate into the text field. I would expect the quality to be pretty good; at least on macOS it does a good job with English.
Your operating system or browser should come with a TTS function as well. For example, I have been using the Microsoft Edge browser’s TTS extensively, especially on LingQ (print page). Maybe give that a try. Confusingly, Edge has two different systems:

If you select text and right-click it, you should get “speech” → “start speaking”. (On my system it uses Apple’s Siri voice.)
The other one is called “read page aloud” and sits in the URL bar. It uses Microsoft Azure’s “neural” voices, which are some of the best voices on the market imo. Although it works best on longer texts, after activating it you can click into the text you want to have read to direct the system there.
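If you would rather trigger the operating system’s voice from a script than from the browser, here is a rough sketch, assuming macOS and its built-in `say` command (its `-v` option picks a voice and `-r` sets the speaking rate in words per minute). The function names are made up for illustration, other operating systems are not covered, and the available voice names differ between machines (`say -v '?'` lists them).

```python
import platform
import subprocess


def build_say_command(text, voice=None, rate=None):
    """Build the argument list for macOS's built-in `say` command."""
    command = ["say"]
    if voice:
        command += ["-v", voice]  # name of an installed voice
    if rate:
        command += ["-r", str(rate)]  # speaking rate in words per minute
    command.append(text)
    return command


def speak_with_os(text, voice=None, rate=None):
    """Speak text with the system voice; this sketch only handles macOS."""
    if platform.system() != "Darwin":
        raise NotImplementedError("this sketch only covers the macOS `say` command")
    subprocess.run(build_say_command(text, voice, rate), check=True)
```

Calling speak_with_os("你好") would use the default system voice; passing voice= with the name of an installed Mandarin voice should switch the language.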

S.I · January 29, 2023, 4:20pm

Nah, it’s hard to call that a dialogue.
But I did something similar for instant pronunciation checks. With an AHK script installed, all I have to do now is click on any (selectable) word while holding SHIFT, and the system reads it out via TTS, so getting the pronunciation can’t be any faster. It works on Windows.