Audio
The Audio API provides two speech-to-text endpoints, transcription and translation, based on the Whisper and GPT-4o audio models. On the Compass platform, these are available under the model IDs gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, and gpt-4o-audio-preview.
Speech to Text
Speech to Text offers two endpoints, transcription and translation, which can be used to:
- Transcribe the audio in whatever language the audio is in.
- Translate and transcribe the audio into English.
The Whisper model supports a maximum file size of 25 MB, and the supported file formats are flac, oga, ogg, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
The GPT-4o Audio model supports the wav, mp3, flac, opus, and pcm16 file formats, and its supported voices are alloy, ash, ballad, coral, echo, sage, and shimmer.
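If you want to catch oversized or unsupported files before making a request, a small pre-flight check is enough. The sketch below encodes the Whisper limits listed above; the helper name and the size constant are illustrative and not part of the API.
from pathlib import Path

# Whisper limits from the section above (illustrative helper, not part of the API).
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # 25 MB
SUPPORTED_FORMATS = {"flac", "oga", "ogg", "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file is too large or in an unsupported format."""
    file = Path(path)
    if file.suffix.lstrip(".").lower() not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {file.suffix}")
    if file.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the 25 MB limit for the Whisper model")

validate_audio_file("/path/to/file/audio.mp3")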
Transcription
The transcription API takes as input the audio file you want to transcribe and the desired output format for the transcription. Compass currently supports multiple input and output file formats.
from openai import OpenAI
client = OpenAI()
audio_file = open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)
print(transcription.text)
By default, the response type will be JSON with the raw text included.
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger.
....
}
The Audio API also allows you to set additional parameters in a request. For example, if you want to set response_format to text, your request will look like the following:
from openai import OpenAI
client = OpenAI()
audio_file = open("/path/to/file/speech.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="text"
)
# With response_format="text", the API returns the transcript as plain text,
# so the result is a string rather than an object with a .text attribute.
print(transcription)
The API Reference includes the full list of available parameters.
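For instance, whisper-1 can also return word-level timestamps. The sketch below assumes the standard transcription parameters (language, prompt, temperature, response_format, timestamp_granularities) are exposed unchanged on Compass; consult the API Reference for the definitive list.
from openai import OpenAI
client = OpenAI()
audio_file = open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    language="en",                        # ISO-639-1 code of the spoken language
    prompt="Compass, GPT-4o",             # optional hint for spelling of domain terms
    response_format="verbose_json",       # includes segments and metadata
    timestamp_granularities=["word"],     # word-level timestamps (whisper-1 only)
    temperature=0
)
print(transcription.text)
print(transcription.words)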
Translation
The translation API takes an audio file in any of the supported languages as input and translates it into English.
from openai import OpenAI
client = OpenAI()
audio_file = open("/path/to/file/german.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file
)
print(translation.text)
In the request above, the input audio file is in German, and the output text is in English:
Hello, my name is Wolfgang and I come from Germany. Where are you heading today?
Text to Speech
To turn text into speech, use the audio/speech endpoint. Compass offers gpt-4o-mini-tts, a GPT-4o mini TTS model compatible with this endpoint.
With gpt-4o-mini-tts, you can ask the model to speak in a specific manner or with a particular tone of voice.
The supported voices are alloy, echo, fable, onyx, nova, shimmer, coral, verse, ballad, ash, sage, marin, cedar, amuch, aster, brook, clover, dan, elan, marilyn, meadow, jazz, rio, megan-wetherall, jade-hardy, megan-wetherall-2025-03-07, and jade-hardy-2025-03-07.
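To steer the delivery as described above, you can pass an instructions field alongside the input text. Below is a minimal sketch using the OpenAI Python SDK; it assumes the instructions parameter of gpt-4o-mini-tts is exposed unchanged on Compass, so verify against the API Reference.
from openai import OpenAI
client = OpenAI()

# Ask gpt-4o-mini-tts to read the text in a particular tone of voice and
# stream the resulting audio directly to a file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="The quick brown fox jumped over the lazy dog",
    instructions="Speak in a calm, reassuring tone, at a slightly slower pace."
) as response:
    response.stream_to_file("speech.mp3")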
Sample Request Format
curl --location 'https://{base_url}/v1/audio/speech?api-version=2025-03-01-preview' \
--header 'api-key: {API_KEY}' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-4o-mini-tts",
"input": "The quick brown fox jumped over the lazy dog",
"voice": "alloy",
"stream": true
}'
Response
The response is an audio file.
The supported output formats are mp3, opus, aac, flac, wav, and pcm.
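As a minimal sketch of consuming that response in Python, the snippet below mirrors the curl sample with the requests library and writes the streamed bytes to an mp3 file. The response_format field is assumed to accept the output formats listed above, and {base_url} and {API_KEY} are the same placeholders as in the sample request.
import requests

url = "https://{base_url}/v1/audio/speech?api-version=2025-03-01-preview"
headers = {"api-key": "{API_KEY}", "Content-Type": "application/json"}
payload = {
    "model": "gpt-4o-mini-tts",
    "input": "The quick brown fox jumped over the lazy dog",
    "voice": "alloy",
    "response_format": "mp3",  # any of the supported output formats listed above
    "stream": True,
}

# Stream the audio response and write it to disk chunk by chunk.
with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    with open("speech.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)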