Audio

The maximum supported file size is 25 MB, and the supported file formats are mp3, mp4, mpeg, mpga, m4a, wav, and webm.
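The limits above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the API: `check_audio_file`, `MAX_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names introduced here, and the code assumes "25MB" means 25 × 1024 × 1024 bytes and that the format can be judged from the file extension alone.

```python
from pathlib import Path

# Limits taken from the text above; the helper itself is hypothetical.
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" is read as 25 MiB
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    p = Path(path)
    problems = []
    # Assumption: the extension is a good enough proxy for the container format.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(
            f"file is {p.stat().st_size} bytes, over the {MAX_BYTES}-byte limit"
        )
    return problems
```

Calling `check_audio_file("/path/to/file/audio.mp3")` before a request lets you report an oversized or unsupported file without a round trip to the API. The server remains the final authority on what it accepts.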

Transcription

The transcription API takes the audio file you want to transcribe and the desired output format for the transcription. Compass currently supports multiple input and output file formats.

from openai import OpenAI
client = OpenAI()

audio_file = open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
  model="whisper-1",
  file=audio_file
)
print(transcription.text)

By default, the response is JSON that contains the raw transcribed text.

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger.
....
}

The Audio API also lets you set additional parameters in a request. For example, to set response_format to text, your request will look like the following:

from openai import OpenAI
client = OpenAI()

audio_file = open("/path/to/file/speech.mp3", "rb")
transcription = client.audio.transcriptions.create(
  model="whisper-1", 
  file=audio_file, 
  response_format="text"
)
# With response_format="text" the SDK returns a plain string, not an object with .text
print(transcription)

The API Reference includes the full list of available parameters.
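Because the shape of the result depends on response_format, code that handles more than one format may want to normalize it. The sketch below assumes that the Python SDK returns a plain string for response_format="text" and an object with a `text` attribute for the default JSON format. `transcript_text` is a hypothetical helper, not part of the SDK.

```python
def transcript_text(result):
    """Return the transcribed text from either kind of result.

    Assumption: the result is either a plain str (response_format="text")
    or an object exposing a .text attribute (default JSON response).
    """
    if isinstance(result, str):
        return result
    return result.text
```

With this helper, `print(transcript_text(transcription))` works for both of the requests shown above.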

Translation

The translation API takes an audio file in any of the supported languages and translates it into English text.

from openai import OpenAI
client = OpenAI()

audio_file = open("/path/to/file/german.mp3", "rb")
translation = client.audio.translations.create(
  model="whisper-1", 
  file=audio_file
)
print(translation.text)

In the request above, the audio file was in German and the output text is in English, as shown below:

Hello, my name is Wolfgang and I come from Germany. Where are you heading today?