Transcription
Converts the given audio file to text.
Create Transcription
Creates a transcription for the audio file.
Azure OpenAI
Request
POST https://api.core42.ai/openai/deployments/whisper/audio/transcriptions
OpenAI
Request
POST https://api.core42.ai/v1/audio/transcriptions
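The two request lines above share a host and differ only in their path. A minimal sketch of picking the right URL for each style might look like the following. The helper name `transcription_url` and its `api_style` argument are hypothetical, and treating the `whisper` path segment as a substitutable deployment ID is an assumption based on the `model / deployment-id` parameter.

```python
BASE_URL = "https://api.core42.ai"


def transcription_url(api_style: str, deployment: str = "whisper") -> str:
    """Return the transcription endpoint for the given API style.

    api_style is a hypothetical switch: "azure" or "openai".
    Assumption: the deployment ID replaces the "whisper" path segment.
    """
    if api_style == "azure":
        return f"{BASE_URL}/openai/deployments/{deployment}/audio/transcriptions"
    if api_style == "openai":
        return f"{BASE_URL}/v1/audio/transcriptions"
    raise ValueError(f"unknown API style: {api_style!r}")
```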
Request Parameters
Name | Required | Type | Description |
---|---|---|---|
file | true | file | Audio file object to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm. |
model / deployment-id | true | string | Model ID to use for the request. Only whisper-1 is available. To transcribe a file larger than 25 MB, break it into chunks. Alternatively, if you deploy on Azure OpenAI, you can use the Azure AI Speech batch transcription API. |
language | false | string | Language of the audio file. Supplying the language in ISO 639-1 format (for example, en) improves accuracy and latency. The supported languages are: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh. |
prompt | false | string | Optional text to guide the model's style or continue a previous audio segment. Ensure the prompt matches the audio language. |
response_format | false | string | Format of the transcript output. Supported output formats: json, text, srt, verbose_json, and vtt. |
temperature | false | number | Sampling temperature, which controls randomness. The range is 0 to 2. As the value approaches 0, the model becomes more predictable, deterministic, and repetitive, so the generated text adheres more closely to the input and stays coherent. As the value approaches 2, the output becomes more random and diverse, which may produce more creative results at the cost of coherence and relevance to the input. |
timestamp_granularities | false | array | The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Segment timestamps add no latency, but generating word timestamps incurs additional latency. |
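
The constraints in the table above (supported file formats, output formats, the temperature range, and the verbose_json requirement for timestamps) can be checked on the client before a request is sent. The following is a sketch under stated assumptions, not a definitive client. The helper names are hypothetical. The `Authorization: Bearer` header, the `timestamp_granularities[]` form-field spelling, and reading "25 MB" as 25 MiB are assumptions borrowed from the OpenAI convention, since this section does not describe authentication or form encoding. Sending relies on the third-party `requests` package.

```python
import os

# Values copied from the parameter table above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB

# OpenAI-style endpoint from the request line in this section.
URL = "https://api.core42.ai/v1/audio/transcriptions"


def build_transcription_fields(filename, model="whisper-1", language=None,
                               prompt=None, response_format=None,
                               temperature=None, timestamp_granularities=None):
    """Validate the options and return them as (name, value) form fields."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {filename}")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    fields = [("model", model)]
    optional = (("language", language), ("prompt", prompt),
                ("response_format", response_format))
    for name, value in optional:
        if value is not None:
            fields.append((name, value))
    if temperature is not None:
        fields.append(("temperature", str(temperature)))

    if timestamp_granularities:
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        for granularity in timestamp_granularities:
            if granularity not in GRANULARITIES:
                raise ValueError(f"unsupported granularity: {granularity}")
            # Assumption: array values are sent as repeated "[]" form fields.
            fields.append(("timestamp_granularities[]", granularity))
    return fields


def transcribe(path, api_key, **options):
    """Upload one audio file and return the raw response body as text."""
    import requests  # third-party; imported here so validation works without it

    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("file exceeds 25 MB; split it into chunks first")
    fields = build_transcription_fields(os.path.basename(path), **options)
    with open(path, "rb") as audio:
        response = requests.post(
            URL,
            headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
            data=fields,
            files={"file": audio},
            timeout=120,
        )
    response.raise_for_status()
    return response.text
```

A call such as `transcribe("meeting.mp3", api_key, language="en", response_format="verbose_json", timestamp_granularities=["word"])` would then request word-level timestamps. Validating first means that an unsupported format, or a timestamp request without verbose_json, fails locally instead of costing a round trip.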