Document AI API Reference
Given a PDF document or an image as input, the model returns a response containing the extracted text.
Create an OCR Request
Extracts document information from the given document or image input.
Request
POST https://api.core42.ai/v1/ocr
Request Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | string | Model ID to use for extracting document information. The supported model is mistral-document-ai-2505. |
| document.type | No | string | The link type: "document_url" or "image_url". |
| document.document_url | Yes, when type is "document_url" | string | The document as a base64 data URL, for example data:application/pdf;base64,<content_of_base64_string>. |
| document.image_url | Yes, when type is "image_url" | string | The image as a base64 data URL. |
| bbox_annotation_format | No | object, null | The format that the model must use for bounding-box annotations, for example a "json_schema" object. The sample format fields are listed below. |
| document_annotation_format | No | object, null | The format that the model must use for the document-level annotation, for example a "json_schema" object as in the document request sample below. The sample format fields are listed below. |
| include_image_base64 | No | boolean, null | Whether to include the extracted images, encoded in base64, in the response. |

Sample Annotation Format Fields
The sample annotation formats define the following fields. The document request sample below uses language, chapter_titles, and urls in its document_annotation_format schema.

| Field | Required | Type | Description |
|---|---|---|---|
| document_type | Yes | string | The type of the document. |
| short_description | Yes | string | A short description of the image, in English. |
| summary | Yes | string | A summary of the document. |
| language | Yes | string | The language in which the document is written. |
| chapter_titles | Yes | string | The document title or the chapter titles of the document. |
| urls | Yes | string | The document URL or image URL. |
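The document_url and image_url values are base64 data URLs, as in the document sample below (data:application/pdf;base64,...). A minimal sketch for producing one from a local file, assuming a POSIX shell and the base64 utility; to_data_url is a hypothetical helper, not part of the API:

```shell
#!/bin/sh
# to_data_url FILE MIME_TYPE
# Hypothetical helper (not part of the API): prints a data URL such as
# data:application/pdf;base64,JVBERi0x... for use as document_url or image_url.
to_data_url() {
  src_file=$1
  mime_type=$2
  # Fail early if the input file cannot be read.
  [ -r "$src_file" ] || return 1
  # GNU base64 wraps its output at 76 characters; strip the newlines so
  # the value is one unbroken string inside the JSON body.
  b64=$(base64 < "$src_file" | tr -d '\n')
  printf 'data:%s;base64,%s' "$mime_type" "$b64"
}
```

For example, to_data_url report.pdf application/pdf prints a value for document_url, and to_data_url scan.png image/png prints one for image_url.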
Document AI Usage Example
The following examples show sample request and response formats for OCR with the Document AI model.
For Images
Sample Request Format (Azure OpenAI)
curl --location 'https://api.core42.ai/openai/deployments/mistral-document-ai-2505/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
"document": {
"type": "image_url",
"image_url": "data:image/png;base64,<content_of_base64_string>"
}
}' -v
Sample Request Format (OpenAI)
curl --location 'https://api.core42.ai/v1/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
"model": "mistral-document-ai-2505",
"document": {
"type": "image_url",
"image_url": "data:image/png;base64,<content_of_base64_string>"
}
}'
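Base64 image payloads can be large, so rather than inlining them in the --data argument, the body can be written to a file and sent with curl's --data @file form. A minimal sketch, assuming a POSIX shell, base64, and curl; build_image_body and ocr_image are hypothetical helpers, the API key is assumed to be in the API_KEY environment variable, and image/png is an assumed default MIME type. The endpoint, headers, and body fields follow the OpenAI-style sample above.

```shell
#!/bin/sh
# build_image_body IMAGE_FILE [MIME_TYPE]
# Hypothetical helper: prints an OCR request body for a local image,
# matching the OpenAI-style sample (model + document.type + document.image_url).
build_image_body() {
  img=$1
  mime=${2:-image/png}   # assumed default; pass image/jpeg etc. as needed
  [ -r "$img" ] || return 1
  b64=$(base64 < "$img" | tr -d '\n')
  printf '{"model":"mistral-document-ai-2505","document":{"type":"image_url","image_url":"data:%s;base64,%s"}}\n' \
    "$mime" "$b64"
}

# ocr_image IMAGE_FILE [MIME_TYPE]
# Hypothetical wrapper: writes the body to a temporary file, posts it,
# and prints the JSON response. Assumes the key is in $API_KEY.
ocr_image() {
  body=$(mktemp) || return 1
  build_image_body "$@" > "$body" || { rm -f "$body"; return 1; }
  curl --silent --show-error --location 'https://api.core42.ai/v1/ocr' \
    --header 'Content-Type: application/json' \
    --header "api-key: $API_KEY" \
    --data @"$body"
  status=$?
  rm -f "$body"
  return $status
}
```

For example, ocr_image scan.jpg image/jpeg > response.json saves the JSON response for later processing.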
Sample Response Format
{
"content_filter_results" : null,
"document_annotation" : null,
"model" : "mistral-document-ai-2505",
"pages" : [
{
"dimensions" : {
"dpi" : 200,
"height" : 225,
"width" : 225
},
"images" : [],
"index" : 0,
"markdown" : "(a)"
}
],
"usage_info" : {
"doc_size_bytes" : 2156,
"pages_processed" : 1,
"pages_processed_annotation" : 0
}
}
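The extracted text is returned per page in pages[].markdown, and usage_info reports how many pages were processed. A minimal sketch for reading a saved response, assuming jq is installed; print_pages and pages_processed are hypothetical helpers:

```shell
#!/bin/sh
# print_pages RESPONSE_FILE
# Hypothetical helper: prints the Markdown of every page in index order,
# each preceded by a "--- page N ---" header line.
print_pages() {
  jq -r '.pages | sort_by(.index) | .[] | "--- page \(.index) ---\n\(.markdown)"' "$1"
}

# pages_processed RESPONSE_FILE
# Hypothetical helper: prints usage_info.pages_processed from the response.
pages_processed() {
  jq -r '.usage_info.pages_processed' "$1"
}
```

For the sample response above, print_pages prints a single header for page 0 followed by the text (a).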
For Documents
Sample Request Format
curl -X POST "https://api.core42.ai/v1/ocr" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AZURE_API_KEY" \
-d '{
"model": "mistral-document-ai-2505",
"document": {
"type": "document_url",
"document_url": "data:application/pdf;base64,<content_of_base64_string>"
},
"document_annotation_format": {
"type": "json_schema",
"json_schema": {
"schema": {
"properties": {
"language": {"title": "language", "description": "What language?", "type": "string"},
"chapter_titles": {"title": "chapter_titles", "description": "Chapter Titles", "type": "string"},
"urls": {"title": "urls", "description": "URLs", "type": "string"}
},
"required": ["language", "chapter_titles", "urls"],
"title": "DDOCAnnotation",
"type": "object",
"additionalProperties": false
},
"name": "document_annotation",
"strict": true
}
},
"include_image_base64": true
}'
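Hand-writing a request body that embeds both a long base64 string and a JSON schema is error-prone. A minimal sketch that builds the same body as the sample above with jq, assuming jq 1.6 or later (for --rawfile) and the base64 utility; build_document_body is a hypothetical helper, and the schema is read from a separate file. Reading the base64 text with --rawfile instead of --arg keeps it off the command line, where Linux limits the length of a single argument.

```shell
#!/bin/sh
# build_document_body PDF_FILE SCHEMA_FILE
# Hypothetical helper: prints a compact request body equivalent to the
# sample above, taking the annotation JSON schema from SCHEMA_FILE.
build_document_body() {
  pdf=$1
  schema=$2
  [ -r "$pdf" ] && [ -r "$schema" ] || return 1
  b64file=$(mktemp) || return 1
  # Write single-line base64 text; jq reads it back as a raw string.
  base64 < "$pdf" | tr -d '\n' > "$b64file"
  jq -c -n \
    --rawfile b64 "$b64file" \
    --slurpfile schema "$schema" \
    '{
       model: "mistral-document-ai-2505",
       document: {
         type: "document_url",
         document_url: ("data:application/pdf;base64," + $b64)
       },
       document_annotation_format: {
         type: "json_schema",
         json_schema: { schema: $schema[0], name: "document_annotation", strict: true }
       },
       include_image_base64: true
     }'
  status=$?
  rm -f "$b64file"
  return $status
}
```

For example, build_document_body report.pdf schema.json > body.json, where schema.json holds the schema object from the sample (properties, required, title, type, additionalProperties); the curl command above can then send it with -d @body.json in place of the inline body.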
Sample Response Format
{
"pages": [
{
"index": 0,
"images": [],
"markdown": "This is a sample one-page PDF created using Python and ReportLab. It demonstrates how to generate a simple document with text content. You can modify this text to include any message or paragraph you want.",
"dimensions": {
"dpi": 200,
"height": 2200,
"width": 1700
}
}
],
"model": "mistral-document-ai-2505",
"document_annotation": "{\n \"language\": \"English\",\n \"chapter_titles\": \"Sample One-Page PDF\",\n \"urls\": \"https://example.com/sample-pdf\"\n}",
"usage_info": {
"pages_processed": 0,
"doc_size_bytes": 1618,
"pages_processed_annotation": 1
}
}
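In the sample response, document_annotation is a string containing JSON rather than a nested object, so it needs a second decoding step. A minimal sketch, assuming jq is installed; annotation_field is a hypothetical helper, and it prints nothing when document_annotation is null, as in the image response above.

```shell
#!/bin/sh
# annotation_field RESPONSE_FILE FIELD
# Hypothetical helper: decodes the JSON string in document_annotation and
# prints one field from it (for example language or chapter_titles).
# A null document_annotation produces no output instead of a jq error.
annotation_field() {
  jq -r --arg field "$2" '(.document_annotation // empty) | fromjson | .[$field]' "$1"
}
```

For the sample response above, annotation_field response.json language prints English.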