
Document AI API Reference

Given a PDF document or an image as input, the model returns a response containing the extracted text.

Create an OCR

Extracts document information from the given document or image input.

Request

POST https://api.core42.ai/v1/ocr

Request Parameters

 

| Parameter | Required | Type | Description |
|---|---|---|---|
| model | Yes | string | Model ID to use for extracting document information. The supported model is mistral-document-ai-2505. |
| type | No | string | The link type. The value can be "document_url" or "image_url". |
| document_url | Yes | string | The URL of the document, in base64 format. Used when type is "document_url". |
| image_url | Yes | string | The URL of the image, in base64 format. Used when type is "image_url". |
| bbox_annotation_format | No | null | The format that the model must output for bbox annotations. Defaults to { "type": "text" }. Setting it to { "type": "json_object" } enables JSON mode, which ensures that the message the model generates is in JSON format; in JSON mode, you must also instruct the model to produce JSON through a system or user message. Setting it to { "type": "json_schema" } enables JSON schema mode, which guarantees that the generated message is in JSON and follows the schema you provide. |
| document_annotation_format | No | null | The format that the model must output for document annotations. Accepts the same values, with the same behavior, as bbox_annotation_format. |
| document_type | Yes | string | The type of the document. Property of the sample bbox_annotation_format schema. |
| short_description | Yes | string | Details of the image, in English. Property of the sample bbox_annotation_format schema. |
| summary | Yes | string | Summary of the document. Property of the sample bbox_annotation_format schema. |
| language | Yes | string | Language in which the document is written. Property of the sample document_annotation_format schema. |
| chapter_titles | Yes | string | Document title or chapter titles of the document. Property of the sample document_annotation_format schema. |
| urls | Yes | string | Document URL or image URL. Property of the sample document_annotation_format schema. |
| include_image_base64 | No | boolean, null | Whether to include base64-encoded image data in the response. |

Sample Format (bbox_annotation_format):

    "bbox_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "properties": {
                    "document_type": {"title": "Document_Type", "type": "string"},
                    "short_description": {"title": "Short_Description", "type": "string"},
                    "summary": {"title": "Summary", "type": "string"}
                },
                "required": ["document_type", "short_description", "summary"],
                "title": "bboxAnnotation",
                "type": "object",
                "additionalProperties": false
            },
            "name": "bbox_annotation",
            "strict": true
        }
    }

Sample Format (document_annotation_format):

    "document_annotation_format": {
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "properties": {
                    "language": {"title": "language", "type": "string"},
                    "chapter_titles": {"title": "chapter_titles", "type": "string"},
                    "urls": {"title": "urls", "type": "string"}
                },
                "required": ["language", "chapter_titles", "urls"],
                "title": "DOCAnnotation",
                "type": "object",
                "additionalProperties": false
            },
            "name": "document_annotation",
            "strict": true
        }
    }
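The "document" object built from the type, document_url, and image_url parameters can be assembled programmatically. The following is a minimal Python sketch, assuming the base64 content is passed as a data URL of the form data:<mime type>;base64,<content>, as the document sample request in this reference does for PDFs. Applying the same form to images is an assumption, and the helper names to_data_url and build_document are hypothetical, not part of the API.

```python
import base64


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_document(content: bytes, mime_type: str) -> dict:
    """Build the "document" object, picking the link type from the MIME type.

    Assumption: image MIME types map to "image_url" and everything else
    (for example application/pdf) maps to "document_url".
    """
    data_url = to_data_url(content, mime_type)
    if mime_type.startswith("image/"):
        return {"type": "image_url", "image_url": data_url}
    return {"type": "document_url", "document_url": data_url}
```

In practice the bytes would come from reading the file in binary mode, for example open("sample.pdf", "rb").read(), where sample.pdf is a placeholder file name.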

 

Document AI Usage Example

The following examples show sample request and response formats for running OCR with the Document AI model.

For Images

Sample Request Format (Azure OpenAI)

curl --location 'https://api.core42.ai/openai/deployments/mistral-document-ai-2505/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
      "document": {
        "type": "image_url",
        "image_url": ""
      }
    }' -v

Sample Request Format (OpenAI)

curl --location 'https://api.core42.ai/v1/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
      "model": "mistral-document-ai-2505",
      "document": {
        "type": "image_url",
        "image_url": ""
      }
    }'

Sample Response Format

{
   "content_filter_results" : null,
   "document_annotation" : null,
   "model" : "mistral-document-ai-2505",
   "pages" : [
      {
         "dimensions" : {
            "dpi" : 200,
            "height" : 225,
            "width" : 225
         },
         "images" : [],
         "index" : 0,
         "markdown" : "(a)"
      }
   ],
   "usage_info" : {
      "doc_size_bytes" : 2156,
      "pages_processed" : 1,
      "pages_processed_annotation" : 0
   }
}
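The extracted text is returned per page in the "markdown" field. As a minimal Python sketch of reading it, the hypothetical helper pages_to_markdown below joins the pages in index order. It assumes the response has already been parsed into a dict and that every page carries "index" and "markdown" keys, as in the sample above.

```python
def pages_to_markdown(response: dict) -> str:
    """Join the extracted markdown of all pages, ordered by page index."""
    pages = sorted(response.get("pages", []), key=lambda page: page["index"])
    return "\n\n".join(page["markdown"] for page in pages)


# The sample image response from this reference, as a Python dict.
sample_response = {
    "content_filter_results": None,
    "document_annotation": None,
    "model": "mistral-document-ai-2505",
    "pages": [
        {
            "dimensions": {"dpi": 200, "height": 225, "width": 225},
            "images": [],
            "index": 0,
            "markdown": "(a)",
        }
    ],
    "usage_info": {
        "doc_size_bytes": 2156,
        "pages_processed": 1,
        "pages_processed_annotation": 0,
    },
}

text = pages_to_markdown(sample_response)
```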

For Document

Sample Request Format

curl -X POST "https://api.core42.ai/v1/ocr" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_API_KEY" \
  -d '{
    "model": "mistral-document-ai-2505",
    "document": {
      "type": "document_url",
      "document_url": "data:application/pdf;base64,<content_of_base64_string>"
    },
    "document_annotation_format": {
      "type": "json_schema",
      "json_schema": {
        "schema": {
          "properties": {
            "language": {"title": "language", "description": "What language?", "type": "string"},
            "chapter_titles": {"title": "chapter_titles", "description": "Chapter Titles", "type": "string"},
            "urls": {"title": "urls", "description": "URLs", "type": "string"}
          },
          "required": ["language", "chapter_titles", "urls"],
          "title": "DOCAnnotation",
          "type": "object",
          "additionalProperties": false
        },
        "name": "document_annotation",
        "strict": true
      }
    },
    "include_image_base64": true
  }'
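The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above: same endpoint, same Bearer authorization header, same JSON schema layout. The helper names annotation_format and build_ocr_request are hypothetical, the API key is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.request

OCR_URL = "https://api.core42.ai/v1/ocr"
MODEL_ID = "mistral-document-ai-2505"


def annotation_format(name: str, title: str, fields: dict) -> dict:
    """Build a json_schema annotation format from {field name: description}.

    Every field is declared as a required string, as in the sample request.
    """
    properties = {
        field: {"title": field, "description": description, "type": "string"}
        for field, description in fields.items()
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "properties": properties,
                "required": list(fields),
                "title": title,
                "type": "object",
                "additionalProperties": False,
            },
            "name": name,
            "strict": True,
        },
    }


def build_ocr_request(api_key: str, data_url: str, fields: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a base64-encoded document."""
    payload = {
        "model": MODEL_ID,
        "document": {"type": "document_url", "document_url": data_url},
        "document_annotation_format": annotation_format(
            "document_annotation", "DOCAnnotation", fields
        ),
        "include_image_base64": True,
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_ocr_request(
    api_key="YOUR_API_KEY",  # placeholder, replace with a real key
    data_url="data:application/pdf;base64,<content_of_base64_string>",
    fields={"language": "What language?", "chapter_titles": "Chapter Titles", "urls": "URLs"},
)
```

To send the request, pass it to urllib.request.urlopen and decode the response body with json.load.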

Sample Response Format

{
    "pages": [
        {
            "index": 0,
            "images": [],
            "markdown": "This is a sample one-page PDF created using Python and ReportLab. It demonstrates how to generate a simple document with text content. You can modify this text to include any message or paragraph you want.",
            "dimensions": {
                "dpi": 200,
                "height": 2200,
                "width": 1700
            }
        }
    ],
    "model": "mistral-document-ai-2505",
    "document_annotation": "{\n  \"language\": \"English\",\n  \"chapter_titles\": \"Sample One-Page PDF\",\n  \"urls\": \"https://example.com/sample-pdf\"\n}",
    "usage_info": {
        "pages_processed": 0,
        "doc_size_bytes": 1618,
        "pages_processed_annotation": 1
    }
}
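Note that "document_annotation" is returned as a JSON-encoded string rather than a nested object, so it needs a second decoding step. A minimal Python sketch follows; the helper name parse_document_annotation is hypothetical, and the sample value is copied from the response above.

```python
import json
from typing import Optional


def parse_document_annotation(response: dict) -> Optional[dict]:
    """Decode the JSON string in "document_annotation", or return None if absent."""
    raw = response.get("document_annotation")
    if raw is None:
        return None
    return json.loads(raw)


# Only the relevant field of the sample document response is reproduced here.
sample_response = {
    "document_annotation": '{\n  "language": "English",\n  "chapter_titles": "Sample One-Page PDF",\n  "urls": "https://example.com/sample-pdf"\n}'
}

annotation = parse_document_annotation(sample_response)
```

The None check covers responses such as the image sample, where no annotation format was requested and "document_annotation" is null.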