Document AI
Document AI combines AI-powered document understanding with optical character recognition (OCR) to automate document operations at scale.
It extracts and analyzes data from scanned documents and makes their content available to enterprise search.
The OCR processor returns the extracted text, image bounding boxes, and metadata about the document structure, making it easy to work with the recognized content programmatically.
Compass offers this capability with the Mistral Document AI-25.05 model.
Document AI Usage Example
The following examples show sample request and response formats for using OCR with the Document AI model.
For Images
Sample Request Format (Azure OpenAI)
curl --location 'https://api.core42.ai/openai/deployments/mistral-document-ai-2505/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
  "document": {
    "type": "image_url",
    "image_url": ""
  }
}'
Sample Request Format (OpenAI)
curl --location 'https://api.core42.ai/v1/ocr' \
--header 'Content-Type: application/json' \
--header "api-key: {API_KEY}" \
--data '{
  "model": "mistral-document-ai-2505",
  "document": {
    "type": "image_url",
    "image_url": ""
  }
}'
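The two request styles above differ only in where the model is named: in the deployment path (Azure OpenAI style) or in the request body (OpenAI style). The following Python sketch builds either request with the standard library. The helper name build_image_ocr_request and its style parameter are illustrative and not part of the Compass API; the URLs, headers, and body fields are taken from the curl samples.

```python
import json
import urllib.request

BASE_URL = "https://api.core42.ai"
MODEL = "mistral-document-ai-2505"


def build_image_ocr_request(image_url: str, api_key: str,
                            style: str = "openai") -> urllib.request.Request:
    """Build (but do not send) an OCR request for an image.

    Hypothetical helper: `style` selects which of the two sample
    request formats to reproduce.
    """
    payload = {"document": {"type": "image_url", "image_url": image_url}}
    if style == "azure":
        # Azure OpenAI style: the model is named in the deployment path.
        url = f"{BASE_URL}/openai/deployments/{MODEL}/ocr"
    elif style == "openai":
        # OpenAI style: the model is named in the request body.
        url = f"{BASE_URL}/v1/ocr"
        payload["model"] = MODEL
    else:
        raise ValueError(f"unknown style: {style!r}")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send the request; the response body is then decoded with json.load.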
Sample Response Format
{
  "content_filter_results": null,
  "document_annotation": null,
  "model": "mistral-document-ai-2505",
  "pages": [
    {
      "dimensions": {
        "dpi": 200,
        "height": 225,
        "width": 225
      },
      "images": [],
      "index": 0,
      "markdown": "(a)"
    }
  ],
  "usage_info": {
    "doc_size_bytes": 2156,
    "pages_processed": 1,
    "pages_processed_annotation": 0
  }
}
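The recognized text is returned per page in the markdown field. As a minimal sketch, the text of a whole response could be collected as follows. The function name extract_markdown is illustrative, and joining pages with a blank line is an assumption about how the text should be combined, not something the API prescribes.

```python
import json

# Abridged copy of the sample response above.
SAMPLE_RESPONSE = json.loads("""
{
  "model": "mistral-document-ai-2505",
  "pages": [
    {"dimensions": {"dpi": 200, "height": 225, "width": 225},
     "images": [], "index": 0, "markdown": "(a)"}
  ],
  "usage_info": {"doc_size_bytes": 2156, "pages_processed": 1,
                 "pages_processed_annotation": 0}
}
""")


def extract_markdown(response: dict) -> str:
    """Return the markdown of all pages, ordered by page index."""
    pages = sorted(response.get("pages", []), key=lambda page: page["index"])
    # Assumption: pages are separated by one blank line.
    return "\n\n".join(page["markdown"] for page in pages)


print(extract_markdown(SAMPLE_RESPONSE))  # prints "(a)"
```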
For Documents
Sample Request Format
curl -X POST "https://api.core42.ai/v1/ocr" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AZURE_API_KEY" \
-d '{
  "model": "mistral-document-ai-2505",
  "document": {
    "type": "document_url",
    "document_url": "data:application/pdf;base64,<content_of_base64_string>"
  },
  "document_annotation_format": {
    "type": "json_schema",
    "json_schema": {
      "schema": {
        "properties": {
          "language": {"title": "language", "description": "What language?", "type": "string"},
          "chapter_titles": {"title": "chapter_titles", "description": "Chapter Titles", "type": "string"},
          "urls": {"title": "urls", "description": "URLs", "type": "string"}
        },
        "required": ["language", "chapter_titles", "urls"],
        "title": "DDOCAnnotation",
        "type": "object",
        "additionalProperties": false
      },
      "name": "document_annotation",
      "strict": true
    }
  },
  "include_image_base64": true
}'
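The document request embeds the PDF as a base64 data URI and describes the desired annotations with a JSON schema. A sketch of assembling that body in Python might look like the following. The helper names pdf_to_data_uri and build_document_payload are illustrative; the field names and values mirror the curl sample, and typing every annotation field as a required string is carried over from that sample rather than being a stated API requirement.

```python
import base64

MODEL = "mistral-document-ai-2505"


def pdf_to_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data URI for document_url."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def build_document_payload(pdf_bytes: bytes, fields: dict) -> dict:
    """Build the JSON body for a document OCR request with annotations.

    `fields` maps each annotation field name to its description.
    """
    properties = {
        name: {"title": name, "description": description, "type": "string"}
        for name, description in fields.items()
    }
    return {
        "model": MODEL,
        "document": {
            "type": "document_url",
            "document_url": pdf_to_data_uri(pdf_bytes),
        },
        "document_annotation_format": {
            "type": "json_schema",
            "json_schema": {
                "schema": {
                    "properties": properties,
                    "required": list(fields),
                    "title": "DDOCAnnotation",
                    "type": "object",
                    "additionalProperties": False,
                },
                "name": "document_annotation",
                "strict": True,
            },
        },
        "include_image_base64": True,
    }
```

For example, build_document_payload(open("sample.pdf", "rb").read(), {"language": "What language?"}) would produce a body with a single required annotation field; sample.pdf is a placeholder file name.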
Sample Response Format
{
  "pages": [
    {
      "index": 0,
      "images": [],
      "markdown": "This is a sample one-page PDF created using Python and ReportLab. It demonstrates how to generate a simple document with text content. You can modify this text to include any message or paragraph you want.",
      "dimensions": {
        "dpi": 200,
        "height": 2200,
        "width": 1700
      }
    }
  ],
  "model": "mistral-document-ai-2505",
  "document_annotation": "{\n \"language\": \"English\",\n \"chapter_titles\": \"Sample One-Page PDF\",\n \"urls\": \"https://example.com/sample-pdf\"\n}",
  "usage_info": {
    "pages_processed": 0,
    "doc_size_bytes": 1618,
    "pages_processed_annotation": 1
  }
}
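In the sample response, document_annotation is a JSON-encoded string rather than a nested object, so it needs a second decoding step after the response itself is parsed. The sketch below shows one way to do that; the function name parse_annotation is illustrative, and returning None when the field is null (as in the image response sample) is a design choice of this sketch.

```python
import json
from typing import Optional

# Only the annotation field of the sample response is reproduced here.
SAMPLE_JSON = r'''
{
  "document_annotation": "{\n \"language\": \"English\",\n \"chapter_titles\": \"Sample One-Page PDF\",\n \"urls\": \"https://example.com/sample-pdf\"\n}"
}
'''


def parse_annotation(response: dict) -> Optional[dict]:
    """Decode the document_annotation string into a dictionary."""
    raw = response.get("document_annotation")
    if raw is None:
        # No annotation was requested or returned.
        return None
    return json.loads(raw)


annotation = parse_annotation(json.loads(SAMPLE_JSON))
print(annotation["language"])  # prints "English"
```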