Vision

In Compass, the GPT-4o model supports vision: it accepts images as input and answers questions about them. Images can be passed in a message either as links (URLs) or as base64-encoded data.

The supported formats for images include PNG (.png), JPEG (.jpeg, .jpg), WEBP (.webp), and non-animated GIF (.gif).
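
Before sending a local file, it can help to check its extension against the list above. The following is a minimal sketch, not part of the Compass API: the names `SUPPORTED_EXTENSIONS` and `has_supported_extension` are hypothetical, and the check looks only at the file name, so it cannot tell whether a GIF is animated.

```python
from pathlib import Path

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp", ".gif"}


def has_supported_extension(image_path: str) -> bool:
    """Return True if the file name ends in one of the supported extensions.

    The comparison is case-insensitive. The file contents are not inspected.
    """
    return Path(image_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `has_supported_extension("photo.JPG")` is true, while a `.tiff` file is rejected.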

Sample Request Format

{
    "model": "gpt-4o",
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ]
}

Sample Response Format

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The image depicts a serene natural landscape featuring a wooden boardwalk leading into the distance. The boardwalk is surrounded by tall, lush green grass and bushes. The sky above is a striking blue with a few scattered clouds, adding to the calm and peaceful atmosphere. In the far distance, there are some trees lining the horizon. The overall scene suggests a pleasant, sunny day, making it an inviting place for a walk and connecting with nature.",
                "role": "assistant"
            }
        }
    ],
    "created": 1726473827,
    "id": "chatcmpl-A80zzNspPgXMoPVO0TQDgPAtxQq6Y",
    "model": "gpt-4o-2024-05-13",
    "object": "chat.completion",
    "system_fingerprint": "fp_80a1bad4c7",
    "usage": {
        "completion_tokens": 88,
        "prompt_tokens": 1116,
        "total_tokens": 1204
    }
}
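
A response of this shape can be reduced to the fields most callers need. The sketch below assumes the body has already been parsed into a Python dictionary; the helper name `summarize_completion` is hypothetical, and the sample dictionary is an abbreviated copy of the response shown above.

```python
def summarize_completion(response_body: dict) -> dict:
    """Extract the answer text, finish reason, and token total from a response."""
    choice = response_body["choices"][0]
    usage = response_body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


# Abbreviated copy of the sample response (content shortened to one sentence).
sample_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The image depicts a serene natural landscape "
                           "featuring a wooden boardwalk leading into the distance.",
                "role": "assistant",
            },
        }
    ],
    "usage": {"completion_tokens": 88, "prompt_tokens": 1116, "total_tokens": 1204},
}

summary = summarize_completion(sample_response)
print(summary["text"])
```

Note that most of the token total in the sample comes from `prompt_tokens`, which includes the cost of the image input.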

Send Images in Base64 Encoded Format

The following example shows how to send an image stored on the local system by encoding it in base64.

import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {api_key}"
}

payload = {
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())
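
The example above hard-codes `image/jpeg` in the data URL. For other supported formats the MIME type in the prefix would normally match the file type. The following is a minimal sketch under that assumption; `MIME_TYPES` and `image_to_data_url` are hypothetical names, not part of any API.

```python
import base64
from pathlib import Path

# Standard MIME types for the supported image extensions.
MIME_TYPES = {
    ".png": "image/png",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def image_to_data_url(image_path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` content part.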

Send Multiple Images in a Single Request

The Chat Completions API can process multiple images in a single request, supplied either in base64-encoded format or as image URLs. The model processes all of the images and uses them to answer the question.

Sample Request Format

{
    "model": "gpt-4o",
    "stream": false,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are in these images? Is there any difference between them?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.gstatic.com/webp/gallery3/1_webp_a.png"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.gstatic.com/webp/gallery/1.webp"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2c/Rotating_earth_%28large%29.gif/300px-Rotating_earth_%28large%29.gif"
                    }
                }
            ]
        }
    ]
}
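
A message with several images follows a regular pattern: one text part followed by one `image_url` part per image. The sketch below builds such a message from a list of URLs; the function name `build_image_message` is hypothetical, and each entry may be an HTTP(S) link or a base64 data URL.

```python
from typing import List


def build_image_message(question: str, image_urls: List[str]) -> dict:
    """Build a user message with one text part and one image part per URL."""
    content = [{"type": "text", "text": question}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_image_message(
    "What are in these images? Is there any difference between them?",
    [
        "https://www.gstatic.com/webp/gallery3/1_webp_a.png",
        "https://www.gstatic.com/webp/gallery/1.webp",
    ],
)
payload = {"model": "gpt-4o", "stream": False, "messages": [message]}
```

The resulting `payload` has the same structure as the sample request and can be serialized to JSON as the request body.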

Sample Response Format

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Each of these images depicts different scenes and objects. Here are the descriptions and main differences:\n\n1. **First Image**: This is a close-up photograph of an orange-yellow rose with water droplets on its petals.\n2. **Second Image**: This is a photograph of a landscape with a boardwalk path going through a grassy field, under a blue sky with clouds.\n3. **Third Image**: This shows a mountainous landscape with a river winding through a deep valley, taken from an elevated viewpoint.\n4. **Fourth Image**: This is a rotating animation of the Earth in space, highlighting the continents and the oceans as it spins.\n\n**Differences**:\n- The first image focuses on a specific plant (a rose), while the second and third images depict broader natural landscapes (a grassy field and a mountainous region, respectively).\n- The fourth image is animated and shows the Earth from space, providing a global perspective compared to the localized views of natural environments in the other images.\n",
                "role": "assistant"
            }
        }
    ],
    "created": 1726557965,
    "id": "chatcmpl-A8Mt35dJMGV0HIWtgh0FBR003Ho1n",
    "model": "gpt-4o-2024-05-13",
    "object": "chat.completion",
    "system_fingerprint": "fp_80a1bad4c7",
    "usage": {
        "completion_tokens": 199,
        "prompt_tokens": 2060,
        "total_tokens": 2259
    }
}