Embeddings
An embedding is a vector of floating-point numbers. The distance between two vectors measures their relatedness: a small distance suggests high relatedness, whereas a large distance suggests low relatedness.
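For example, relatedness is often measured with cosine similarity. The following is a minimal sketch; the vectors and their values are made up for illustration and are far shorter than real embeddings.

import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [0.12, -0.03, 0.45]  # embedding of one text string (illustrative values)
v2 = [0.10, -0.01, 0.40]  # embedding of a related text string (illustrative values)

similarity = cosine_similarity(v1, v2)  # close to 1.0 -> highly related
distance = 1.0 - similarity             # small distance -> high relatedness
print(similarity, distance)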
Compass provides access to the Embeddings 3 Large embedding model (text-embedding-3-large).
Embeddings are used for:
- Clustering: text strings are grouped by their similarity.
- Search: results are ranked by their relevance to a query string (see the sketch after this list).
- Classification: text strings are classified by their most similar label.
- Recommendations: items with related text strings are recommended.
- Anomaly detection: outliers with little relatedness to the rest of the data are identified.
- Diversity measurement: similarity distributions are analyzed.
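The search use case, for example, reduces to ranking candidate texts by the similarity of their embeddings to the query embedding. Here is a minimal sketch that reuses cosine_similarity from the example above and assumes a hypothetical embed helper that returns the embedding vector for a text string; neither is part of the API described on this page.

def search(query, documents, embed):
    """Rank documents by cosine similarity of their embeddings to the query embedding."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    # Highest similarity (i.e. smallest distance) first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)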
To get an embedding, send a text string to the embedding model. For more about the embedding API endpoints, see the API reference documentation.
The response contains an embedding, a list of floating-point numbers that you can extract, store in a vector database, and use for the applications described above. For example, the following request body asks for an embedding of the string "hi":
{
"input": "hi",
"model": "text-embedding-3-large"
}
The response will contain an embedding vector with some additional metadata.
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.006654263,
0.0054927324,
-0.0018895327,
... (3072 floats total for text-embedding-3-large)
]
}
],
"model": "text-embedding-3-large",
"usage": {
"prompt_tokens": 1,
"total_tokens": 1
}
}
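As a rough sketch, the request above could be sent and the vector extracted as shown below. The endpoint URL, authentication header, and environment variable name are assumptions, not the documented values; check the API reference documentation for the actual endpoint and authentication scheme.

import os
import requests

API_URL = "https://example.com/v1/embeddings"  # placeholder endpoint (assumption)
API_KEY = os.environ["COMPASS_API_KEY"]        # hypothetical environment variable name

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": "hi", "model": "text-embedding-3-large"},
)
response.raise_for_status()

# The embedding vector is under data[0].embedding in the response body.
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # 3072 by default for text-embedding-3-large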
By default, the embedding vector for text-embedding-3-large has 3072 dimensions. You can reduce the number of dimensions by passing the dimensions parameter without the embedding losing its concept-representing properties.
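A minimal sketch of requesting a shorter vector follows; it assumes the parameter is named dimensions and reuses the same placeholder endpoint and key as the previous sketch.

import os
import requests

API_URL = "https://example.com/v1/embeddings"  # placeholder endpoint (assumption)
API_KEY = os.environ["COMPASS_API_KEY"]        # hypothetical environment variable name

payload = {
    "input": "hi",
    "model": "text-embedding-3-large",
    "dimensions": 1024,  # assumed parameter name; request 1024 dimensions instead of the default 3072
}
response = requests.post(API_URL, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
response.raise_for_status()
print(len(response.json()["data"][0]["embedding"]))  # 1024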