Compass API (New)
What do I need to connect to Compass API?
You need an API key and an active subscription to connect to Compass APIs. Explore the AI models on our Products page, then reach out to the Compass team to subscribe to the models you need. After the subscription plan is activated, you can connect to the APIs.
You can also create a private endpoint to connect to Compass APIs, but this requires an Azure account with an active subscription.
To get a private endpoint, please write to compass.support@core42.ai, and the support team will help you create an endpoint.
What is the Jais API?
Jais is an auto-regressive bilingual Large Language Model (LLM) for Arabic and English languages that generates text from a given input using a transformer-based neural network. The model is trained on a mixture of Arabic and English texts, including source code in various programming languages. The model available in Azure is Jais 30B Chat.
What are the benefits of Jais?
- Communication: Enhance interactions and modernize operations through streamlined processes and bilingual capabilities. Facilitate smooth and accurate responses to inquiries and requests in Arabic and English.
- Customer Experience: Enrich the customer experience with Jais through automated interactions served in local languages that save time and enhance efficiency.
- Productivity: Automate manual processes, ensuring customers have quick and easy access to essential services.
- Time-to-Market: Grow your customer base and increase your reach with relevant and accurate content generated for Arabic and English-speaking audiences. Using Jais, marketing teams can automate manual tasks, leaving more time to focus on creativity and storytelling.
What are the file formats supported by Whisper?
The supported file formats are:
- m4a
- mp3
- mp4
- mpeg
- mpga
What is the maximum file size allowed in Whisper?
The maximum file size allowed is 25 MB.
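The format list and size limit above can be checked locally before a file is sent for transcription. The following is a minimal sketch, not part of the Compass API: the function name `check_audio_file` is hypothetical, and it assumes that 25 MB means 25 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Values taken from the FAQ answers above.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga"}
# ASSUMPTION: 1 MB is treated as 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    extension = Path(name).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For a file on disk, the size can be read with `Path(name).stat().st_size` and passed in as `size_bytes`.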
What is a Successful Call?
Successful Calls are the API calls in Compass that were executed successfully and returned a response without any errors. A successful call shows that the API request was processed and answered as expected.
What is a Blocked Call?
Blocked Calls in Compass are API calls that were not processed because of predefined conditions, such as a failed API key validation or a rate limit being reached. These calls are blocked by the Compass gateway and do not reach the API backend. Avoid blocked calls by using valid authentication credentials and staying within rate limits.
What is a Failed Call?
Failed Calls in the Compass API are calls that were accepted for processing but encountered execution errors. These often result in server errors (such as 5XX HTTP status codes), indicating issues with the API or its backend service. Regularly monitor and debug your API to identify and resolve backend issues causing these failures.
What is an Other Call?
Other Calls in the Compass API are calls that do not fit into the successful, blocked, or failed categories. These might be calls that resulted in client-side errors (4XX HTTP status codes) or calls that were redirected. Check for client-side issues such as incorrect request formats or unauthorized access attempts.
What is a Total Call?
Total Calls in Compass is the aggregate number of all API calls made, covering successful, blocked, failed, and other calls. It gives a complete picture of the API's usage over a given period. Use this metric for an overall understanding of API usage and to identify trends or spikes in API traffic.
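The relationship between the four call categories and the total can be written out as simple arithmetic. This is an illustrative sketch only: the `CallCounts` class and its field names are hypothetical and do not correspond to a Compass reporting API.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Hypothetical container for the four call categories described above."""
    successful: int
    blocked: int
    failed: int
    other: int

    @property
    def total(self) -> int:
        # Total Calls is the sum of all four categories.
        return self.successful + self.blocked + self.failed + self.other

    def share(self, category: str) -> float:
        """Fraction of total calls in one category (0.0 if there were no calls)."""
        if self.total == 0:
            return 0.0
        return getattr(self, category) / self.total
```

A rising `share("blocked")` or `share("failed")` over time is the kind of trend the Total Calls metric helps put in context.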
What is Response Time?
Response Time in the Compass API measures how long it takes for the API to process a request and return a response. It is crucial for evaluating the API's performance, with shorter response times generally indicating better efficiency. Optimize API performance by streamlining backend processes and using efficient coding practices.
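Response time can also be observed from the client side. The sketch below times an arbitrary callable; the helper name `timed` is hypothetical, and a client-side measurement includes network latency, so it will generally be larger than the processing time reported by Compass.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> tuple[T, float]:
    """Run `call` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = call()
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Wrap the function that sends your API request in a zero-argument callable (for example, a `lambda`) and pass it to `timed`.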
What is Bandwidth?
Bandwidth for the Compass API refers to the volume of data transferred during API interactions over a network, typically measured in data volume per time unit. It's important for assessing the data handling and transfer capabilities of the API. Monitor bandwidth to manage API data transfer efficiency and to plan for scaling if necessary.
What is Semantic Kernel?
Semantic Kernel is an open-source SDK that lets you easily build agents that can call your existing code. It is highly extensible, so you can use it with models from Compass. By combining your existing C#, Python, and Java code with these models, you can build agents that answer questions and automate processes.
For more information, please refer to the Semantic Kernel documentation.
Where can I see API call data for my Pay As You Go models?
You can view API call data in the operations table in the reporting section.
How are quotas managed for different types of users or applications?
When you subscribe to Compass, you receive a default quota for the available AI models. The quota is expressed in tokens per minute (TPM), and the remaining quota decreases as you assign TPM to each deployment you create. You can continue to create deployments and assign them TPM until you reach your quota limit. After reaching the limit, you can create new deployments of that model only by reducing the TPM assigned to other deployments of the same model (thus freeing TPM for use), or by requesting and being approved for a model quota increase in the desired region.
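The quota behavior described above amounts to simple bookkeeping. The following sketch illustrates it under stated assumptions: the `ModelQuota` class, its method names, and any numbers used with it are hypothetical and are not a Compass API.

```python
class QuotaExceeded(Exception):
    """Raised when an assignment would exceed the model's quota."""


class ModelQuota:
    """Hypothetical tracker for one model's TPM quota across deployments."""

    def __init__(self, quota_tpm: int) -> None:
        self.quota_tpm = quota_tpm
        self.deployments: dict[str, int] = {}

    @property
    def remaining_tpm(self) -> int:
        # Quota left after subtracting the TPM assigned to every deployment.
        return self.quota_tpm - sum(self.deployments.values())

    def assign(self, deployment: str, tpm: int) -> None:
        """Create a deployment or change its TPM, if the quota allows it."""
        current = self.deployments.get(deployment, 0)
        if tpm - current > self.remaining_tpm:
            raise QuotaExceeded(
                f"need {tpm - current} more TPM, only {self.remaining_tpm} left"
            )
        self.deployments[deployment] = tpm
```

With a quota fully assigned to two deployments, creating a third raises `QuotaExceeded` until the TPM of an existing deployment is lowered, which mirrors the "free TPM or request an increase" choice described above.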
Will there be a discount available for large-scale usage customers?
Compass provides plans based on the customer's specific requirements. Please contact the Compass team.
What is the rate limit policy?
The Tokens-per-minute (TPM) rate limits are based on the maximum number of tokens estimated to be processed by a request at the time when the request is received. It is not the same as the token count used for billing, which is computed after all processing is completed.
As each API call request is received, Compass computes an estimated maximum processed-token count that includes the following:
- Prompt text and count
- The max_tokens parameter setting
- The best_of parameter setting
As requests come into the deployment endpoint, the estimated max-processed-token count is added to a running token count of all requests that is reset each minute. If, at any time during that minute, the TPM rate limit value is reached, then further requests will receive a response code 429 until the counter resets.
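The per-minute counter described above can be simulated with a short sketch. Two things here are assumptions and are not stated in this FAQ: the formula that combines the prompt token count, `max_tokens`, and `best_of` into a single estimate, and the choice to still accept the request that pushes the counter up to the limit. The names `estimate_max_tokens` and `TpmWindow` are hypothetical.

```python
import time
from typing import Callable


def estimate_max_tokens(prompt_tokens: int, max_tokens: int, best_of: int = 1) -> int:
    # ASSUMPTION: the FAQ lists the inputs but not how they are combined;
    # this sketch counts the prompt once and max_tokens once per candidate.
    return prompt_tokens + max_tokens * best_of


class TpmWindow:
    """Running token count that resets every 60 seconds."""

    def __init__(self, tpm_limit: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm_limit = tpm_limit
        self.clock = clock
        self.window_start = clock()
        self.running_tokens = 0

    def admit(self, estimated_tokens: int) -> int:
        """Return 200 if the request is accepted, 429 if the limit was reached."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A new minute has started: reset the counter.
            self.window_start = now
            self.running_tokens = 0
        if self.running_tokens >= self.tpm_limit:
            return 429
        self.running_tokens += estimated_tokens
        return 200
```

On the client side, a 429 response means the request should be retried after the counter resets, so waiting before retrying is usually more effective than resending immediately.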