June 13, 2025
Over the course of the last year, we have launched several Cohere models on Azure as a Serverless Standard (pay-as-you-go) offering. We're excited to announce that Cohere's latest models (Command R+, Rerank 3.5, and Embed v4) are now available in Azure AI Foundry Models via Managed Compute.
This launch lets enterprises and developers deploy Cohere models instantly against their own Azure quota, with per-hour GPU pricing that compensates the model provider, unlocking a scalable, low-friction path to production-ready GenAI.
What is Managed Compute?
Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models, fully hosted on Azure infrastructure.
Why Use Managed Compute?
Managed Compute is a powerful deployment option for models not available via standard (pay-as-you-go) endpoints. It gives you:
- Custom model support: Deploy open-source or third-party models
- Infrastructure flexibility: Choose your own GPU SKUs (A10, A100, H100)
- Detailed control: Configure inference servers, protocols, and advanced settings
- Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs
- Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies
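Once a model is deployed, Managed Compute exposes a managed online endpoint you can call with any HTTP client. The sketch below shows that REST path with only the standard library; the endpoint URL, key, and payload schema are all hypothetical placeholders (the real schema depends on the model's inference server, so check your deployment's consume/test page for the actual contract):

```python
import json
from urllib import request

# Hypothetical values for illustration; real ones come from your
# Azure AI Foundry deployment details.
ENDPOINT_URL = "https://my-cohere-endpoint.eastus.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

def build_chat_request(message: str) -> dict:
    """Build a chat payload. This shape is an assumption for
    illustration; the deployed inference server defines the real schema."""
    return {"message": message, "max_tokens": 256, "temperature": 0.3}

def score(message: str) -> dict:
    """POST the payload to the managed online endpoint."""
    body = json.dumps(build_chat_request(message)).encode("utf-8")
    req = request.Request(
        ENDPOINT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Build (but don't send) a request to inspect the payload:
payload = build_chat_request("Summarize our Q2 sales notes.")
```

Because the endpoint is just authenticated HTTPS, the same call works from Prompt Flow, the Azure ML SDK, or any service that can issue a POST.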
Cohere Models Now Available on Managed Compute
Command R+
- Use case: Advanced generation, reasoning, agentic frameworks
- Pricing: $17.125 / GPU / hour
Rerank 3.5
- Use case: Retrieval-Augmented Generation (RAG), semantic search, ranking
- Pricing: $3.50 / instance / hour
Embed v4
- Use case: Text embeddings for vector search, clustering, classification
- Pricing: $2.94 / instance / hour
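Because billing is per GPU or instance hour rather than per token, capacity planning is simple arithmetic. A quick sketch of estimating a monthly bill at the rates above, assuming roughly 730 hours per month and steady 24/7 usage (the model keys here are informal labels, not official deployment names):

```python
# Per-hour rates from the list above (USD).
RATES = {
    "command-r-plus": 17.125,  # per GPU per hour
    "rerank-3.5": 3.50,        # per instance per hour
    "embed-v4": 2.94,          # per instance per hour
}

HOURS_PER_MONTH = 730  # ~8,760 hours/year divided by 12

def monthly_cost(model: str, units: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated cost of running `units` GPUs/instances for `hours`."""
    return RATES[model] * units * hours

# One Command R+ GPU running around the clock for a month:
print(round(monthly_cost("command-r-plus"), 2))  # 12501.25
```

For bursty workloads, scale `hours` down or pair the estimate with your autoscaling policy, since Managed Compute only bills while instances are provisioned.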
Why This Matters
This is a big step forward for the model ecosystem. With managed compute, Azure makes it easy to:
- Access and pay for top-tier models like Cohere's by bringing your own compute
- Support model builders by compensating them for usage
- Deploy production GenAI apps without infrastructure overhead
- Choose performance—A10, A100, and H100-backed SKUs for latency-sensitive use cases
Get Started
You can find these models in Azure AI Foundry Models. Just select your model, choose a deployment target, and launch with confidence—usage-based billing is already built in.
With Cohere’s models now on Managed Compute, building GenAI apps on foundation models has never been faster, easier, or more enterprise-ready.
How It Works
- Cohere provides the model weights
- Azure hosts the model on managed VMs (A10/A100/H100 GPUs)
- Customers deploy and pay per hour, with usage automatically compensating Cohere