June 13, 2025
Over the course of the last year, we have launched several Cohere models on Azure as a Serverless Standard (pay-as-you-go) offering. We're excited to announce that Cohere's latest models (Command R+, Rerank 3.5, and Embed v4) are now available in Azure AI Foundry Models via Managed Compute.
This launch lets enterprises and developers deploy Cohere models instantly against their own Azure quota, with per-hour GPU pricing that compensates the model provider, unlocking a scalable, low-friction path to production-ready GenAI.
What is Managed Compute?
Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models, fully hosted on Azure infrastructure.
Why Use Managed Compute?
Managed Compute is a powerful deployment option for models not available via standard (pay-as-you-go) endpoints. It gives you:
- Custom model support: Deploy open-source or third-party models
- Infrastructure flexibility: Choose your own GPU SKUs (A10, A100, H100)
- Detailed control: Configure inference servers, protocols, and advanced settings
- Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs
- Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies
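Once a model is deployed, Managed Compute exposes a managed online endpoint you can call with any HTTP client. The sketch below shows that REST path with only the standard library; the endpoint URL, key, and payload schema are all hypothetical placeholders (the real schema depends on the model's inference server, so check your deployment's consume/test page for the actual contract):

```python
import json
from urllib import request

# Hypothetical values for illustration; real ones come from your
# Azure AI Foundry deployment details.
ENDPOINT_URL = "https://my-cohere-endpoint.eastus.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

def build_chat_request(message: str) -> dict:
    """Build a chat payload. This shape is an assumption for
    illustration; the deployed inference server defines the real schema."""
    return {"message": message, "max_tokens": 256, "temperature": 0.3}

def score(message: str) -> dict:
    """POST the payload to the managed online endpoint."""
    body = json.dumps(build_chat_request(message)).encode("utf-8")
    req = request.Request(
        ENDPOINT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Build (but don't send) a request to inspect the payload:
payload = build_chat_request("Summarize our Q2 sales notes.")
```

Because the endpoint is just authenticated HTTPS, the same call works from Prompt Flow, the Azure ML SDK, or any service that can issue a POST.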
Cohere Models Now Available on Managed Compute
Command R+
- Use case: Advanced generation, reasoning, agentic frameworks
- Pricing: $17.125 / GPU / hour
Rerank 3.5
- Use case: Retrieval-Augmented Generation (RAG), semantic search, ranking
- Pricing: $3.50 / instance / hour
Embed v4
- Use case: Text embeddings for vector search, clustering, classification
- Pricing: $2.94 / instance / hour
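Because billing is per GPU or instance hour rather than per token, capacity planning is simple arithmetic. A quick sketch of estimating a monthly bill at the rates above, assuming roughly 730 hours per month and steady 24/7 usage (the model keys here are informal labels, not official deployment names):

```python
# Per-hour rates from the list above (USD).
RATES = {
    "command-r-plus": 17.125,  # per GPU per hour
    "rerank-3.5": 3.50,        # per instance per hour
    "embed-v4": 2.94,          # per instance per hour
}

HOURS_PER_MONTH = 730  # ~8,760 hours/year divided by 12

def monthly_cost(model: str, units: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated cost of running `units` GPUs/instances for `hours`."""
    return RATES[model] * units * hours

# One Command R+ GPU running around the clock for a month:
print(round(monthly_cost("command-r-plus"), 2))  # 12501.25
```

For bursty workloads, scale `hours` down or pair the estimate with your autoscaling policy, since Managed Compute only bills while instances are provisioned.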
Why This Matters
This is a big step forward for the model ecosystem. With managed compute, Azure makes it easy to:
- Access and pay for top-tier models like Cohere's by bringing your own compute
- Support model builders by compensating them for usage
- Deploy production GenAI apps without infrastructure overhead
- Choose performance—A10, A100, and H100-backed SKUs for latency-sensitive use cases
Get Started
You can find these models in Azure AI Foundry Models. Just select your model, choose a deployment target, and launch with confidence—usage-based billing is already built in.
With Cohere’s models now on Managed Compute, building GenAI apps on foundation models has never been faster, easier, or more enterprise-ready.
How It Works
- Cohere provides the model weights
- Azure hosts the model on managed VMs (A10/A100/H100 GPUs)
- Customers deploy and pay per hour, with usage automatically compensating Cohere