Partner Blog | Streamline your campaign execution with Partner Marketing Center Pro

June 5, 2026

Run Global Secure Access with confidence: Introducing the GSA Operations Guide

June 5, 2026

Published by azurefeeds on June 5, 2026

Summary

You want to give your engineering org Claude Code without handing out Anthropic API keys, without per-developer billing sprawl, and without losing visibility into who is spending what. This post shows a battle-tested pattern:

Claude models run in Microsoft Foundry, billed through your Azure subscription — no Anthropic contract or keys required.

Azure API Management (APIM) sits in front as an LLM gateway: it authenticates each developer with Entra ID, enforces per-user rate limits and token quotas, and emits per-user usage metrics for chargeback.

Foundry lives in its own Azure subscription, and APIM authenticates to it with a Foundry API key — so there’s no cross-subscription RBAC to untangle.

Developers hold only short-lived Entra tokens. The Foundry key never leaves APIM.

Everything below is grounded in the Claude Code LLM gateway requirements and Azure API Management’s GenAI gateway policies. All command-line steps are shown in PowerShell for Windows developers.

The problem

Claude Code is a terminal- and IDE-native coding agent that talks to Claude over the Anthropic Messages API. Pointing it directly at Anthropic (or even directly at Foundry) creates three headaches for any organization beyond a handful of users:

Key sprawl and billing. Direct API keys mean either a shared key (no per-user attribution, a rotation nightmare) or many keys (procurement and offboarding overhead).

No throttle. Claude Code is token-heavy — it reads files, plans, and edits in long loops. One runaway session or an over-enthusiastic team can produce a surprising bill with nothing standing between the developer and the model.

No visibility. Finance wants to know cost per team. Security wants to know who is calling what. A raw key gives you neither.

The fix is a gateway that every request flows through — one that knows who the developer is (Entra ID), enforces how much they can use (APIM GenAI policies), and records what they used (Azure Monitor). Claude Code supports exactly this through its gateway configuration.

Architecture

Claude Code on a developer laptop authenticates to Azure API Management with an Entra ID bearer token; APIM validates the token, applies per-user token and request limits, swaps in the Foundry API key, and forwards the Anthropic Messages request to Claude in Microsoft Foundry in a separate subscription; per-user token usage is emitted to Application Insights.

The request path:

Developer laptop (Claude Code CLI / VS Code)
| Authorization: Bearer
v
Azure API Management (the LLM gateway) [Subscription A]
| 1. validate-jwt confirm Entra identity, audience, app role
| 2. extract oid per-user counter key
| 3. llm-token-limit per-user tokens/min + monthly token quota
| 4. rate-limit-by-key per-user requests/min
| 5. strip Authorization; set api-key from secret named value
| 6. llm-emit-token-metric per-user usage to App Insights
v (forwards Anthropic Messages format; anthropic-* headers preserved)
Microsoft Foundry https://{resource}.services.ai.azure.com/anthropic/v1/messages
v [Subscription B]
Claude deployments (Sonnet 4.6 / Haiku 4.5 / Opus 4.6)

The key idea: developer-facing auth and backend auth are independent. Developers always authenticate as themselves with Entra ID at the gateway. How the gateway authenticates to Foundry is a separate decision — and you have two good options.

Choosing how the gateway authenticates to Foundry

Both options below are independent of the developer-facing Entra ID auth, and both work whether Foundry is in the same subscription as APIM or a different one. The only hard constraint for managed identity is that both resources live in the same Entra tenant.

	Option A — Foundry API key	Option B — Managed identity
How APIM authenticates	api-key header from a secret named value	Entra token from APIM’s managed identity, in the Authorization header
Setup	Read the key once, store it in APIM	Enable APIM’s identity, assign Cognitive Services User on Foundry
Same subscription	Works	Works
Cross-subscription	Works — no RBAC crosses the boundary	Works — role assignment spans subscriptions in the same tenant
Cross-tenant	Works	Not supported — use a key
Shared secret to rotate	Yes	None
Best for	Fastest start; cross-tenant; key-only environments	Production; eliminates the shared secret

This guide builds the key-based path end to end, then shows the managed-identity swap inline at each step (Parts 3 and 4). Pick one — you don’t need both.

What this design achieves

Goal	How it’s met
Developers use Claude Code with no Anthropic billing or keys	Claude runs in Microsoft Foundry, billed through your Azure subscription
Foundry can live in a different subscription	APIM reaches Foundry by URL + API key only — no cross-subscription RBAC
Every developer authenticates as themselves	Entra ID tokens validated at the APIM gateway
Per-developer rate limits and quotas	rate-limit-by-key + llm-token-limit keyed on the Entra oid claim
Per-developer usage and cost tracking	llm-emit-token-metric → Application Insights / Log Analytics
No Foundry keys on developer laptops	The Foundry key lives only inside APIM; developers hold short-lived Entra tokens

Prerequisites

Two Azure subscriptions, both pay-as-you-go. Subscription A holds APIM; Subscription B holds Foundry. (Foundry Claude does not run on free, trial, sponsored, or CSP subscriptions.)

A Microsoft Foundry resource (Subscription B) in a region where Claude is available — currently East US 2 or Sweden Central — with Claude deployments created and at least one API key under Keys and Endpoint.

An API Management instance (Subscription A). Developer SKU is fine for a pilot; Standard v2 or Premium for production and VNet integration.

Permission to read the Foundry key in Subscription B, contributor on the APIM instance, and the ability to register Entra apps.

Developers on Windows 10/11 with PowerShell (5.1 built-in, or 7), the Azure CLI (winget install Microsoft.AzureCLI), and the Claude Code CLI installed.

Option A (key): no cross-subscription role assignment — the only cross-subscription action is reading the Foundry key once (Part 3), which you can also do from the Foundry portal. Option B (managed identity): one cross-subscription role assignment (Cognitive Services User), supported as long as APIM and Foundry share an Entra tenant.

Part 1 — Deploy Claude in Foundry (Subscription B)

In the Foundry portal, open Model catalog, search Claude, and deploy the models Claude Code uses. Name each deployment to match its model ID so the gateway can pass the model field through unchanged:

Role	Deployment name (recommended)
Primary (general coding)	claude-sonnet-4-6
Fast (file reads, small edits, background tasks)	claude-haiku-4-5
Extended thinking (optional)	claude-opus-4-6

Pin versions — select a specific version, not auto-update to latest. Without pinning, a new model release can break every developer at once.

On the resource’s Keys and Endpoint blade, copy the endpoint and one of the two API keys. The Anthropic endpoint base is:

https://{resource}.services.ai.azure.com/anthropic

Critical: Foundry’s Claude endpoint is the Anthropic surface (/anthropic/v1/messages), not the OpenAI surface (/openai/deployments/…/chat/completions?api-version=…). When you build the APIM API, do not apply the OpenAI policy template, do not add an api-version query parameter, and do not rewrite to an /openai/… path. Any of these produces the “not supported” or “resource not found” errors people commonly hit.

✅ Checkpoint: You now have Claude deployed in Foundry. Verify your deployment before continuing to Part 2.

Part 2 — Entra ID app registration (developer-facing)

This registration lives in Subscription A’s tenant. It defines the audience developers’ tokens are issued for, and what APIM validates. It’s unaffected by where Foundry lives.

App registrations → New registration → name it e.g. Claude Code Gateway.

Expose an API → set the Application ID URI, e.g. api://claude-code-gateway. Add a scope access_as_user (admin + user consent).

(Optional, for tiering) App roles → add roles such as Claude.Standard and Claude.Premium. Assign developers or groups under Enterprise applications → this app → Users and groups.

Note the Application (client) ID, the Application ID URI, and your Tenant ID.

Developers request tokens for this app’s audience; APIM validates aud = api://claude-code-gateway.

Part 3 — Provision the APIM API and Foundry backend (Subscription A)

3.1 Option A — Store the Foundry API key in APIM

First read the key from Foundry in Subscription B (use –subscription so you don’t have to switch your active context):

# Read a Foundry key from Subscription B
$FOUNDRY_KEY = az cognitiveservices account keys list `
–name `
–resource-group `
–subscription `
–query key1 -o tsv

Then store it as a secret named value in APIM (Subscription A). The policy references it as {{foundry-api-key}}:

# Create a secret named value in APIM holding the Foundry key
az apim nv create -g –service-name `
–named-value-id foundry-api-key `
–display-name foundry-api-key `
–value “$FOUNDRY_KEY” `
–secret true

Hardening: instead of the raw key in APIM, put it in Key Vault and create a Key Vault-backed named value, so rotation lives in one place. APIM needs a managed identity with Get/List secret access on that vault — but the vault is in Subscription A alongside APIM, so this is still not a cross-subscription role assignment.

3.2 Option B — Give APIM a managed identity instead

If you’d rather not manage a shared key, skip 3.1 and give APIM an identity that Foundry trusts. This works in the same subscription and across subscriptions alike, as long as both resources are in the same Entra tenant.

# Enable a system-assigned managed identity on APIM (Subscription A)
az apim update -g –name `
–set identity.type=SystemAssigned

# Get the identity’s principal (object) ID
$APIM_MI = az apim show -g –name `
–query identity.principalId -o tsv

# Get the Foundry resource ID (Subscription B)
$FOUNDRY_ID = az cognitiveservices account show `
–name –resource-group `
–subscription `
–query id -o tsv

# Grant Cognitive Services User on the Foundry resource (works cross-subscription)
az role assignment create `
–assignee-object-id $APIM_MI `
–assignee-principal-type ServicePrincipal `
–role “Cognitive Services User” `
–scope $FOUNDRY_ID

The Cognitive Services User role (a97b65f3-24c7-4388-baec-2e87135dc908) grants data-plane access to call the model without key-management rights. Role assignments can take a few minutes to propagate. A user-assigned identity works too — assign it to APIM and reference its client ID in the policy (Part 4, Option B). On this path there is no foundry-api-key named value to create or rotate.

3.3 Create the backend and API

# Named backend pointing at the Foundry Anthropic endpoint (Subscription B URL)
az apim backend create -g –service-name `
–backend-id foundry-claude `
–url “https://.services.ai.azure.com/anthropic” `
–protocol http

# API with NO path suffix so callers hit /v1/messages at the gateway root
az apim api create -g –service-name `
–api-id claude-anthropic –display-name “Claude (Foundry)” `
–path=”” –protocols https `
–service-url “https://.services.ai.azure.com/anthropic”

PowerShell + empty strings: write –path=”” (joined with =), not –path “” as two tokens. PowerShell strips a bare “” before the az wrapper sees it, so the CLI reports argument –path: expected one argument. The = form keeps it a single token (–path=) that az reads as an empty string. The same trick applies to any empty-string value you pass to az from PowerShell.

Add the operations Claude Code calls (a wildcard covers them all):

POST /v1/messages

POST /v1/messages/count_tokens

GET /v1/models (only if you enable gateway model discovery — see Part 5.3)

az apim can’t apply XML policies. Apply the Part 4 policy via the portal (APIs → Claude (Foundry) → Inbound processing → policy editor) or via Bicep/ARM.

Part 4 — The APIM policy (auth + rate limiting + metering)

Apply this at the API level. Replace the tenant ID and audience. The policy below is the key-based (Option A) version — its step 6 removes the developer’s Authorization header and sets the api-key header from the secret named value. For managed identity (Option B), swap step 6 as shown immediately after the policy; every other step is identical.

@(“Bearer ” + context.Request.Headers.GetValueOrDefault(“x-api-key”,””))

https://login.microsoftonline.com/{{tenant-id}}/v2.0

https://sts.windows.net/{{tenant-id}}/

Claude.Standard
Claude.Premium

<set-variable name="modelName" value="@{
var body = context.Request.Body.As(preserveContent: true);
return body?[“model”]?.ToString() ?? “unknown”;
}” />

Option B — authenticate to Foundry with managed identity

If you chose the managed-identity path (3.2), replace step 6 above with the block below. Instead of injecting an api-key, APIM acquires an Entra token for its own identity and forwards it as the Authorization bearer token. Token validation, rate limits, and metering are unchanged.

@(“Bearer ” + (string)context.Variables[“msi-token”])

The token audience for Azure AI Services / Foundry is https://cognitiveservices.azure.com. For a user-assigned identity, add client-id=”” to the authentication-managed-identity element. There’s no api-key named value and no secret to rotate on this path — which is exactly why it’s the preferred production posture.

Policy notes

Stripping the developer’s Authorization header before forwarding (step 6) matters: that Entra token is for APIM only. Foundry must receive only the api-key header.

{{tenant-id}}, {{gateway-audience}}, and {{foundry-api-key}} are APIM named values. Mark foundry-api-key as secret; the first two can be plain named values.

llm-token-limit and llm-emit-token-metric are APIM’s GenAI gateway policies — they understand the Anthropic/OpenAI message formats and parse token usage, so you meter tokens, not just requests. That’s the right cost lever for token-heavy Claude Code.

These counters are per-region per-gateway. With multi-region APIM, limits are enforced per region.

Part 5 — Configure Claude Code on developer machines

Developers point Claude Code at APIM (Anthropic Messages gateway mode) and authenticate with their own Entra token. The backend-auth swap is invisible to clients.

5.1 Entra token helper (per-developer, auto-refreshing)

Create %USERPROFILE%.claudeget-claude-gateway-token.ps1:

# Returns a short-lived Entra access token for the APIM gateway audience.
az account get-access-token `
–resource “api://claude-code-gateway” `
–query accessToken -o tsv

PowerShell scripts need no chmod. If execution policy blocks the helper, allow local scripts for your user once:

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

5.2 Claude Code settings (%USERPROFILE%.claudesettings.json)

In enabling configuration of below environment variables in settings.json under .claude folder allows its usage for all Claude Code sessions (VS Code, terminal CLI, JetBrains, etc.)

{
“env”: {
“ANTHROPIC_BASE_URL”: “https://azure-api.net”,
“ANTHROPIC_MODEL”: “claude-opus-4-8”,
“ANTHROPIC_DEFAULT_OPUS_MODEL”: “claude-opus-4-8”,
“CLAUDE_CODE_API_KEY_HELPER_TTL_MS”: “600000”
},
“apiKeyHelper”: “powershell -NoProfile -ExecutionPolicy Bypass -File C:Users\.claudeget-claude-gateway-token.ps1”
}

In JSON, backslashes must be doubled — hence C:Users…. Use pwsh in place of powershell if you run PowerShell 7.

apiKeyHelper output is sent as the Authorization (and X-Api-Key) header, validated by APIM’s validate-jwt. The developer never holds the Foundry key.

CLAUDE_CODE_API_KEY_HELPER_TTL_MS=3600000 refreshes the token hourly (Entra access tokens last ~60–90 minutes).

Pinning the three ANTHROPIC_DEFAULT_*_MODEL IDs ensures Claude Code sends model names that match your Foundry deployment names, so the gateway passes model through untouched.

Other Anthropic models like Sonnet and Haiku can be provided. Default model to be used is provided with ANTHROPIC_MODEL.

Then developers run claude from their project folder.

5.3 Optional — model discovery

To list gateway models in the /model picker, expose GET /v1/models on the API and set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 (Claude Code v2.1.129+). Only IDs starting with claude or anthropic appear.

5.4 VS Code extension

Settings.json in .claude folder will control both VS Code Extension and Claude Code CLI.

🚀 Ready to test? Jump to Part 7 to validate your setup, or run claude from your project folder to try it live.

Part 6 — Rate-limiting and usage-tracking design

Per-developer keying. Everything is keyed on the Entra oid claim — stable and unique per user, unlike email or upn which can change. For service accounts or CI, key on appid instead.

Two enforcement layers:

llm-token-limit — tokens/min plus a monthly token quota. The real cost control.

rate-limit-by-key — requests/min. Guards against runaway loops.

Tiering is driven by Entra app roles (Claude.Standard / Claude.Premium) read from the JWT — no separate APIM subscription management needed.

Usage tracking flows from llm-emit-token-metric into Application Insights with UserId, Tier, and Model dimensions. Example Log Analytics query for per-user monthly token spend:

customMetrics
| where name == “Total Tokens” and customDimensions.namespace == “claudecode”
| extend UserId = tostring(customDimensions.UserId), Model = tostring(customDimensions.Model)
| summarize Tokens = sum(valueSum) by UserId, Model, bin(timestamp, 1d)
| order by Tokens desc

Foundry doesn’t return Anthropic’s standard rate-limit headers, so manage and observe limits through APIM (the headers above) and Azure Monitor rather than relying on upstream headers.

Part 7 — Test and validate

# 1. Get a token as a developer
$TOKEN = az account get-access-token –resource “api://claude-code-gateway” –query accessToken -o tsv

# 2. Call the gateway directly in Anthropic Messages format
$body = @{
model = “claude-sonnet-4-6”
max_tokens = 64
messages = @(@{ role = “user”; content = “Say hello in one word.” })
} | ConvertTo-Json

Invoke-RestMethod -Method Post `
-Uri “https://.azure-api.net/v1/messages” `
-Headers @{
“Authorization” = “Bearer $TOKEN”
“anthropic-version” = “2023-06-01”
“content-type” = “application/json”
} `
-Body $body

Invoke-RestMethod returns the parsed body but hides response headers. To see x-tokens-remaining / x-ratelimit-remaining, use Invoke-WebRequest with -ResponseHeadersVariable resp (then read $resp), or call curl.exe -i (the real curl, not PowerShell’s curl alias).

Validation checklist

No token / expired token → 401 from validate-jwt (confirm before trusting rate limits).

Valid token → 200 with a Claude completion; response carries x-tokens-remaining / x-ratelimit-remaining.

401 from Foundry on a valid developer token → the api-key named value is wrong or not injected (see Troubleshooting).

Exceed the limit → 429 with retry-after.

App Insights → customMetrics shows token counts dimensioned by UserId.

Then run claude end to end from a project folder.

Part 8 — Operations and hardening

Key rotation (Option A). Foundry gives you two keys. Rotate by updating the foundry-api-key named value to key2, then regenerating key1 — zero downtime. A Key Vault-backed named value makes this a one-place change.

Prefer managed identity in production (Option B). If you started on the key path, switch to managed identity (Parts 3.2 and 4, Option B) to remove the shared secret entirely. Because the Cognitive Services User role assignment works across subscriptions in the same tenant, the cross-subscription topology doesn’t block this upgrade — and developers see no change, since their side of the contract is always authenticate to the gateway as yourself.

Private networking. Put APIM in internal VNet mode and reach Foundry over a Private Endpoint; disable Foundry public network access so the gateway is the only path in. Cross-subscription private endpoints are supported.

Resilience. Deploy Claude in two regions and use APIM’s load-balanced backend pool with retry on 429 and 5xx.

Cost guardrails. Pair per-user llm-token-limit quotas with an Azure Budget and alert on the Foundry resource in Subscription B.

Troubleshooting

Symptom	Cause / fix
404 resource not found from Foundry	Backend URL or path wrong, or an OpenAI-style rewrite applied. Backend must end in /anthropic; callers hit /v1/messages. Remove any /openai/… rewrite and api-version query param.
401 from Foundry (developer token is valid) — Option A	The api-key header is missing/wrong, or the foundry-api-key named value wasn’t saved as expected. Confirm the named value, and that the policy deletes the developer Authorization header and sets api-key.
401 / 403 from Foundry — Option B (managed identity)	The role assignment is missing or hasn’t propagated yet, or the token audience is wrong. Confirm APIM’s identity has Cognitive Services User on the Foundry resource, wait a few minutes, and ensure the policy requests resource=”https://cognitiveservices.azure.com”. For a user-assigned identity, confirm the client-id is set.
Managed identity works same-sub but not cross-sub	The two resources are in different Entra tenants. Cross-tenant managed identity isn’t supported — use the API key (Option A) instead.
401 at the gateway even with a token	aud or issuer mismatch. Confirm the token’s aud = api://claude-code-gateway and you used the v2.0 OIDC config and issuer.
403 from Foundry	The key belongs to a different Foundry resource, or the resource disabled key auth. Re-copy a key from Keys and Endpoint, or re-enable local/key auth.
Reduced Claude Code functionality	Gateway stripped anthropic-beta / anthropic-version. Ensure both headers pass through.
Model not available	Claude Code’s model ID doesn’t match the Foundry deployment name. Align names, or rewrite the body model field in policy.
ChainedTokenCredential authentication failed (client side)	Developer not logged in. Run az login so the helper has a usable Azure credential.

Wrapping up

With about an afternoon of setup you get a gateway that every Claude Code request flows through: Entra ID proves who the developer is, APIM GenAI policies cap how much each person can spend, and Application Insights tells you exactly where the tokens went. For the APIM → Foundry hop you pick what fits: a Foundry API key held only inside APIM (fastest start, works cross-tenant) or a managed identity with no shared secret at all (the production posture). Either way Claude can live in its own subscription, and developers hold nothing more sensitive than a short-lived Entra token.

When you’re ready to tighten the screws, the upgrade path is clean: if you started on the key, move it into Key Vault, then graduate to managed identity to eliminate the secret entirely, and put the whole path on a private network.

‘None of those steps disrupt developers, because their side of the contract — authenticate to the gateway as yourself — never changes.

Start your pilot today: Deploy a Developer-tier APIM instance, connect it to Foundry, and have your first developer running Claude Code through the gateway by end of day. The Prerequisites section has everything you need to begin.’

All command-line steps target Windows with PowerShell 5.1 or 7. Model IDs and Foundry regions reflect availability at time of writing; check the Foundry model catalog for current options.

Next Steps

Get started now: – Deploy Claude models in Microsoft Foundry — browse the model catalog and create your first deployment – Create an API Management instance — spin up a Developer SKU for your pilot

Go deeper: – Claude Code LLM gateway requirements — full specification for gateway compatibility – APIM GenAI gateway policies reference — all available token and rate limiting options

Get help: – Questions? Post in the Azure AI Community with tag #ClaudeCode – Found an issue with this guide? Open a GitHub issue

Partner Blog | Streamline your campaign execution with Partner Marketing Center Pro

Run Global Secure Access with confidence: Introducing the GSA Operations Guide

Partner Blog | Streamline your campaign execution with Partner Marketing Center Pro

Run Global Secure Access with confidence: Introducing the GSA Operations Guide

Summary

The problem

Architecture

Choosing how the gateway authenticates to Foundry

What this design achieves

Prerequisites

Part 1 — Deploy Claude in Foundry (Subscription B)

Part 2 — Entra ID app registration (developer-facing)

Part 3 — Provision the APIM API and Foundry backend (Subscription A)

3.1 Option A — Store the Foundry API key in APIM

3.2 Option B — Give APIM a managed identity instead

3.3 Create the backend and API

Part 4 — The APIM policy (auth + rate limiting + metering)

Option B — authenticate to Foundry with managed identity

Part 5 — Configure Claude Code on developer machines

5.1 Entra token helper (per-developer, auto-refreshing)

5.2 Claude Code settings (%USERPROFILE%.claudesettings.json)

5.3 Optional — model discovery

5.4 VS Code extension

Part 6 — Rate-limiting and usage-tracking design

Part 7 — Test and validate

Part 8 — Operations and hardening

Troubleshooting

Wrapping up

Next Steps

Related posts

Microsoft Defender for Cloud Customer Newsletter

Run Global Secure Access with confidence: Introducing the GSA Operations Guide

Partner Blog | Streamline your campaign execution with Partner Marketing Center Pro