Summary
You want to give your engineering org Claude Code without handing out Anthropic API keys, without per-developer billing sprawl, and without losing visibility into who is spending what. This post shows a battle-tested pattern:
- Claude models run in Microsoft Foundry, billed through your Azure subscription — no Anthropic contract or keys required.
- Azure API Management (APIM) sits in front as an LLM gateway: it authenticates each developer with Entra ID, enforces per-user rate limits and token quotas, and emits per-user usage metrics for chargeback.
- Foundry lives in its own Azure subscription, and APIM authenticates to it with a Foundry API key — so there’s no cross-subscription RBAC to untangle.
- Developers hold only short-lived Entra tokens. The Foundry key never leaves APIM.
Everything below is grounded in the Claude Code LLM gateway requirements and Azure API Management’s GenAI gateway policies. All command-line steps are shown in PowerShell for Windows developers.
The problem
Claude Code is a terminal- and IDE-native coding agent that talks to Claude over the Anthropic Messages API. Pointing it directly at Anthropic (or even directly at Foundry) creates three headaches for any organization beyond a handful of users:
- Key sprawl and billing. Direct API keys mean either a shared key (no per-user attribution, a rotation nightmare) or many keys (procurement and offboarding overhead).
- No throttle. Claude Code is token-heavy — it reads files, plans, and edits in long loops. One runaway session or an over-enthusiastic team can produce a surprising bill with nothing standing between the developer and the model.
- No visibility. Finance wants to know cost per team. Security wants to know who is calling what. A raw key gives you neither.
The fix is a gateway that every request flows through — one that knows who the developer is (Entra ID), enforces how much they can use (APIM GenAI policies), and records what they used (Azure Monitor). Claude Code supports exactly this through its gateway configuration.
Architecture
Claude Code on a developer laptop authenticates to Azure API Management with an Entra ID bearer token; APIM validates the token, applies per-user token and request limits, swaps in the Foundry API key, and forwards the Anthropic Messages request to Claude in Microsoft Foundry in a separate subscription; per-user token usage is emitted to Application Insights.
The request path:
Developer laptop (Claude Code CLI / VS Code)
  | Authorization: Bearer <Entra ID token>
  v
Azure API Management (the LLM gateway) [Subscription A]
  | 1. validate-jwt: confirm Entra identity, audience, app role
  | 2. extract oid: per-user counter key
  | 3. llm-token-limit: per-user tokens/min + monthly token quota
  | 4. rate-limit-by-key: per-user requests/min
  | 5. strip Authorization; set api-key from secret named value
  | 6. llm-emit-token-metric: per-user usage to App Insights
  v (forwards Anthropic Messages format; anthropic-* headers preserved)
Microsoft Foundry [Subscription B] https://{resource}.services.ai.azure.com/anthropic/v1/messages
  |
  v
Claude deployments (Sonnet 4.6 / Haiku 4.5 / Opus 4.6)
The key idea: developer-facing auth and backend auth are independent. Developers always authenticate as themselves with Entra ID at the gateway. How the gateway authenticates to Foundry is a separate decision — and you have two good options.
Choosing how the gateway authenticates to Foundry
Both options below are independent of the developer-facing Entra ID auth, and both work whether Foundry is in the same subscription as APIM or a different one. The only hard constraint for managed identity is that both resources live in the same Entra tenant.
| | Option A: Foundry API key | Option B: Managed identity |
|---|---|---|
| How APIM authenticates | api-key header from a secret named value | Entra token from APIM’s managed identity, in the Authorization header |
| Setup | Read the key once, store it in APIM | Enable APIM’s identity, assign Cognitive Services User on Foundry |
| Same subscription | Works | Works |
| Cross-subscription | Works — no RBAC crosses the boundary | Works — role assignment spans subscriptions in the same tenant |
| Cross-tenant | Works | Not supported — use a key |
| Shared secret to rotate | Yes | None |
| Best for | Fastest start; cross-tenant; key-only environments | Production; eliminates the shared secret |
This guide builds the key-based path end to end, then shows the managed-identity swap inline at each step (Parts 3 and 4). Pick one — you don’t need both.
What this design achieves
| Goal | How it’s met |
|---|---|
| Developers use Claude Code with no Anthropic billing or keys | Claude runs in Microsoft Foundry, billed through your Azure subscription |
| Foundry can live in a different subscription | APIM reaches Foundry by URL + API key only — no cross-subscription RBAC |
| Every developer authenticates as themselves | Entra ID tokens validated at the APIM gateway |
| Per-developer rate limits and quotas | rate-limit-by-key + llm-token-limit keyed on the Entra oid claim |
| Per-developer usage and cost tracking | llm-emit-token-metric → Application Insights / Log Analytics |
| No Foundry keys on developer laptops | The Foundry key lives only inside APIM; developers hold short-lived Entra tokens |
Prerequisites
- Two Azure subscriptions, both pay-as-you-go. Subscription A holds APIM; Subscription B holds Foundry. (Foundry Claude does not run on free, trial, sponsored, or CSP subscriptions.)
- A Microsoft Foundry resource (Subscription B) in a region where Claude is available — currently East US 2 or Sweden Central — with Claude deployments created and at least one API key under Keys and Endpoint.
- An API Management instance (Subscription A). Developer SKU is fine for a pilot; Standard v2 or Premium for production and VNet integration.
- Permission to read the Foundry key in Subscription B, contributor on the APIM instance, and the ability to register Entra apps.
- Developers on Windows 10/11 with PowerShell (5.1 built-in, or 7), the Azure CLI (winget install Microsoft.AzureCLI), and the Claude Code CLI installed.
Option A (key): no cross-subscription role assignment — the only cross-subscription action is reading the Foundry key once (Part 3), which you can also do from the Foundry portal. Option B (managed identity): one cross-subscription role assignment (Cognitive Services User), supported as long as APIM and Foundry share an Entra tenant.
Part 1 — Deploy Claude in Foundry (Subscription B)
- In the Foundry portal, open Model catalog, search Claude, and deploy the models Claude Code uses. Name each deployment to match its model ID so the gateway can pass the model field through unchanged:
| Role | Deployment name (recommended) |
|---|---|
| Primary (general coding) | claude-sonnet-4-6 |
| Fast (file reads, small edits, background tasks) | claude-haiku-4-5 |
| Extended thinking (optional) | claude-opus-4-6 |
- Pin versions — select a specific version, not auto-update to latest. Without pinning, a new model release can break every developer at once.
- On the resource’s Keys and Endpoint blade, copy the endpoint and one of the two API keys. The Anthropic endpoint base is:
https://{resource}.services.ai.azure.com/anthropic
Critical: Foundry’s Claude endpoint is the Anthropic surface (/anthropic/v1/messages), not the OpenAI surface (/openai/deployments/…/chat/completions?api-version=…). When you build the APIM API, do not apply the OpenAI policy template, do not add an api-version query parameter, and do not rewrite to an /openai/… path. Any of these produces the “not supported” or “resource not found” errors people commonly hit.
✅ Checkpoint: You now have Claude deployed in Foundry. Verify your deployment before continuing to Part 2.
Part 2 — Entra ID app registration (developer-facing)
This registration lives in Subscription A’s tenant. It defines the audience developers’ tokens are issued for, and what APIM validates. It’s unaffected by where Foundry lives.
- App registrations → New registration → name it e.g. Claude Code Gateway.
- Expose an API → set the Application ID URI, e.g. api://claude-code-gateway. Add a scope access_as_user (admin + user consent).
- (Optional, for tiering) App roles → add roles such as Claude.Standard and Claude.Premium. Assign developers or groups under Enterprise applications → this app → Users and groups.
- Note the Application (client) ID, the Application ID URI, and your Tenant ID.
Developers request tokens for this app’s audience; APIM validates aud = api://claude-code-gateway.
Part 3 — Provision the APIM API and Foundry backend (Subscription A)
3.1 Option A — Store the Foundry API key in APIM
First read the key from Foundry in Subscription B (use --subscription so you don’t have to switch your active context):
# Read a Foundry key from Subscription B
$FOUNDRY_KEY = az cognitiveservices account keys list `
  --name "<foundry-resource-name>" `
  --resource-group "<foundry-resource-group>" `
  --subscription "<subscription-b-id>" `
  --query key1 -o tsv
Then store it as a secret named value in APIM (Subscription A). The policy references it as {{foundry-api-key}}:
# Create a secret named value in APIM holding the Foundry key
az apim nv create -g "<apim-resource-group>" --service-name "<apim-name>" `
  --named-value-id foundry-api-key `
  --display-name foundry-api-key `
  --value "$FOUNDRY_KEY" `
  --secret true
Hardening: instead of the raw key in APIM, put it in Key Vault and create a Key Vault-backed named value, so rotation lives in one place. APIM needs a managed identity with Get/List secret access on that vault — but the vault is in Subscription A alongside APIM, so this is still not a cross-subscription role assignment.
3.2 Option B — Give APIM a managed identity instead
If you’d rather not manage a shared key, skip 3.1 and give APIM an identity that Foundry trusts. This works in the same subscription and across subscriptions alike, as long as both resources are in the same Entra tenant.
# Enable a system-assigned managed identity on APIM (Subscription A)
az apim update -g "<apim-resource-group>" --name "<apim-name>" `
  --set identity.type=SystemAssigned
# Get the identity's principal (object) ID
$APIM_MI = az apim show -g "<apim-resource-group>" --name "<apim-name>" `
  --query identity.principalId -o tsv
# Get the Foundry resource ID (Subscription B)
$FOUNDRY_ID = az cognitiveservices account show `
  --name "<foundry-resource-name>" --resource-group "<foundry-resource-group>" `
  --subscription "<subscription-b-id>" `
  --query id -o tsv
# Grant Cognitive Services User on the Foundry resource (works cross-subscription)
az role assignment create `
  --assignee-object-id $APIM_MI `
  --assignee-principal-type ServicePrincipal `
  --role "Cognitive Services User" `
  --scope $FOUNDRY_ID
The Cognitive Services User role (a97b65f3-24c7-4388-baec-2e87135dc908) grants data-plane access to call the model without key-management rights. Role assignments can take a few minutes to propagate. A user-assigned identity works too — assign it to APIM and reference its client ID in the policy (Part 4, Option B). On this path there is no foundry-api-key named value to create or rotate.
3.3 Create the backend and API
# Named backend pointing at the Foundry Anthropic endpoint (Subscription B URL)
az apim backend create -g "<apim-resource-group>" --service-name "<apim-name>" `
  --backend-id foundry-claude `
  --url "https://<foundry-resource-name>.services.ai.azure.com/anthropic" `
  --protocol http
# API with NO path suffix so callers hit /v1/messages at the gateway root
az apim api create -g "<apim-resource-group>" --service-name "<apim-name>" `
  --api-id claude-anthropic --display-name "Claude (Foundry)" `
  --path="" --protocols https `
  --service-url "https://<foundry-resource-name>.services.ai.azure.com/anthropic"
PowerShell + empty strings: write --path="" (joined with =), not --path "" as two tokens. PowerShell strips a bare "" before the az wrapper sees it, so the CLI reports argument --path: expected one argument. The = form keeps it a single token (--path=) that az reads as an empty string. The same trick applies to any empty-string value you pass to az from PowerShell.
Add the operations Claude Code calls (a wildcard covers them all):
- POST /v1/messages
- POST /v1/messages/count_tokens
- GET /v1/models (only if you enable gateway model discovery — see Part 5.3)
az apim can’t apply XML policies. Apply the Part 4 policy via the portal (APIs → Claude (Foundry) → Inbound processing → policy editor) or via Bicep/ARM.
Part 4 — The APIM policy (auth + rate limiting + metering)
Apply this at the API level. Replace the tenant ID and audience. The policy below is the key-based (Option A) version — its step 6 removes the developer’s Authorization header and sets the api-key header from the secret named value. For managed identity (Option B), swap step 6 as shown immediately after the policy; every other step is identical.
<value>@("Bearer " + context.Request.Headers.GetValueOrDefault("x-api-key",""))</value>
<audience>{{gateway-audience}}</audience>
<issuer>https://login.microsoftonline.com/{{tenant-id}}/v2.0</issuer>
<issuer>https://sts.windows.net/{{tenant-id}}/</issuer>
<value>Claude.Standard</value>
<value>Claude.Premium</value>
<set-variable name="modelName" value="@{
var body = context.Request.Body.As<JObject>(preserveContent: true);
return body?["model"]?.ToString() ?? "unknown";
}" />
<dimension name="Model" value="@(context.Request.Body?.As<JObject>(true)?["model"]?.ToString() ?? "unknown")" />
<set-header name="api-key" exists-action="override">
<value>{{foundry-api-key}}</value>
</set-header>
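The fragments above can be assembled into a complete inbound policy. What follows is a minimal sketch, not the original policy: the step order follows the surviving fragments, and the limit values (20000 tokens/min, a 5,000,000-token monthly quota, 60 requests/min), the variable names (jwt, userId, tier), the x-api-key cleanup, and the output file name are all assumptions chosen for illustration. The script only writes the XML to a file so it can be pasted into the APIM policy editor.

```powershell
# Illustrative APIM policy for Option A, kept in a literal here-string so
# PowerShell expands nothing inside it. Limit values and variable names
# (jwt, userId, tier) are assumptions, not values from the original policy.
$policy = @'
<policies>
  <inbound>
    <base />
    <!-- 1. Accept the helper output from x-api-key when Authorization is absent -->
    <choose>
      <when condition="@(!context.Request.Headers.ContainsKey("Authorization"))">
        <set-header name="Authorization" exists-action="override">
          <value>@("Bearer " + context.Request.Headers.GetValueOrDefault("x-api-key",""))</value>
        </set-header>
      </when>
    </choose>
    <!-- 2. Validate the Entra token: audience, issuer, app role -->
    <validate-jwt header-name="Authorization" failed-validation-httpcode="401" output-token-variable-name="jwt">
      <openid-config url="https://login.microsoftonline.com/{{tenant-id}}/v2.0/.well-known/openid-configuration" />
      <audiences>
        <audience>{{gateway-audience}}</audience>
      </audiences>
      <issuers>
        <issuer>https://login.microsoftonline.com/{{tenant-id}}/v2.0</issuer>
        <issuer>https://sts.windows.net/{{tenant-id}}/</issuer>
      </issuers>
      <required-claims>
        <claim name="roles" match="any">
          <value>Claude.Standard</value>
          <value>Claude.Premium</value>
        </claim>
      </required-claims>
    </validate-jwt>
    <!-- 3. Per-user counter key (oid) and tier (from the roles claim) -->
    <set-variable name="userId" value="@(((Jwt)context.Variables["jwt"]).Claims.GetValueOrDefault("oid", "unknown"))" />
    <set-variable name="tier" value="@(((Jwt)context.Variables["jwt"]).Claims.GetValueOrDefault("roles", "").Contains("Claude.Premium") ? "Premium" : "Standard")" />
    <!-- 4. Token and request limits (numbers are placeholders) -->
    <llm-token-limit counter-key="@((string)context.Variables["userId"])" tokens-per-minute="20000" token-quota="5000000" token-quota-period="Monthly" estimate-prompt-tokens="false" remaining-tokens-header-name="x-tokens-remaining" />
    <rate-limit-by-key calls="60" renewal-period="60" counter-key="@((string)context.Variables["userId"])" remaining-calls-header-name="x-ratelimit-remaining" />
    <!-- 5. Per-user usage metric -->
    <llm-emit-token-metric namespace="claudecode">
      <dimension name="UserId" value="@((string)context.Variables["userId"])" />
      <dimension name="Tier" value="@((string)context.Variables["tier"])" />
      <dimension name="Model" value="@(context.Request.Body?.As<JObject>(true)?["model"]?.ToString() ?? "unknown")" />
    </llm-emit-token-metric>
    <!-- 6. Drop the developer credentials and inject the Foundry key -->
    <set-header name="Authorization" exists-action="delete" />
    <set-header name="x-api-key" exists-action="delete" />
    <set-header name="api-key" exists-action="override">
      <value>{{foundry-api-key}}</value>
    </set-header>
    <set-backend-service backend-id="foundry-claude" />
  </inbound>
  <backend><base /></backend>
  <outbound><base /></outbound>
  <on-error><base /></on-error>
</policies>
'@

# Save the policy next to other temp files so it can be pasted into the portal.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'claude-gateway-policy.xml'
Set-Content -Path $path -Value $policy -Encoding UTF8
Write-Output "Policy saved to $path"
```

Tier-specific limits are left out to keep the sketch short; one way to add them would be a choose block on the tier variable with a separate llm-token-limit per branch. Check every attribute against the current APIM policy reference before applying it.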
Option B — authenticate to Foundry with managed identity
If you chose the managed-identity path (3.2), replace step 6 above with the block below. Instead of injecting an api-key, APIM acquires an Entra token for its own identity and forwards it as the Authorization bearer token. Token validation, rate limits, and metering are unchanged.
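<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-token" />
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["msi-token"])</value>
</set-header>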
The token audience for Azure AI Services / Foundry is https://cognitiveservices.azure.com. For a user-assigned identity, add client-id="<client-id>" to the authentication-managed-identity element. There’s no api-key named value and no secret to rotate on this path, which is exactly why it’s the preferred production posture.
Policy notes
- Stripping the developer’s Authorization header before forwarding (step 6) matters: that Entra token is for APIM only. Foundry must receive only the api-key header.
- {{tenant-id}}, {{gateway-audience}}, and {{foundry-api-key}} are APIM named values. Mark foundry-api-key as secret; the first two can be plain named values.
- llm-token-limit and llm-emit-token-metric are APIM’s GenAI gateway policies — they understand the Anthropic/OpenAI message formats and parse token usage, so you meter tokens, not just requests. That’s the right cost lever for token-heavy Claude Code.
- These counters are per-region per-gateway. With multi-region APIM, limits are enforced per region.
Part 5 — Configure Claude Code on developer machines
Developers point Claude Code at APIM (Anthropic Messages gateway mode) and authenticate with their own Entra token. The backend-auth swap is invisible to clients.
5.1 Entra token helper (per-developer, auto-refreshing)
Create %USERPROFILE%\.claude\get-claude-gateway-token.ps1:
# Returns a short-lived Entra access token for the APIM gateway audience.
az account get-access-token `
  --resource "api://claude-code-gateway" `
  --query accessToken -o tsv
PowerShell scripts need no chmod. If execution policy blocks the helper, allow local scripts for your user once:
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
5.2 Claude Code settings (%USERPROFILE%\.claude\settings.json)
Setting the environment variables below in settings.json under the .claude folder applies them to all Claude Code sessions (VS Code, terminal CLI, JetBrains, etc.).
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://<apim-name>.azure-api.net",
    "ANTHROPIC_MODEL": "claude-opus-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6",
    "CLAUDE_CODE_API_KEY_HELPER_TTL_MS": "600000"
  },
  "apiKeyHelper": "powershell -NoProfile -ExecutionPolicy Bypass -File C:\\Users\\<username>\\.claude\\get-claude-gateway-token.ps1"
}
In JSON, backslashes must be doubled, hence C:\\Users\\…. Use pwsh in place of powershell if you run PowerShell 7.
- apiKeyHelper output is sent as the Authorization (and X-Api-Key) header, validated by APIM’s validate-jwt. The developer never holds the Foundry key.
- CLAUDE_CODE_API_KEY_HELPER_TTL_MS=600000 re-runs the helper every 10 minutes, well inside the ~60–90 minute lifetime of an Entra access token.
- Pinning the ANTHROPIC_DEFAULT_*_MODEL IDs ensures Claude Code sends model names that match your Foundry deployment names, so the gateway passes model through untouched. Only the Opus variable is set in the example; set ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_HAIKU_MODEL the same way for the Sonnet and Haiku deployments.
- ANTHROPIC_MODEL sets the default model Claude Code uses.
Then developers run claude from their project folder.
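To avoid hand-escaping backslashes, the settings file can be generated rather than typed. The sketch below is one possible approach, assuming a hypothetical helper function named New-ClaudeGatewaySettings; it uses only the setting names shown in this section and lets ConvertTo-Json double the backslashes in the helper path.

```powershell
# Hypothetical helper: builds the settings.json content for the gateway setup.
# ConvertTo-Json escapes the backslashes in the helper path automatically.
function New-ClaudeGatewaySettings {
    param(
        [Parameter(Mandatory)][string]$GatewayUrl,   # e.g. the APIM gateway base URL
        [Parameter(Mandatory)][string]$HelperPath,   # full path to the token helper script
        [Parameter(Mandatory)][string]$Model,        # must match a Foundry deployment name
        [int]$TtlMs = 600000                         # how often Claude Code re-runs the helper
    )

    $settings = [ordered]@{
        env = [ordered]@{
            ANTHROPIC_BASE_URL                = $GatewayUrl
            ANTHROPIC_MODEL                   = $Model
            CLAUDE_CODE_API_KEY_HELPER_TTL_MS = "$TtlMs"
        }
        apiKeyHelper = "powershell -NoProfile -ExecutionPolicy Bypass -File $HelperPath"
    }

    # Depth 3 is enough for the nested env object.
    return $settings | ConvertTo-Json -Depth 3
}

# Usage (review the output before saving it as %USERPROFILE%\.claude\settings.json):
# New-ClaudeGatewaySettings -GatewayUrl "https://<apim-name>.azure-api.net" `
#     -HelperPath "$env:USERPROFILE\.claude\get-claude-gateway-token.ps1" `
#     -Model "claude-sonnet-4-6"
```

Generating the file keeps the helper path and the JSON escaping in sync when a profile folder is somewhere other than C:\Users.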
5.3 Optional — model discovery
To list gateway models in the /model picker, expose GET /v1/models on the API and set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 (Claude Code v2.1.129+). Only IDs starting with claude or anthropic appear.
5.4 VS Code extension
The settings.json in the .claude folder controls both the VS Code extension and the Claude Code CLI.
🚀 Ready to test? Jump to Part 7 to validate your setup, or run claude from your project folder to try it live.
Part 6 — Rate-limiting and usage-tracking design
Per-developer keying. Everything is keyed on the Entra oid claim — stable and unique per user, unlike email or upn which can change. For service accounts or CI, key on appid instead.
Two enforcement layers:
- llm-token-limit — tokens/min plus a monthly token quota. The real cost control.
- rate-limit-by-key — requests/min. Guards against runaway loops.
Tiering is driven by Entra app roles (Claude.Standard / Claude.Premium) read from the JWT — no separate APIM subscription management needed.
Usage tracking flows from llm-emit-token-metric into Application Insights with UserId, Tier, and Model dimensions. Example Log Analytics query for per-user token usage by model, binned by day (sum the rows over a month for a monthly figure):
customMetrics
| where name == "Total Tokens" and customDimensions.namespace == "claudecode"
| extend UserId = tostring(customDimensions.UserId), Model = tostring(customDimensions.Model)
| summarize Tokens = sum(valueSum) by UserId, Model, bin(timestamp, 1d)
| order by Tokens desc
Foundry doesn’t return Anthropic’s standard rate-limit headers, so manage and observe limits through APIM (the headers above) and Azure Monitor rather than relying on upstream headers.
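For chargeback, the token totals from the query above still have to be turned into money. A minimal sketch, assuming the query results have been exported as rows with UserId and Tokens properties, and assuming a single blended price per million tokens; the function name Get-TokenChargeback and the price are hypothetical, so substitute your own rates.

```powershell
# Hypothetical helper: sums tokens per user and applies one blended rate.
# Real pricing differs per model and between input and output tokens.
function Get-TokenChargeback {
    param(
        [Parameter(Mandatory)][object[]]$Usage,                # rows with UserId and Tokens
        [Parameter(Mandatory)][double]$PricePerMillionTokens   # assumed blended rate
    )

    $Usage | Group-Object -Property UserId | ForEach-Object {
        $tokens = ($_.Group | Measure-Object -Property Tokens -Sum).Sum
        [pscustomobject]@{
            UserId = $_.Name
            Tokens = [long]$tokens
            Cost   = [math]::Round($tokens / 1e6 * $PricePerMillionTokens, 2)
        }
    } | Sort-Object -Property Cost -Descending
}

# Usage with made-up rows and a made-up rate:
# $rows = @(
#     [pscustomobject]@{ UserId = 'user-a'; Tokens = 1000000 },
#     [pscustomobject]@{ UserId = 'user-b'; Tokens = 250000 }
# )
# Get-TokenChargeback -Usage $rows -PricePerMillionTokens 4.0
```

A blended rate is the simplest option that works with the Total Tokens metric alone; splitting by the Model dimension would give a closer estimate.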
Part 7 — Test and validate
# 1. Get a token as a developer
$TOKEN = az account get-access-token --resource "api://claude-code-gateway" --query accessToken -o tsv
# 2. Call the gateway directly in Anthropic Messages format
$body = @{
  model = "claude-sonnet-4-6"
  max_tokens = 64
  messages = @(@{ role = "user"; content = "Say hello in one word." })
} | ConvertTo-Json
Invoke-RestMethod -Method Post `
  -Uri "https://<apim-name>.azure-api.net/v1/messages" `
  -Headers @{
    "Authorization" = "Bearer $TOKEN"
    "anthropic-version" = "2023-06-01"
    "content-type" = "application/json"
  } `
  -Body $body
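Invoke-RestMethod returns the parsed body but hides response headers. To see x-tokens-remaining / x-ratelimit-remaining, either switch to Invoke-WebRequest and read the Headers property of the response it returns (works in PowerShell 5.1 and 7), add -ResponseHeadersVariable resp to Invoke-RestMethod and read $resp (PowerShell 7 only), or call curl.exe -i (the real curl, not PowerShell’s curl alias).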
Validation checklist
- No token / expired token → 401 from validate-jwt (confirm before trusting rate limits).
- Valid token → 200 with a Claude completion; response carries x-tokens-remaining / x-ratelimit-remaining.
- 401 from Foundry on a valid developer token → the api-key named value is wrong or not injected (see Troubleshooting).
- Exceed the limit → 429 with retry-after.
- App Insights → customMetrics shows token counts dimensioned by UserId.
- Then run claude end to end from a project folder.
Part 8 — Operations and hardening
- Key rotation (Option A). Foundry gives you two keys. Rotate by updating the foundry-api-key named value to key2, then regenerating key1 — zero downtime. A Key Vault-backed named value makes this a one-place change.
- Prefer managed identity in production (Option B). If you started on the key path, switch to managed identity (Parts 3.2 and 4, Option B) to remove the shared secret entirely. Because the Cognitive Services User role assignment works across subscriptions in the same tenant, the cross-subscription topology doesn’t block this upgrade — and developers see no change, since their side of the contract is always authenticate to the gateway as yourself.
- Private networking. Put APIM in internal VNet mode and reach Foundry over a Private Endpoint; disable Foundry public network access so the gateway is the only path in. Cross-subscription private endpoints are supported.
- Resilience. Deploy Claude in two regions and use APIM’s load-balanced backend pool with retry on 429 and 5xx.
- Cost guardrails. Pair per-user llm-token-limit quotas with an Azure Budget and alert on the Foundry resource in Subscription B.
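The key rotation described above (Option A) can be scripted. This is a sketch under assumptions rather than a tested runbook: the function name Invoke-FoundryKeyRotation and its parameters are hypothetical, the named value is assumed to hold the raw key (not a Key Vault reference), and the Azure CLI flags should be checked against your installed version.

```powershell
# Hypothetical helper: moves the gateway to key2, then regenerates key1.
# Assumes the foundry-api-key named value stores the raw key in APIM.
function Invoke-FoundryKeyRotation {
    param(
        [Parameter(Mandatory)][string]$FoundryName,
        [Parameter(Mandatory)][string]$FoundryResourceGroup,
        [Parameter(Mandatory)][string]$FoundrySubscription,
        [Parameter(Mandatory)][string]$ApimName,
        [Parameter(Mandatory)][string]$ApimResourceGroup
    )

    # 1. Read key2 from the Foundry resource in Subscription B.
    $key2 = az cognitiveservices account keys list `
        --name $FoundryName --resource-group $FoundryResourceGroup `
        --subscription $FoundrySubscription --query key2 -o tsv
    if (-not $key2) { throw "Could not read key2 from $FoundryName." }

    # 2. Point the APIM named value at key2 so the gateway stops using key1.
    az apim nv update -g $ApimResourceGroup --service-name $ApimName `
        --named-value-id foundry-api-key --value $key2 --secret true | Out-Null

    # 3. Only now regenerate key1; nothing in the gateway depends on it.
    az cognitiveservices account keys regenerate `
        --name $FoundryName --resource-group $FoundryResourceGroup `
        --subscription $FoundrySubscription --key-name Key1 | Out-Null
}
```

The order matters: the named value is switched before key1 is regenerated, which is what keeps the rotation free of downtime. On the next rotation the roles of key1 and key2 swap.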
Troubleshooting
| Symptom | Cause / fix |
|---|---|
| 404 resource not found from Foundry | Backend URL or path wrong, or an OpenAI-style rewrite applied. Backend must end in /anthropic; callers hit /v1/messages. Remove any /openai/… rewrite and api-version query param. |
| 401 from Foundry (developer token is valid) — Option A | The api-key header is missing/wrong, or the foundry-api-key named value wasn’t saved as expected. Confirm the named value, and that the policy deletes the developer Authorization header and sets api-key. |
| 401 / 403 from Foundry — Option B (managed identity) | The role assignment is missing or hasn’t propagated yet, or the token audience is wrong. Confirm APIM’s identity has Cognitive Services User on the Foundry resource, wait a few minutes, and ensure the policy requests resource=”https://cognitiveservices.azure.com”. For a user-assigned identity, confirm the client-id is set. |
| Managed identity works same-sub but not cross-sub | The two resources are in different Entra tenants. Cross-tenant managed identity isn’t supported — use the API key (Option A) instead. |
| 401 at the gateway even with a token | aud or issuer mismatch. Confirm the token’s aud = api://claude-code-gateway and you used the v2.0 OIDC config and issuer. |
| 403 from Foundry | The key belongs to a different Foundry resource, or the resource disabled key auth. Re-copy a key from Keys and Endpoint, or re-enable local/key auth. |
| Reduced Claude Code functionality | Gateway stripped anthropic-beta / anthropic-version. Ensure both headers pass through. |
| Model not available | Claude Code’s model ID doesn’t match the Foundry deployment name. Align names, or rewrite the body model field in policy. |
| ChainedTokenCredential authentication failed (client side) | Developer not logged in. Run az login so the helper has a usable Azure credential. |
Wrapping up
With about an afternoon of setup you get a gateway that every Claude Code request flows through: Entra ID proves who the developer is, APIM GenAI policies cap how much each person can spend, and Application Insights tells you exactly where the tokens went. For the APIM → Foundry hop you pick what fits: a Foundry API key held only inside APIM (fastest start, works cross-tenant) or a managed identity with no shared secret at all (the production posture). Either way Claude can live in its own subscription, and developers hold nothing more sensitive than a short-lived Entra token.
When you’re ready to tighten the screws, the upgrade path is clean: if you started on the key, move it into Key Vault, then graduate to managed identity to eliminate the secret entirely, and put the whole path on a private network.
None of those steps disrupt developers, because their side of the contract (authenticate to the gateway as yourself) never changes.
Start your pilot today: Deploy a Developer-tier APIM instance, connect it to Foundry, and have your first developer running Claude Code through the gateway by end of day. The Prerequisites section has everything you need to begin.
All command-line steps target Windows with PowerShell 5.1 or 7. Model IDs and Foundry regions reflect availability at time of writing; check the Foundry model catalog for current options.
Next Steps
Get started now:
- Deploy Claude models in Microsoft Foundry: browse the model catalog and create your first deployment
- Create an API Management instance: spin up a Developer SKU for your pilot
Go deeper:
- Claude Code LLM gateway requirements: full specification for gateway compatibility
- APIM GenAI gateway policies reference: all available token and rate limiting options
Get help:
- Questions? Post in the Azure AI Community with tag #ClaudeCode
- Found an issue with this guide? Open a GitHub issue