LangChain is a powerful framework that simplifies the development of applications powered by large language models (LLMs). It provides essential building blocks like chains, agents, and memory components that enable developers to create sophisticated AI workflows beyond simple prompt-response interactions. LangChain's importance lies in its ability to orchestrate complex AI operations, integrate multiple data sources, and maintain conversation context, making it a go-to choice for production-ready AI applications.
In this blog post, we’ll explore a sample application that demonstrates how you can easily deploy a LangChain application integrated with Azure OpenAI Foundry models to Azure App Service. We’ll walk through this complete example that showcases a conversational AI chat interface with streaming responses and intelligent summarization—all deployed seamlessly using modern cloud-native practices.
What We’re Building
Our sample application is a FastAPI web service that provides:
- Real-time streaming responses from Azure OpenAI’s GPT-4o model
- Automatic summarization of long responses using LangChain’s summarize chain
- Secure authentication via Azure Managed Identity
- Modern chat UI with a responsive design
- Easy deployment using Azure Developer CLI (azd)
Key Technical Highlights
1. Secure Connection to Azure OpenAI with Managed Identity
This sample uses Azure Managed Identity for authentication. This eliminates the need to store API keys in your code or configuration files:
from azure.identity import DefaultAzureCredential
from langchain_openai import AzureChatOpenAI

# Use Managed Identity to get a token for Azure OpenAI
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")
# Configure LangChain with the token
llm_long = AzureChatOpenAI(
    azure_endpoint=endpoint,
    openai_api_version="2025-01-01-preview",
    deployment_name=deployment,
    temperature=0.5,
    streaming=True,
    max_tokens=600,
    azure_ad_token=token.token  # Secure token-based auth
)
This approach provides several benefits:
- Enhanced security: No API keys to manage or accidentally expose
- Simplified operations: Azure handles token refresh automatically when you use a token provider (see the sketch below)
- Enterprise-ready: Integrates with Azure RBAC and compliance policies
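One caveat: a token fetched once with get_token eventually expires, so long-running apps are better served by a token provider that refreshes on demand. Here is a minimal sketch, assuming the langchain-openai package and its azure_ad_token_provider parameter:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI

# A callable that fetches (and transparently refreshes) tokens on demand
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

llm_long = AzureChatOpenAI(
    azure_endpoint=endpoint,
    openai_api_version="2025-01-01-preview",
    deployment_name=deployment,
    azure_ad_token_provider=token_provider,  # refreshed automatically per request
    streaming=True,
    max_tokens=600
)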
2. Intelligent Response Chaining with LangChain
The application showcases LangChain’s powerful chaining capabilities by creating two distinct AI workflows:
# LLM for detailed responses
llm_long = AzureChatOpenAI(
    # ... configuration for detailed answers
    streaming=True,
    max_tokens=600
)

# LLM for concise summaries
llm_summary = AzureChatOpenAI(
    # ... configuration optimized for summaries
    temperature=0,  # More deterministic for summaries
    max_tokens=200
)

# Create a summarization chain ("stuff" concatenates the docs into one prompt)
from langchain.chains.summarize import load_summarize_chain
summarize_chain = load_summarize_chain(llm_summary, chain_type="stuff")
This dual-model approach allows users to receive both comprehensive answers and digestible summaries, enhancing the user experience significantly.
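To try the summarization chain in isolation, here is a quick sketch; the Document import path assumes a recent langchain-core release:

from langchain_core.documents import Document

# Wrap any text in a Document and run the chain over it
docs = [Document(page_content="LangChain provides chains, agents, and memory components that ...")]
summary = summarize_chain.run(docs)  # returns a condensed string
print(summary)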
3. Real-Time Streaming Responses
The application implements streaming responses to provide immediate feedback to users:
import asyncio

from fastapi.responses import StreamingResponse
from langchain_core.documents import Document

# Inside the FastAPI route handler, where `messages` holds the chat history:
async def streamer():
    # 1. Stream the long answer token by token
    long_answer = ""
    for chunk in llm_long.stream(messages):
        long_answer += chunk.content
        yield chunk.content  # Stream to the frontend immediately
        await asyncio.sleep(0)  # Yield control to the event loop
    # 2. Generate and stream the summary after completion
    docs = [Document(page_content=long_answer)]
    loop = asyncio.get_running_loop()  # run the blocking chain in a thread pool
    summary = await loop.run_in_executor(None, summarize_chain.run, docs)
    yield "__SUMMARY__" + summary

return StreamingResponse(streamer(), media_type="text/plain")
This streaming approach creates a responsive user experience where text appears as it’s generated, similar to ChatGPT’s interface.
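To sanity-check the stream from a local client, here is a hedged sketch using httpx; the /chat route and JSON payload are assumptions, so adjust them to match the sample's actual endpoint:

import httpx

# Read the streamed body, then split on the summary sentinel
with httpx.stream("POST", "http://localhost:8000/chat",
                  json={"message": "What is LangChain?"}, timeout=60) as resp:
    body = "".join(resp.iter_text())

long_answer, _, summary = body.partition("__SUMMARY__")
print("Answer:", long_answer)
print("Summary:", summary)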
4. Token Management and Response Tuning for AI Applications
AI applications require careful consideration of token usage to avoid throttling and optimize performance. The code includes some defaults for both token limits and response behavior:
# Restrict max_tokens to avoid hitting rate limits
llm_long = AzureChatOpenAI(
    max_tokens=600,  # Balanced for detailed responses
    temperature=0.5  # Moderate creativity for conversational responses
)

llm_summary = AzureChatOpenAI(
    max_tokens=200,  # Shorter for summaries
    temperature=0  # Lower temperature for more focused, deterministic summaries
)
Key considerations for AI applications:
- Token limits: Prevent hitting Azure OpenAI rate limits and manage costs
- Temperature settings: Lower values (0-0.3) produce more focused, consistent responses, while higher values (0.7-1.0) increase creativity
- Response optimization: Different configurations for different use cases (detailed vs. summary responses)
These parameters can be adjusted based on your Azure OpenAI quota and specific use case requirements.
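To keep these knobs tunable without redeploying code, one option is to read them from environment variables, which map directly to App Service app settings; the setting names below are hypothetical:

import os

# Hypothetical app settings; set them in the App Service configuration
MAX_TOKENS_LONG = int(os.getenv("MAX_TOKENS_LONG", "600"))
MAX_TOKENS_SUMMARY = int(os.getenv("MAX_TOKENS_SUMMARY", "200"))
TEMPERATURE_LONG = float(os.getenv("TEMPERATURE_LONG", "0.5"))

llm_long = AzureChatOpenAI(
    max_tokens=MAX_TOKENS_LONG,
    temperature=TEMPERATURE_LONG,
    # ...
)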
Deploying Your Application
Getting this sample running in your Azure environment is straightforward with Azure Developer CLI:
Prerequisites
- Azure Developer CLI (azd)
- An Azure subscription with Azure OpenAI and App Service access
- Python 3.10+
Deployment Steps
1. Clone and navigate to the project:
git clone https://github.com/Azure-Samples/appservice-ai-samples.git
cd langchain-fastapi-chat
2. Initialize azd:
azd init
3. Deploy everything:
azd up
That’s it! The azd up command will:
- Provision Azure AI Foundry and deploy the GPT-4o model
- Create an App Service with managed identity
- Configure role assignments for secure access
- Deploy your FastAPI application
- Set up all necessary environment variables
See It In Action
Once deployed, open the app in your browser and ask a question: the chat UI streams the detailed answer as it is generated, then displays a concise summary.
Customization Options
Switch Models
To use a different AI model, update the aiFoundryModelName parameter in infra/main.bicep:
@description('AI Foundry Model deployment name')
param aiFoundryModelName string = 'gpt-3.5-turbo' // or your preferred model
Adjust Token Limits
Modify the max_tokens values in app.py based on your quota:
llm_long = AzureChatOpenAI(
    max_tokens=1000,  # Increase for longer responses
    # ...
)
Use API Keys Instead of Managed Identity
If you prefer API key authentication, you can modify the LangChain configuration:
llm_long = AzureChatOpenAI(
    azure_endpoint=endpoint,
    openai_api_key=your_api_key,  # Instead of azure_ad_token
    # ...
)
Next Steps
This sample provides a foundation for building more sophisticated AI applications. Consider extending it with:
- Conversation memory using LangChain's memory components (a starting sketch follows this list)
- Document upload and analysis capabilities
- Multiple AI model support for different use cases
- User authentication and personalization
- Advanced prompt engineering for domain-specific responses
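For the first of these, here is a minimal starting sketch using LangChain's classic ConversationBufferMemory and ConversationChain APIs; newer LangChain releases steer memory toward LangGraph, so treat this as one possible approach rather than the recommended one:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Keep prior turns in memory so follow-up questions retain context
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm_long, memory=memory)

reply = conversation.predict(input="What is LangChain?")
follow_up = conversation.predict(input="How does it work with Azure OpenAI?")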
Conclusion
The complete sample code and deployment templates are available in the appservice-ai-samples repository.
Ready to build your own AI chat app? Clone the repo and run azd up to get started in minutes!
For more Azure App Service AI samples and best practices, check out the Azure App Service AI integration documentation.