Voice-enabled AI Agents: transforming customer engagement with Azure AI Speech
May 21, 2025

Today at Microsoft Build, we're excited to announce the General Availability of bidirectional audio streaming for the Azure Communication Services Call Automation SDK, unveiling the power of speech-to-speech AI through Azure Communication Services!
As previously seen at Microsoft Ignite in November 2024, the Call Automation bidirectional streaming APIs already work with services like Azure OpenAI to build conversational voice agents through speech-to-speech integrations. Now, with the General Availability release of the Call Automation bidirectional streaming API and the Azure AI Speech Voice Live API (Preview), creating voice agents has never been easier. Imagine AI agents that deliver seamless, low-latency, and naturally fluent conversations, transforming the way businesses and customers interact.
Bidirectional streaming APIs allow customers to stream audio from ongoing calls to their web server in near real time, where their voice-enabled Large Language Models (LLMs) can ingest the audio, reason over it, and stream voice responses back into the call. This release also adds JSON Web Token (JWT) based authentication for the WebSocket connection, helping developers ensure they're building secure solutions.
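For example, the server that accepts the media WebSocket can validate the incoming JWT before processing any audio. The sketch below is a minimal illustration, assuming a Node.js server using the ws and jose packages; the JWKS endpoint, the audience value, and the exact header carrying the token are placeholders to replace with the values documented for your Azure Communication Services resource.

import { WebSocketServer } from "ws";
import { createRemoteJWKSet, jwtVerify } from "jose";
import { IncomingMessage } from "http";

// Placeholder values: substitute the JWKS endpoint and audience documented
// for Azure Communication Services media streaming authentication.
const JWKS = createRemoteJWKSet(new URL("https://example.com/.well-known/jwks.json"));
const EXPECTED_AUDIENCE = "<your-acs-resource-id>";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", async (socket, request: IncomingMessage) => {
    try {
        // The JWT is assumed here to arrive as a Bearer token on the upgrade request.
        const token = (request.headers.authorization ?? "").replace("Bearer ", "");
        await jwtVerify(token, JWKS, { audience: EXPECTED_AUDIENCE });
    } catch {
        socket.close(1008, "Unauthorized"); // reject connections that fail validation
        return;
    }
    socket.on("message", (data) => {
        // Verified connection: hand incoming audio frames to the AI pipeline.
    });
});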
As industries like customer service, education, HR, gaming, and public services see a surge in demand for generative AI voice chatbots, businesses are seeking real-time, natural-sounding voice interactions with the latest and greatest GenAI models. Integrating Azure Communication Services with the new Voice Live API from Azure AI Speech Services provides a low-latency interface that facilitates streaming speech input and output with Azure AI Speech’s advanced audio and voice capabilities. It supports multiple languages, diverse voices, and customization, and can even integrate with avatars for enhanced engagement. On the server side, powerful language models interpret the caller’s queries and stream human-like responses back in real time, ensuring fluid and engaging conversations.
By integrating these two technologies, customers can create innovative new solutions for:
Multilingual agents
Develop virtual customer service representatives capable of holding conversations with end customers in their preferred language, so businesses serving multilingual regions can build a single solution that covers multiple languages and regions.
Noise suppression and echo cancellation
For AI voice agents to be effective, they need clear audio to understand what the user is requesting. To improve AI efficiency, you can use the noise suppression and echo cancellation built into the Voice Live API (shown in the configure call later in this post) to give your AI agent the best possible audio quality for clearly understanding and assisting end users.
Support for branded voices
Build voice agents that stay on brand with custom voices that represent your brand in every customer interaction. Use Azure AI Speech to create custom voice models that sound like your brand and feel familiar to your customers; a sketch follows below.
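As an illustrative sketch only: assuming the Voice Live configure call shown later in this post also accepts a custom voice type and endpoint ID (the property values below are assumptions, not confirmed API surface), pointing the agent at a custom neural voice might look like this:

realtimeClient.configure({
    // Assumed property names for a custom neural voice; verify against the
    // Voice Live API (Preview) documentation before relying on them.
    voice: { name: "<your-custom-voice-name>", type: "azure-custom", endpoint_id: "<your-endpoint-id>" }
});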
How to Integrate Azure Communication Services with Azure AI Speech Service Voice Live API
Language support
Through the integration with the Voice Live API, you can now create solutions for more than 150 locales for speech input and output, with 600+ realistic voices available out of the box. If these voices don't suit your needs, you can take it one step further and create custom speech models for your brand.
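For example, retargeting the agent to another locale can be as simple as changing the voice and transcription language in the configure call shown later in this post (the French voice name below is one of the out-of-the-box voices; all other settings stay the same):

realtimeClient.configure({
    // Swap the voice and the transcription locale to serve French-speaking callers.
    voice: { name: "fr-FR-DeniseNeural", type: "azure-standard", temperature: 0.7 },
    input_audio_transcription: { model: "whisper-1", language: "fr-fr" }
});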
How to start bidirectional streaming
const mediaStreamingOptions: MediaStreamingOptions = {
    transportUrl: websocketUrl,
    transportType: "websocket",
    contentType: "audio",
    audioChannelType: "unmixed",
    startMediaStreaming: true,
    enableBidirectional: true,
    audioFormat: "Pcm24KMono"
};
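These options are then supplied when answering (or creating) the call. Below is a minimal sketch, assuming the @azure/communication-call-automation package and an incoming-call event already received at your callback endpoint; the connection string and callback URL are placeholders.

import { CallAutomationClient, AnswerCallOptions } from "@azure/communication-call-automation";

const acsClient = new CallAutomationClient("<acs-connection-string>");

// Answer the inbound call and start streaming its audio to the WebSocket server.
const answerOptions: AnswerCallOptions = { mediaStreamingOptions };
await acsClient.answerCall(incomingCallContext, "<callback-events-url>", answerOptions);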
How to connect to Voice Live API (Preview)
const realtimeClient = new RTClient(
    new URL(endpoint),
    { key: apiKey },
    { modelOrAgent: model, apiVersion: "2025-05-01-preview" }
);
realtimeClient.configure({
    instructions: SYSTEM_PROMPT,
    voice: { name: "en-US-AvaNeural", type: "azure-standard", temperature: 0.7 },
    turn_detection: { type: "server_vad" },
    input_audio_transcription: { model: "whisper-1", language: "en-us" },
    modalities: ["text", "audio"],
    input_audio_format: "pcm16",
    output_audio_format: "pcm16",
    input_audio_noise_reduction: {
        type: "azure_deep_noise_suppression"
    },
    input_audio_echo_cancellation: {
        type: "server_echo_cancellation"
    }
});
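With both pieces configured, the remaining glue is relaying audio between the ACS media WebSocket and the Voice Live session. The sketch below is illustrative only: acsSocket stands for the connection accepted from ACS, and the message kinds, event names, and client methods are assumptions modeled on the preview client, so verify them against the published sample.

// Illustrative relay loop; message shapes and event names are assumptions.
acsSocket.on("message", async (raw: Buffer) => {
    const message = JSON.parse(raw.toString());
    if (message.kind === "AudioData") {
        // Forward the caller's base64-encoded PCM audio into the Voice Live session.
        await realtimeClient.sendAudio(Buffer.from(message.audioData.data, "base64"));
    }
});

// Stream the agent's synthesized audio back into the call in near real time.
for await (const event of realtimeClient.events()) {
    if (event.type === "response.audio.delta") {
        acsSocket.send(JSON.stringify({
            kind: "AudioData",
            audioData: { data: event.delta } // base64-encoded PCM for playback
        }));
    }
}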
Next Steps
The SDK, documentation, and a sample will be available in the next few weeks following this announcement, allowing you to build your own solutions using Azure Communication Services and the Azure AI Speech Voice Live API. To learn more about the Voice Live API and all of its capabilities, see the Azure AI Blog.