Voice-enabled AI Agents: transforming customer engagement with Azure AI Speech
May 21, 2025

Today at Microsoft Build, we're excited to announce the General Availability of bidirectional audio streaming for the Azure Communication Services Call Automation SDK, unveiling the power of speech-to-speech AI through Azure Communication Services!
As previously seen at Microsoft Ignite in November 2024, the Call Automation bidirectional streaming APIs already work with services like Azure OpenAI to build conversational voice agents through speech-to-speech integrations. Now, with the General Availability release of the Call Automation bidirectional streaming API and the Azure AI Speech Voice Live API (Preview), creating voice agents has never been easier. Imagine AI agents that deliver seamless, low-latency, and naturally fluent conversations, transforming the way businesses and customers interact.
Bidirectional streaming APIs allow customers to stream audio from ongoing calls to their web server in near real time, where their voice-enabled Large Language Models (LLMs) can ingest the audio, reason over it, and stream voice responses back into the call. This release also adds JSON Web Token (JWT) based authentication for the WebSocket connection, helping developers ensure they're building secure solutions.
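For example, the server that accepts the media WebSocket can validate the incoming JWT before processing any audio. The sketch below is a minimal illustration, assuming a Node.js server using the ws and jose packages; the JWKS endpoint, the audience value, and the exact header carrying the token are placeholders to replace with the values documented for your Azure Communication Services resource.

import { WebSocketServer } from "ws";
import { createRemoteJWKSet, jwtVerify } from "jose";
import { IncomingMessage } from "http";

// Placeholder values: substitute the JWKS endpoint and audience documented
// for Azure Communication Services media streaming authentication.
const JWKS = createRemoteJWKSet(new URL("https://example.com/.well-known/jwks.json"));
const EXPECTED_AUDIENCE = "<your-acs-resource-id>";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", async (socket, request: IncomingMessage) => {
    try {
        // The JWT is assumed here to arrive as a Bearer token on the upgrade request.
        const token = (request.headers.authorization ?? "").replace("Bearer ", "");
        await jwtVerify(token, JWKS, { audience: EXPECTED_AUDIENCE });
    } catch {
        socket.close(1008, "Unauthorized"); // reject connections that fail validation
        return;
    }
    socket.on("message", (data) => {
        // Verified connection: hand incoming audio frames to the AI pipeline.
    });
});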
As industries like customer service, education, HR, gaming, and public services see a surge in demand for generative AI voice chatbots, businesses are seeking real-time, natural-sounding voice interactions with the latest and greatest GenAI models. Integrating Azure Communication Services with the new Voice Live API from Azure AI Speech Services provides a low-latency interface that facilitates streaming speech input and output with Azure AI Speech’s advanced audio and voice capabilities. It supports multiple languages, diverse voices, and customization, and can even integrate with avatars for enhanced engagement. On the server side, powerful language models interpret the caller’s queries and stream human-like responses back in real time, ensuring fluid and engaging conversations.
By integrating these two technologies, customers can create innovative new solutions for:
Multilingual agents
Develop virtual customer service representatives capable of holding conversations with end customers in their preferred language, so businesses serving multilingual regions can build a single solution that covers multiple languages and regions.
Noise suppression and echo cancellation
For AI voice agents to be effective, they need clear audio to understand what the user is requesting. To improve AI efficiency, you can use the noise suppression and echo cancellation built into the Voice Live API (shown in the configure call later in this post) to give your AI agent the best possible audio quality for clearly understanding and assisting end users.
Support for branded voices
Build voice agents that stay on brand with custom voices that represent your brand in every customer interaction. Use Azure AI Speech to create custom voice models that sound like your brand and feel familiar to your customers; a sketch follows below.
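As an illustrative sketch only: assuming the Voice Live configure call shown later in this post also accepts a custom voice type and endpoint ID (the property values below are assumptions, not confirmed API surface), pointing the agent at a custom neural voice might look like this:

realtimeClient.configure({
    // Assumed property names for a custom neural voice; verify against the
    // Voice Live API (Preview) documentation before relying on them.
    voice: { name: "<your-custom-voice-name>", type: "azure-custom", endpoint_id: "<your-endpoint-id>" }
});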
How to Integrate Azure Communication Services with Azure AI Speech Service Voice Live API
Language support
Through the integration with the Voice Live API, you can now create solutions for more than 150 locales for speech input and output, with 600+ realistic voices available out of the box. If these voices don't suit your needs, you can take it one step further and create custom speech models for your brand.
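For example, retargeting the agent to another locale can be as simple as changing the voice and transcription language in the configure call shown later in this post (the French voice name below is one of the out-of-the-box voices; all other settings stay the same):

realtimeClient.configure({
    // Swap the voice and the transcription locale to serve French-speaking callers.
    voice: { name: "fr-FR-DeniseNeural", type: "azure-standard", temperature: 0.7 },
    input_audio_transcription: { model: "whisper-1", language: "fr-fr" }
});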
How to start bidirectional streaming
const mediaStreamingOptions: MediaStreamingOptions = {
    transportUrl: websocketUrl,
    transportType: "websocket",
    contentType: "audio",
    audioChannelType: "unmixed",
    startMediaStreaming: true,
    enableBidirectional: true,
    audioFormat: "Pcm24KMono"
};
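These options are then supplied when answering (or creating) the call. Below is a minimal sketch, assuming the @azure/communication-call-automation package and an incoming-call event already received at your callback endpoint; the connection string and callback URL are placeholders.

import { CallAutomationClient, AnswerCallOptions } from "@azure/communication-call-automation";

const acsClient = new CallAutomationClient("<acs-connection-string>");

// Answer the inbound call and start streaming its audio to the WebSocket server.
const answerOptions: AnswerCallOptions = { mediaStreamingOptions };
await acsClient.answerCall(incomingCallContext, "<callback-events-url>", answerOptions);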
How to connect to Voice Live API (Preview)
const realtimeClient = new RTClient(
    new URL(endpoint),
    { key: apiKey },
    { modelOrAgent: model, apiVersion: "2025-05-01-preview" }
);
realtimeClient.configure({
    instructions: SYSTEM_PROMPT,
    voice: { name: "en-US-AvaNeural", type: "azure-standard", temperature: 0.7 },
    turn_detection: { type: "server_vad" },
    input_audio_transcription: { model: "whisper-1", language: "en-us" },
    modalities: ["text", "audio"],
    input_audio_format: "pcm16",
    output_audio_format: "pcm16",
    input_audio_noise_reduction: {
        type: "azure_deep_noise_suppression"
    },
    input_audio_echo_cancellation: {
        type: "server_echo_cancellation"
    }
});
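With both pieces configured, the remaining glue is relaying audio between the ACS media WebSocket and the Voice Live session. The sketch below is illustrative only: acsSocket stands for the connection accepted from ACS, and the message kinds, event names, and client methods are assumptions modeled on the preview client, so verify them against the published sample.

// Illustrative relay loop; message shapes and event names are assumptions.
acsSocket.on("message", async (raw: Buffer) => {
    const message = JSON.parse(raw.toString());
    if (message.kind === "AudioData") {
        // Forward the caller's base64-encoded PCM audio into the Voice Live session.
        await realtimeClient.sendAudio(Buffer.from(message.audioData.data, "base64"));
    }
});

// Stream the agent's synthesized audio back into the call in near real time.
for await (const event of realtimeClient.events()) {
    if (event.type === "response.audio.delta") {
        acsSocket.send(JSON.stringify({
            kind: "AudioData",
            audioData: { data: event.delta } // base64-encoded PCM for playback
        }));
    }
}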
Next Steps
The SDK, documentation, and a sample will be available in the next few weeks following this announcement, allowing you to build your own solutions using Azure Communication Services and the Azure AI Speech Voice Live API. To learn more about the Voice Live API and all of its capabilities, see the Azure AI Blog.