May 23, 2025
At Microsoft Build, we introduced the Healthcare Agent Orchestrator, now available in the Azure AI Foundry Agent Catalog. In this blog, we unpack the science: how we structured the architecture, curated real tumor board data, and built robust agent coordination that brings AI into real healthcare workflows.
Introduction
Healthcare is inherently collaborative. Critical decisions often require input from multiple specialists—radiologists, pathologists, oncologists, and geneticists—working together to deliver the best outcomes for patients.
Yet most AI systems today are designed around narrow tasks or single-agent architectures, failing to reflect the real-world teamwork that defines healthcare practice.
That’s why we developed the Healthcare Agent Orchestrator: an orchestrator and code sample built around Microsoft’s industry-leading healthcare AI models, designed to support reasoning and multidisciplinary collaboration — enabling modular, interpretable AI workflows that mirror how healthcare teams actually work.
The orchestrator brings together Microsoft healthcare AI models (such as MedImageParse for image recognition, CXRReportGen for automated radiology reporting, and MedImageInsight for retrieval and similarity analysis) into a unified, task-aware system that reflects real-world healthcare decision-making patterns.
Healthcare Is Naturally Multi-Agent
Healthcare decision-making often requires synthesizing diverse data types—radiologic images, pathology slides, genetic markers, and unstructured clinical narratives—while reconciling differing expert perspectives.
In a molecular tumor board, for instance, a radiologist might highlight a suspicious lesion on CT imaging, a pathologist may flag discordant biopsy findings, and a geneticist could identify a mutation pointing toward an alternate treatment path.
Effective collaboration in these settings hinges not on isolated analysis, but on structured dialogue—where evidence is surfaced, assumptions are challenged, and hypotheses are iteratively refined.
To support the development of Healthcare Agent Orchestrator, we worked with a leading healthcare provider organization to help curate a proprietary dataset comprising longitudinal patient records and real tumor board transcripts—capturing the complexity of multidisciplinary discussions. We applied LLM-based structuring techniques to convert free-form transcripts into interpretable units, followed by expert review to ensure domain fidelity and relevance. This dataset provides a critical foundation for assessing agent coordination, reasoning handoffs, and task alignment in simulated collaborative settings.
Single-agent AI models are not well-suited to replicate this kind of dynamic, multi-perspective team-based reasoning—underscoring the need for multi-agent, domain-specialized frameworks.
Why General-Purpose LLMs Fall Short for Healthcare Collaboration
While general-purpose large language models have delivered remarkable results in many domains, they face key limitations in high-stakes healthcare environments:
- Precision is critical: Even small hallucinations or inconsistencies can compromise safety and decision quality
- Multi-modal integration is required: Many healthcare decisions involve interpreting and correlating diverse data types—images, reports, structured records—much of which is not available in public training sets
- Transparency and traceability matter: Users must understand how conclusions are formed and be able to audit intermediate steps
The Healthcare Agent Orchestrator addresses these challenges by pairing general reasoning capabilities with specialized agents that operate over imaging, genomics, and structured EHRs—ensuring grounded, explainable results aligned with clinical expectations. Each agent contributes domain-specific expertise, while the orchestrator ensures coherence, oversight, and explainability—resulting in outputs that are both grounded and verifiable.
Architecture: Coordinating Specialists Through Orchestration
Healthcare Agent Orchestrator’s multi-agent framework is built on modular AI infrastructure, designed for secure, scalable collaboration:
- Semantic Kernel: a lightweight open-source SDK from Microsoft for building AI agents and integrating LLMs into production code. It enables agent orchestration—including dynamic tool invocation, contextual task planning, and adaptive dialogue management—across complex clinical workflows.
- Model Context Protocol (MCP): an open protocol integrated into the orchestrator to securely connect AI models with structured clinical data (e.g., FHIR-based EHRs) and conversational interfaces. MCP facilitates context-aware prompting, traceable reasoning, and privacy-preserving access to patient-specific information.
- Magentic-One: Microsoft’s generalist multi-agent system built on AutoGen, enabling task decomposition, role-based collaboration, and shared memory through structured message passing and centralized context.
Each agent is orchestrated within the system and integrated via Semantic Kernel’s group chat infrastructure, with support for secure communication and modular deployment via Azure.
This orchestration ensures that each model—whether interpreting a lung nodule, analyzing a biopsy image, or summarizing a genomic variant—is applied precisely where its expertise is most relevant, without overloading a single system with every task.
The modularity of the framework also future-proofs it: as new health AI models and tools emerge, they can be incorporated into the ecosystem without disrupting existing workflows, enabling continuous innovation while maintaining clinical stability.
Microsoft’s Healthcare AI Models at the Core
Healthcare Agent Orchestrator leverages Microsoft’s latest healthcare AI models:
- CXRReportGen: Integrates multimodal inputs—including current and prior X-ray images and report context—to generate grounded, interpretable radiology reports. The model has shown improved accuracy and transparency in automated chest X-ray interpretation, evaluated on both public and private data.
- MedImageParse3: A biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalities.
- MedImageInsight4: Facilitates fast retrieval of clinically similar cases and supports disease classification across a broad range of medical image modalities, accelerating second opinion generation and diagnostic review workflows.
Each model acts as a specialized agent within the system, contributing focused expertise while allowing flexible, context-aware collaboration orchestrated at the system level. CXRReportGen is included in the initial release and supports grounded radiology report generation. Other Microsoft healthcare models such as MedImageParse and MedImageInsight are being explored in internal prototypes to expand the orchestrator’s capabilities across segmentation, detection, and image retrieval tasks.
Seamless Integration with Microsoft Teams
Rather than creating new silos, Healthcare Agent Orchestrator integrates directly into the tools clinicians already use—specifically Microsoft Teams.
Developers are investigating how clinicians can engage with agents through natural conversation, asking questions, requesting second opinions, or cross-validating findings—all without leaving their primary collaboration environment.
This approach minimizes friction, improves user experience, and brings cutting-edge AI into real-world care settings.
Building Toward Robust, Trustworthy Multi-Agent Collaboration
Think of the orchestrator as managing a secure, structured group chat. Each participant is a specialized AI agent, such as a ‘Radiology’ agent, ‘PatientHistory’ agent, or ‘ClinicalTrials’ agent. At the center is the ‘Orchestrator’ agent, which moderates the interaction: assigning tasks, maintaining shared context, and resolving conflicting outputs. Agents can also communicate directly with one another, exchanging intermediate results or clarifying inputs. Meanwhile, the user (potentially, a clinician) can engage either with the orchestrator or with specific agents as needed.
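The group-chat picture above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the orchestrator's actual implementation: the `Message` and `GroupChat` classes, the `plan` argument, and the two placeholder agent functions are all hypothetical, and real agents would call models rather than return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class GroupChat:
    """Hypothetical shared context: every agent reads the same message history."""
    agents: dict[str, Callable[[list[Message]], str]]
    history: list[Message] = field(default_factory=list)

    def post(self, sender: str, content: str) -> None:
        self.history.append(Message(sender, content))

    def run(self, user_request: str, plan: list[str]) -> list[Message]:
        """Moderator role: call each agent named in `plan`, in order."""
        self.post("User", user_request)
        for name in plan:
            reply = self.agents[name](self.history)  # agent sees shared context
            self.post(name, reply)
        return self.history


# Placeholder agents (assumptions for illustration only).
def patient_history(history: list[Message]) -> str:
    return "Summary of prior records (placeholder)."


def radiology(history: list[Message]) -> str:
    return f"Findings drafted after reading {len(history)} prior messages (placeholder)."


chat = GroupChat(agents={"PatientHistory": patient_history, "Radiology": radiology})
transcript = chat.run("Prepare the case for review.", plan=["PatientHistory", "Radiology"])
```

Here the plan is fixed by the caller. In the system described above, the moderating agent would decide which agent speaks next, so the fixed plan is only a stand-in for that decision.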
Each agent is configured with instructions (the system prompt that guides its reasoning), and a description (used by both the UI and the orchestrator to determine when the agent should be activated). For example, the Radiology agent is paired with the cxr_report_gen tool, which wraps Microsoft’s CXRReportGen model for generating findings from chest X-ray images. Tools like this are declared under the agent’s tools field and allow it to call foundation models or other capabilities on demand—such as the clinical_trials tool5 for querying ClinicalTrials.gov. Only one agent is marked as facilitator, designating it as the moderator of the conversation; in this scenario, the Orchestrator agent fills that role.
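As a rough sketch of the configuration just described, the fields can be modeled as a small data structure with a check that exactly one agent is the facilitator. The field names `instructions`, `description`, `tools`, and `facilitator` and the tool names `cxr_report_gen` and `clinical_trials` come from the text. The `AgentConfig` class, the `find_facilitator` helper, the prompt strings, and the pairing of `clinical_trials` with the ClinicalTrials agent are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical in-memory form of one agent's configuration."""
    name: str
    instructions: str          # system prompt that guides the agent's reasoning
    description: str           # used by the UI and the orchestrator for activation
    tools: list[str] = field(default_factory=list)
    facilitator: bool = False  # True only for the moderating agent


def find_facilitator(agents: list[AgentConfig]) -> AgentConfig:
    """Return the single facilitator, or raise if there is not exactly one."""
    facilitators = [a for a in agents if a.facilitator]
    if len(facilitators) != 1:
        raise ValueError(f"expected exactly one facilitator, found {len(facilitators)}")
    return facilitators[0]


# Illustrative values only; the prompts below are placeholders.
AGENTS = [
    AgentConfig(
        name="Orchestrator",
        instructions="Moderate the discussion and assign tasks.",
        description="Moderates the conversation.",
        facilitator=True,
    ),
    AgentConfig(
        name="Radiology",
        instructions="Interpret chest X-ray images.",
        description="Generates findings from chest X-rays.",
        tools=["cxr_report_gen"],
    ),
    AgentConfig(
        name="ClinicalTrials",
        instructions="Look up relevant trials.",
        description="Queries ClinicalTrials.gov.",
        tools=["clinical_trials"],  # assumed pairing, not stated in the text
    ),
]

moderator = find_facilitator(AGENTS)
```

Validating the facilitator count at load time is one way to catch a misconfigured agent list before any conversation starts.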
Early observations highlight that multi-agent orchestration introduces new complexities—even as it improves specialization and task alignment. To address these emergent challenges, we are actively evolving the framework across several dimensions:
- Mitigating Error Propagation Across Agents: Ensuring that early-stage errors by one agent (e.g., misinterpretation of an image) do not cascade unchecked through subsequent reasoning steps. This includes introducing critical checkpoints where outputs from key agents are verified before being consumed by others.
- Optimizing Agent Selection and Specialization: Recognizing that more agents are not always better. Adding unnecessary or redundant agents can introduce noise and confusion. We’ve implemented a systematic framework that emphasizes a few highly suited agents per task (dynamically selected based on case complexity and domain needs) while continuously tracking performance gains and catching regressions early.
- Improving Transparency and Hand-off Clarity: Structuring agent interactions to make intermediate outputs and rationales visible, enabling clinicians (and the system itself) to trace how conclusions were reached, catch inconsistencies early, and intervene when necessary.
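The checkpoint and traceability ideas in the list above can be illustrated with a small pipeline runner. This is a sketch under assumptions, not the framework's code: `run_with_checkpoints`, `CheckpointError`, the placeholder agents, and the non-empty check are all hypothetical, and a real verifier would be far stricter than a non-empty test.

```python
from typing import Callable, Optional

Step = Callable[[str], str]    # an agent step: upstream text in, output text out
Check = Callable[[str], bool]  # a verifier: True if the output may be handed on


class CheckpointError(Exception):
    """Raised when a verified step fails; carries the trace for review."""

    def __init__(self, agent: str, trace: list[tuple[str, str]]):
        super().__init__(f"{agent} output failed verification; hand-off halted")
        self.agent = agent
        self.trace = trace


def run_with_checkpoints(
    steps: list[tuple[str, Step, Optional[Check]]], initial: str
) -> tuple[str, list[tuple[str, str]]]:
    """Run steps in order, verifying flagged outputs before they are reused."""
    trace: list[tuple[str, str]] = []  # visible record of intermediate outputs
    current = initial
    for name, step, check in steps:
        output = step(current)
        trace.append((name, output))
        if check is not None and not check(output):
            raise CheckpointError(name, trace)  # stop before errors cascade
        current = output
    return current, trace


# Placeholder agents and a deliberately simple check (assumptions).
def radiology(_: str) -> str:
    return ""  # simulates a failed or empty image interpretation


def summarizer(findings: str) -> str:
    return f"Summary: {findings}"


def non_empty(output: str) -> bool:
    return bool(output.strip())
```

Running `run_with_checkpoints([("Radiology", radiology, non_empty), ("Summary", summarizer, None)], "case")` raises `CheckpointError` before the summarizer ever sees the empty findings, and the exception's `trace` shows what each agent produced up to that point.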
Adapting General Frameworks for Healthcare Complexity
Generic orchestration frameworks like Semantic Kernel provide a strong foundation—but healthcare demands more. The stakes are higher, the data more nuanced, and the workflows require precision, traceability, and regulatory compliance.
Here’s how we’ve extended and adapted these systems to help address healthcare demands:
- Precision and Safety: We introduced domain-aware verification checkpoints and task-specific agent constraints to prevent inappropriate tool usage, supporting more reliable reasoning. To uphold the high standards required in healthcare, we defined two complementary metric systems (see the Healthcare Agent Orchestrator Evaluation for more details):
  - Core Metrics: monitor agent selection accuracy, intent resolution, contextual relevance, and information aggregation.
  - RoughMetric: a composite score based on ROUGE that helps quantify the precision of generated outputs and conversation reliability.
  - TBFact: a modified version of RadFact2 that measures the factuality of claims in agents’ messages and helps identify omissions and hallucinations.
- Domain-Specific Tool Planning: Healthcare agents must reason across multimodal inputs—such as chest X-rays, CT slices, pathology images, and structured EHRs. We’ve customized Semantic Kernel’s tool invocation and planning modules to reflect clinical workflows, not generic task chains.
- Secure and Compliant Data Access: Through MCP and Azure infrastructure, we enable agents to operate on patient-specific data while preserving strict privacy controls and aligning with institutional policies.
These infrastructure-level adaptations are designed to complement Microsoft Healthcare AI models—such as CXRReportGen, MedImageParse, and MedImageInsight—working together to enable coordinated, domain-aware reasoning across complex healthcare tasks.
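To make the metric ideas above more concrete, here is a simplified sketch of the two underlying techniques: a ROUGE-1 style unigram-overlap F1, and claim-level precision and recall. It is not the RoughMetric or TBFact implementation. The function names are hypothetical, and the exact-match support check is a stand-in assumption for whatever judge a real factuality metric would use.

```python
from collections import Counter
from typing import Callable


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(claim: str, evidence: list[str]) -> bool:
    """Stand-in support check (assumption): normalized exact string match."""
    return claim.strip().lower() in {e.strip().lower() for e in evidence}


def claim_precision_recall(
    generated: list[str],
    reference: list[str],
    is_supported: Callable[[str, list[str]], bool] = exact_match,
) -> tuple[float, float]:
    """Precision: share of generated claims supported by the reference.
    Recall: share of reference claims supported by the generated claims."""
    precision = (
        sum(is_supported(c, reference) for c in generated) / len(generated)
        if generated else 0.0
    )
    recall = (
        sum(is_supported(c, generated) for c in reference) / len(reference)
        if reference else 0.0
    )
    return precision, recall


# Placeholder claims for illustration only.
precision, recall = claim_precision_recall(
    generated=["claim a", "claim b"],
    reference=["claim a", "claim c"],
)
```

In this framing, low precision points to unsupported claims (possible hallucinations), while low recall points to reference claims the agents left out (omissions).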
Enabling Collaborative, Trustworthy AI in Healthcare
Healthcare demands AI systems that are as collaborative, adaptive, and trustworthy as the clinical teams they aim to support.
The Healthcare Agent Orchestrator is a concrete step toward that vision—pairing specialized health AI models with a flexible, multi-agent coordination framework, purpose-built to reflect the complexity of real clinical decision-making.
By aligning with existing healthcare workflows and enabling transparent, role-specific collaboration, this system shows promise in helping clinicians work more effectively, with AI as a partner, not a replacement.
2 arXiv, MAIRA-2: Grounded Radiology Report Generation, June 6, 2024