Introduction
Ensuring continuous availability and data integrity is paramount for organizations. This article focuses exclusively on resiliency within Microsoft Fabric, covering high availability (HA), disaster recovery (DR), and data protection strategies. We will explore Microsoft Fabric’s resiliency features, including Recovery Point Objective (RPO) and Recovery Time Objective (RTO), and outline mechanisms for recovering from failures in both pipeline and streaming scenarios.
As of April 25, 2025, this information reflects the current capabilities of Microsoft Fabric. Because features evolve rapidly, consult the Microsoft Fabric roadmap for the latest updates.
Service Resiliency in Microsoft Fabric
Microsoft Fabric leverages Azure’s infrastructure to ensure continuous service availability during hardware or software failures.
Availability Zones
Fabric uses Azure Availability Zones—physically separate datacenters within an Azure region—to automatically replicate resources across zones. This enables seamless failover during a zone outage, without manual intervention. As of Q1 2025, Fabric provides partial support for zone redundancy in selected regions and services. Customers should refer to service-specific documentation for detailed HA guarantees.
Cross‑Region Disaster Recovery
For protection against regional failures, Microsoft Fabric offers partial support for cross-region disaster recovery. The level of support varies by service:
- OneLake Data: OneLake supports cross-region data replication in selected regions. Organizations can enable or disable this feature based on their business needs. For more information, see Disaster recovery and data protection for OneLake.
- Power BI: Power BI includes built-in DR capabilities, with automatic data replication across regions to ensure high availability. For frequently asked questions, review the Power BI high availability, failover, and disaster recovery FAQ.
Data Resiliency: RPO and RTO Considerations
Fabric offers configurable storage redundancy options—Locally Redundant Storage (LRS), Zone-Redundant Storage (ZRS), and Geo-Redundant Storage (GRS)—each with different RPO/RTO targets. Detailed definitions and SLAs are available in the Azure Storage redundancy documentation.
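As a rough illustration of how RPO and RTO are measured against an incident, the sketch below compares a data-loss window and a downtime against targets. The `RecoveryTargets` class, the `evaluate_incident` function, and all timestamps and target values are hypothetical and chosen only for illustration; they are not Fabric or Azure Storage guarantees.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def evaluate_incident(last_replicated_at, failed_at, restored_at, targets):
    """Return (data_loss_window, downtime, rpo_met, rto_met) for one incident."""
    # Data written after the last successful replication is at risk of loss.
    data_loss_window = failed_at - last_replicated_at
    # Downtime runs from the failure until service is restored.
    downtime = restored_at - failed_at
    return (
        data_loss_window,
        downtime,
        data_loss_window <= targets.rpo,
        downtime <= targets.rto,
    )


# Hypothetical targets and timestamps, not taken from any SLA.
targets = RecoveryTargets(rpo=timedelta(minutes=15), rto=timedelta(hours=1))
loss, downtime, rpo_met, rto_met = evaluate_incident(
    last_replicated_at=datetime(2025, 4, 25, 9, 50),
    failed_at=datetime(2025, 4, 25, 10, 0),
    restored_at=datetime(2025, 4, 25, 11, 30),
    targets=targets,
)
print(f"data loss window: {loss}, RPO met: {rpo_met}")
print(f"downtime: {downtime}, RTO met: {rto_met}")
```

In this made-up incident the 10-minute data-loss window is within the 15-minute RPO, while the 90-minute downtime exceeds the 1-hour RTO.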
Recovering from Failed Processes
Failures can occur in both pipeline and streaming workloads. Microsoft Fabric provides tools and strategies for minimizing disruption.
Data Pipelines
In Data Factory within Fabric, pipelines are made up of activities that may fail due to source issues or transient network errors. Zone failures are typically handled like standard pipeline errors, while regional failures require manual intervention. For a brief discussion, see the experience-specific guidance in the Microsoft Fabric disaster recovery documentation.
Pipeline resiliency can be improved by implementing retry policies, configuring error-handling blocks, and monitoring execution status using Fabric’s built-in logging features.
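For custom code that a pipeline invokes (for example, logic inside a notebook), the retry idea can be sketched as a loop with exponential backoff. This is a minimal illustration, not a Fabric API: `TransientError`, `run_with_retry`, and `flaky_copy` are hypothetical names, and the attempt counts and delays are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a network timeout)."""


def run_with_retry(activity, max_attempts=4, base_delay=1.0, max_delay=30.0,
                   sleep=time.sleep):
    """Call activity() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: hand the failure to error handling
            # Exponential backoff, capped at max_delay, with jitter so that
            # parallel runs do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a fake activity that fails twice, then succeeds.
calls = {"count": 0}


def flaky_copy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated transient network error")
    return "copied"


# The sleep function is replaced with a no-op so the demo runs instantly.
result = run_with_retry(flaky_copy, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

Only errors classified as transient are retried; anything else propagates immediately, which keeps genuine source or configuration problems visible instead of hiding them behind repeated attempts.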
Streaming Scenarios
- Spark Structured Streaming: Fabric leverages Apache Spark for real-time processing. Spark Structured Streaming includes built-in checkpointing, but seamless failover depends on cluster configuration. Manual intervention may be required to resume tasks after node or regional failures.
- Eventstream: Eventstream simplifies streaming data ingestion, but users should currently assume manual steps may be needed for fault recovery.
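The idea behind checkpoint-based recovery can be shown with a small, self-contained sketch: progress is persisted after each event, so a restarted job resumes where the failed one stopped. This is a conceptual illustration in plain Python, not Spark's or Eventstream's implementation; the function names, the JSON checkpoint file, and the simulated failure are all assumptions made for the example.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the next offset to process, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    """Persist progress: write a temp file, then rename it over the old one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, path)


def process_stream(events, checkpoint_path, sink, fail_at=None):
    """Process events from the last checkpoint; fail_at simulates a crash."""
    offset = load_checkpoint(checkpoint_path)
    while offset < len(events):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated node failure")
        sink.append(events[offset])
        offset += 1
        save_checkpoint(checkpoint_path, offset)


events = ["e0", "e1", "e2", "e3", "e4"]
sink = []
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    process_stream(events, checkpoint_path, sink, fail_at=3)
except RuntimeError:
    pass  # in practice, an operator or scheduler restarts the job here

process_stream(events, checkpoint_path, sink)  # resumes from the checkpoint
print(sink)
```

Because the checkpoint is written after the event reaches the sink, a crash between those two steps would replay that event on restart. The sketch is therefore at-least-once, and a real sink would need to tolerate or deduplicate repeats.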
Monitoring and Alerting
Microsoft Fabric integrates with tools such as Azure Monitor and Microsoft Defender for Cloud, allowing administrators to track availability metrics and configure alerts. Regular monitoring helps detect anomalies early and ensures that resiliency strategies remain effective.
Data Loss Prevention (DLP)
As of March 2025, Microsoft Purview extends DLP policy enforcement to Fabric and Power BI workspaces. Organizations can define policies to automatically identify, monitor, and protect sensitive data across the Microsoft ecosystem. For more information, review Purview Data Loss Prevention.
Cost Considerations
Enhancing resiliency can increase costs. Key considerations include:
- Geo-Redundancy: While cross-region replication improves resiliency, it also increases storage and transfer costs. Assess which workloads require GRS based on criticality.
- Egress Charges: Transferring data across regions can generate egress fees. Co-locating compute and storage within the same region helps minimize these charges.
- Pipeline CU Consumption: Data movement and orchestration in Fabric consume Capacity Units (CUs). Cross-region data movement can take longer than movement within a region and therefore consume more CUs. Understanding these costs helps optimize both performance and budget.
Enabling Disaster Recovery for Fabric Capacities
Disaster recovery must be enabled per Fabric capacity, which can be configured through the Admin Portal. Enable DR for each capacity that requires protection. For setup details, see Manage your Fabric capacity for DR.
Conclusion
Microsoft Fabric offers a robust set of features for building resilient data systems. By leveraging its high availability, disaster recovery, and monitoring capabilities—and aligning them with cost-aware planning—organizations can ensure operational continuity and safeguard critical data.
For ongoing updates, monitor the Microsoft Fabric documentation and consider subscribing to the Fabric blog for the latest announcements.