Deploy Machine Learning Models the Smart Way with Azure Blob & Web App
June 27, 2025Security Review for Microsoft Edge version 138
June 27, 2025- This Article is Authored By Jonas De Keuster from VaultSpeed and Co-authored with Michael Olschimke, co-founder and CEO at Scalefree International GmbH & Trung Ta is a senior BI consultant at Scalefree International GmbH.
- The Technical Review is done by Ian Clarke, Naveed Hussain – GBBs (Cloud Scale Analytics) for EMEA at Microsoft
Businesses often struggle to align their understanding of processes and products across disparate systems in corporate operations. In our previous blogs in this series, we explored the advantages of Data Vault as a methodology and why it is increasingly recognized due to its automation-friendly approach to modern data warehousing. Data Vault’s modular structure, scalability, and flexibility address the challenges of integrating diverse and evolving data sources. However, the key to successfully implementing a Data Vault lies in automation.
Data Vault’s pattern-based modeling – organized around hubs, links, and satellites – provides a standardized framework well-suited to integrate data from horizontally scattered operational source systems. Automation tools like VaultSpeed enhance this methodology by simplifying the generation of Data Vault structures, streamlining workflows, and enabling rapid delivery of analytics-ready data solutions. By leveraging the strengths of Data Vault and VaultSpeed’s automation capabilities, organizations can overcome inefficiencies in traditional ETL processes, enabling scalable and adaptable data integration. Examples of such operational systems include Microsoft Dynamics 365 for CRM and ERP, SAP for enterprise resource planning, or Salesforce for customer data. Attempts to harmonize this complexity historically relied on pre-built industry data models. However, these models often fell short, requiring significant customization and failing to accommodate unique business processes.
Different approaches to Data Integration
Industry data models offer a standardized way to structure data, providing a head start for organizations with well-aligned business processes. They work well in stable, regulated environments where consistency is key.
However, for organizations dealing with diverse sources and fast-changing requirements, Data Vault offers greater flexibility. Its modular, scalable approach supports evolving data landscapes without the need to reshape existing models.
Both approaches aim to streamline integration. Data Vault simply offers more adaptability when complexity and change are the norm. So it depends on the use cases when it comes to choosing the right approach.
Tackling data complexity with automation
Integrating data from horizontally distributed sources is one of the biggest challenges data engineers face. VaultSpeed addresses this by connecting the physical metadata from source systems with the business’s conceptual data model and creating a “town plan” for building a Data Vault model. This “town plan” for Data Vault model construction serves as the bedrock for automating various data pipeline stages.
By aligning data’s technical and business perspectives, VaultSpeed enables the automated generation of logical and physical data models. This automation streamlines the design process and ensures consistency between the data’s conceptual understanding and physical implementation. Furthermore, VaultSpeed’s automation extends to the generation of transformation code. This code converts data from its source format into the structure defined by the Data Vault model. Automating this process reduces the potential for errors and accelerates the development of the data integration pipeline.
In addition to data models and transformation code, VaultSpeed also automates workflow orchestration. This involves defining and managing the tasks required to extract, transform, and load data into the Data Vault. By automating this orchestration, VaultSpeed ensures that the data integration process is executed reliably and efficiently.
How VaultSpeed automates integration
The following section will examine the detailed steps involved in the VaultSpeed workflow. We will examine how it combines metadata-driven and data-driven modeling approaches to streamline data integration and automate various data pipeline stages.
- Harvest metadata: VaultSpeed collects metadata from source systems such as OneLake, AzureSQL, SAP, and Dynamics 365, capturing schema details, relationships, and dependencies.
- Align with conceptual models: Using a business’s conceptual data model as a guiding framework, VaultSpeed ensures that physical source metadata is mapped consistently to the target Data Vault structure.
- Generate logical and physical models: VaultSpeed leverages its metadata repository and automation templates to produce fully defined logical and physical Data Vault models, including hubs, links, and satellites.
- Automate code creation: Once the models are defined, VaultSpeed generates the necessary transformation and workflow code using templates with embedded standards and conventions for Data Vault implementation. This ensures seamless data ingestion, integration, and consistent population of the Data Vault model.
By automating these steps, VaultSpeed eliminates the manual effort required for traditional data modeling and integration, reducing errors and addressing the inefficiencies of data integration using traditional ETL. Due to the model driven approach, the code is always in sync with the data model.
Unified integration with Microsoft Fabric
Microsoft Fabric offers a robust data ingestion, storage, and analytics ecosystem. VaultSpeed seamlessly embeds within this ecosystem to ensure an efficient and automated data pipeline. Here’s how the process works:
Ingestion (Extract and Load): Tools like ADF, Fivetran, or OneLake replication bring data from various sources into Fabric. These tools handle the extraction and replication of raw data from operational systems. Microsoft Fabric also supports mirrored databases, enabling real-time data replication from sources like CosmosDB, Azure SQL, or application data into the Fabric environment. This ensures data remains synchronized across the ecosystem, providing a consistent foundation for downstream modeling and analytics.
Data Repository or Platform: Microsoft Fabric is the data platform providing the infrastructure for storing, managing, and securing the ingested data. Fabric uniquely supports warehouse and lakehouse experiences, bringing them together under a unified data architecture. This means organizations can combine structured, transactional data with unstructured or semi-structured data in a single platform, eliminating silos and enabling broader analytics use cases.
Modeling and Transformation: VaultSpeed takes over at this stage, leveraging its advanced automation to model and transform data into a Data Vault structure. This includes creating hubs, links, and satellites while ensuring alignment with business taxonomies. Unlike traditional ETL tools, VaultSpeed is not involved in the runtime execution of transformations. Instead, it generates code that runs within Microsoft Fabric. This approach ensures better performance, reduces vendor lock-in, and enhances security since no data flows through VaultSpeed itself. By focusing exclusively on model-driven automation, VaultSpeed enables organizations to maintain full control over their data processing while benefiting from automation efficiencies. Additionally, Fabric’s VertiPaq engine manages the transformation workloads automatically, ensuring optimal performance without requiring extensive manual tuning, a key capability in a Data Vault context where performance is critical for handling large volumes of data and complex transformations. This simplifies operations for data engineers and ensures that query performance remains efficient, even as data volumes and complexity grow.
Consume: The integrated data layer within Microsoft Fabric serves multiple consumption paths. While tools like Power BI enable actionable insights through analytics dashboards, the same data foundation can also drive AI use cases, such as machine learning models or intelligent applications.
By connecting ingestion tools, a unified data platform, and analytics or AI solutions, VaultSpeed ensures a streamlined and integrated workflow that maximizes the value of the Microsoft Fabric ecosystem.
Loading at multiple speeds: real-time Data Vaults with Fabric
Loading data into a Data Vault often requires balancing traditional batch processes with the demands of real-time ingestion within a unified model. Microsoft Fabric’s event-driven tools, such as Data Activator, empower organizations to process data streams in real-time while supporting traditional batch loads. VaultSpeed complements these capabilities by ensuring that both modes of ingestion feed seamlessly into the same Data Vault model, eliminating the need for separate architectures like the Lambda pattern.
Key capabilities for real time Data Vault include:
- Event-driven updates: Automatically trigger incremental loads into the Data Vault when changes occur in CosmosDB, OneLake, or other sources.
- Automated workflow orchestration: VaultSpeed’s Flow Management Control (FMC) automates the entire data ingestion, transformation, and loading workflow. This includes handling delta detection, incremental updates, and batch processes, ensuring optimal efficiency regardless of the speed of data arrival. FMC integrates natively with Azure Data Factory (ADF) for seamless orchestration within the Microsoft ecosystem. For more complex or distributed workflows, FMC also supports Apache Airflow, enabling flexibility in managing a wide range of data pipelines.
- Seamless integration: Maintain synchronized pipelines for historical and real-time data within the Fabric environment. The FMC intelligently manages multiple data streams, dynamically adjusting to workload demands to support high-volume batch loads and real-time event-driven updates.
These capabilities ensure analytics dashboards reflect the latest data, delivering immediate value to decision-makers.
Automating the gold layer and delivering data products at scale
Power BI is a cornerstone of Microsoft Fabric, and VaultSpeed makes it easier for data modelers to connect the dots. By automating the creation of the gold layer, VaultSpeed enables seamless integration between Data Vaults and Power BI.
Benefits for data teams:
- Automated gold layer: VaultSpeed automates the creation of the gold layer, including templates for star schemas, One Big Table (OBT), and other analytics-ready structures. These automated templates allow businesses to generate consistent and scalable presentation layers without manual intervention.
- Accelerated time to insight: By reducing manual preparation steps, VaultSpeed enables teams to deliver dashboards and reports quickly, ensuring faster access to actionable insights.
- Deliver data products: The ability to automate and standardize star schemas and other presentation models empowers organizations to deliver analytics-ready data products at scale, efficiently meeting the needs of multiple business domains.
- Improved data governance: VaultSpeed’s lineage tracking ensures compliance and transparency, providing full traceability from raw data to the presentation layer.
- No-code automation: Eliminate the need for custom scripting, freeing up time to focus on innovation and higher-value tasks.
Conclusion
Integrating VaultSpeed and Microsoft Fabric redefines how data modelers and engineers approach Data Vault 2.0. This partnership unlocks the full potential of modern data ecosystems by automating workflows, enabling real-time insights, and streamlining analytics. If you’re ready to transform your data workflows, VaultSpeed and Microsoft Fabric provide the tools you need to succeed. The following article will focus on the DataOps part of automation.
Further reading
- Automating common understanding: Integrating different data source views into one comprehensive perspective
- Why Data Vault is the best model for data warehouse automation: Read the eBook
- The Elephant in the Fridge by John Giles: A great reference on conceptual data modeling for Data Vault
About VaultSpeed
VaultSpeed empowers enterprises to deliver data products at scale through advanced automation for modern data ecosystems, including data lakehouse, data mesh, and fabric architectures. The no-code platform eliminates nearly all traditional ETL tasks, delivering significant improvements in automation across areas like data modeling, engineering, testing, and deployment.
With seamless integration to platforms like Microsoft Fabric or Databricks, VaultSpeed enables organizations to automate the entire software development lifecycle for data products, accelerating delivery from design to deployment. VaultSpeed addresses inefficiencies in traditional data processes, transforming how data engineers and business users collaborate to build flexible, scalable data foundations for AI and analytics.
About the Authors
Jonas De Keuster is VP Product at VaultSpeed. He had close to 10 years of experience as a DWH consultant in various industries like banking, insurance, healthcare, and HR services, before joining the data automation vendor. This background allows him to help understand current customer needs and engage in conversations with members of the data industry.
Michael Olschimke is co-founder and CEO of Scalefree International GmbH, a European Big Data consulting firm. The firm empowers clients across all industries to use Data Vault 2.0 and similar Big Data solutions. Michael has trained thousands of industry data warehousing professionals, taught academic classes, and published regularly on these topics.
Trung Ta is a senior BI consultant at Scalefree International GmbH. With over 7 years of experience in data warehousing and BI, he has advised Scalefree’s clients in different industries (banking, insurance, government, etc.) and of various sizes in establishing and maintaining their data architectures. Trung’s expertise lies within Data Vault 2.0 architecture, modeling, and implementation, specifically focusing on data automation tools.