Overview
The codebase is organized to support the following processes:
- Databricks Infrastructure Provisioning
  - Instance pools
  - Shared clusters
  - Secret scopes (integrated with Azure Key Vault)
- Unity Catalog Data Assets Deployment
  - Catalogs, schemas, and volumes
  - Catalog and schema permissions
- External Locations Management
  - Creation of external locations for Unity Catalog
  - Storage credential management and permissions
- CI/CD Automation
  - Azure DevOps YAML pipelines for plan/apply workflows
  - Environment-specific deployments (dev, prd)
GitHub Repository: https://github.com/vsakash5/Databricks.git
Folder Structure
Azure Databricks/
├── architecture-diagram.drawio
├── readme.md
├── databricks-infra/ # Infra: pools, clusters, secret scopes
│ ├── main.tf
│ ├── variables.tf
│ ├── dev/
│ └── prd/
├── databricks-uc-data-assets/ # Unity Catalog: catalogs, schemas, volumes
│ ├── main.tf
│ ├── variables.tf
│ ├── dev/
│ └── prd/
├── databricks-uc-external-locations/ # External locations, storage credentials
│ ├── main.tf
│ ├── variables.tf
│ ├── dev/
│ └── prd/
├── modules/ # Reusable Terraform modules
│ ├── infra-assets/
│ ├── uc-data-assets/
│ └── uc-external-locations/
└── Pipelines/ # Azure DevOps YAML pipelines & templates
  ├── databricks-infra-deploy-main.yaml
  ├── databricks-unity-catalog-deploy-main.yaml
  ├── databricks-external-locations-deploy-main.yaml
  └── Templates/
Process Details
1. Infrastructure Provisioning
- Instance Pools: Defined in the instance_pools variable and created via modules/infra-assets (see the tfvars sketch after this list).
- Shared Clusters: Configured in databricks_shared_clusters variable, supporting autoscaling, node types, and security modes.
- Secret Scopes: Integrated with Azure Key Vault for secure secret management.
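To make the variable-driven approach concrete, here is a minimal tfvars-style sketch of what the instance_pools and databricks_shared_clusters variables could look like. The attribute names and values below are illustrative assumptions only; the authoritative shapes are defined in variables.tf and modules/infra-assets.

instance_pools = {
  shared_pool = {
    node_type_id       = "Standard_DS3_v2"   # assumed node type
    min_idle_instances = 0
    max_capacity       = 10
  }
}

databricks_shared_clusters = {
  analytics = {
    spark_version      = "15.4.x-scala2.12"
    node_type_id       = "Standard_DS3_v2"
    autoscale_min      = 1
    autoscale_max      = 4
    data_security_mode = "USER_ISOLATION"    # shared-cluster security mode
  }
}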
2. Unity Catalog Data Assets
- Catalogs: Created for different purposes (e.g., sa, cdh, ws) with specific owners, storage roots, and grants.
- Schemas & Volumes: Defined per catalog, supporting custom properties, storage locations, and fine-grained permissions.
3. External Locations
- Storage Credentials: Managed via Azure Managed Identity Access Connectors.
- External Locations: Configured for each data layer (catalog, bronze, silver, gold, landing zones, etc.), with read/write and validation options.
- Grants: Fine-grained access control for each external location (illustrated in the sketch below).
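A minimal sketch of how a storage credential, an external location, and its grants fit together is shown below. The access connector variable, principal, and storage-account names are assumptions; the real wiring lives in modules/uc-external-locations.

resource "databricks_storage_credential" "adls" {
  name = "adls-managed-identity"
  azure_managed_identity {
    # Full resource ID of the Databricks Access Connector (assumed variable name).
    access_connector_id = var.access_connector_id
  }
}

resource "databricks_external_location" "bronze" {
  name            = "bronze"
  url             = "abfss://bronze@<storage-account>.dfs.core.windows.net/"
  credential_name = databricks_storage_credential.adls.name
  read_only       = false
  skip_validation = false
}

resource "databricks_grants" "bronze" {
  external_location = databricks_external_location.bronze.name
  grant {
    principal  = "data-engineers"   # assumed group name
    privileges = ["READ_FILES", "WRITE_FILES"]
  }
}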
4. CI/CD Automation
- Pipelines: YAML files in Pipelines automate plan/apply for each environment and component.
- Templates: Reusable pipeline templates for artifacts, plan, and apply stages.
- Artifact Management: Build artifacts are published and consumed by deployment jobs.
Connection Mechanism
Authentication is handled securely and automatically via Azure DevOps and Key Vault:
1. AzureRM Provider Authentication
Purpose: Allows Terraform to provision resources in your Azure subscription.
How:
Uses Service Principal credentials (ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID) fetched from Azure Key Vault. These are injected as environment variables in the pipeline and referenced in provider blocks. See databricks-infra/main.tf and Pipelines/Templates/databricks-infra-plan-template.yaml.
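Because the azurerm provider reads the ARM_* environment variables directly, the provider block itself can stay minimal, roughly:

provider "azurerm" {
  # ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID
  # are injected by the pipeline, so no credentials are hardcoded here.
  features {}
}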
2. Databricks Provider Authentication
Purpose: Allows Terraform to manage Databricks resources (clusters, pools, Unity Catalog, etc.) in your workspace.
How:
Uses the Databricks workspace host and Azure resource ID (constructed from variables in dev.tfvars). Authenticates via the same Service Principal, leveraging Azure AD integration. See databricks-infra/main.tf, databricks-uc-data-assets/main.tf, and databricks-uc-external-locations/main.tf.
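In outline, the Databricks provider configuration follows this pattern; resource_group_name is an assumed variable name, while the other variables appear in the tfvars files:

provider "databricks" {
  host = var.databricks_workspace_host
  # The workspace resource ID is built from tfvars values; the Service Principal
  # credentials come from the same ARM_* environment variables as above.
  azure_workspace_resource_id = "/subscriptions/${var.az_subscription_id}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Databricks/workspaces/${var.databricks_workspace_name}"
}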
3. Key Vault Integration
Purpose: Securely manage secrets (like passwords, keys) for Databricks secret scopes.
How:
Secret scopes in Databricks are linked to Azure Key Vault for secure secret management. key_vault_name, scope_name, and key_vault_resource_group are used to configure this linkage in Terraform modules. See modules/infra-assets/main.tf.
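As a sketch, a Key Vault-backed secret scope can be declared like this, using the key_vault_name, key_vault_resource_group, and scope_name variables mentioned above:

data "azurerm_key_vault" "this" {
  name                = var.key_vault_name
  resource_group_name = var.key_vault_resource_group
}

resource "databricks_secret_scope" "kv" {
  name = var.scope_name
  keyvault_metadata {
    # Linking the scope to Key Vault lets Databricks read secrets without
    # storing them in Databricks itself.
    resource_id = data.azurerm_key_vault.this.id
    dns_name    = data.azurerm_key_vault.this.vault_uri
  }
}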
4. Remote State
Purpose: Store Terraform state securely in Azure Storage.
How:
Defined in backend config files such as dev_backend.conf and prd_backend.conf in each environment folder. See databricks-infra/dev/dev_backend.conf.
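A backend config file of this kind typically contains the azurerm backend settings shown below (values are placeholders). It is passed to terraform init via -backend-config and fills a typically empty backend "azurerm" {} block declared in the Terraform configuration.

resource_group_name  = "<state-resource-group>"
storage_account_name = "<state-storage-account>"
container_name       = "tfstate"
key                  = "databricks-infra/dev.tfstate"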
5. Pipeline Secret Management
Purpose: Automate the secure injection of credentials into pipeline jobs.
How:
Azure DevOps tasks fetch secrets from Azure Key Vault at runtime. Secrets are set as environment variables for Terraform commands. See Pipelines/Templates/databricks-infra-plan-template.yaml and similar templates.
Authentication-Related Files
Files and Descriptions
- databricks-infra/{$env}/{$env}.tfvars
  - Contains environment-specific Azure and Databricks identifiers, including:
    - az_subscription_id
    - tenant_id
    - databricks_workspace_name
    - databricks_workspace_host
    - key_vault_name
    - scope_name
- databricks-infra/main.tf
  - Configures the AzureRM and Databricks providers using variables and environment variables injected by the pipeline.
- modules/infra-assets/main.tf
  - Creates Databricks secret scopes linked to Azure Key Vault.
- Pipelines/Templates/databricks-infra-plan-template.yaml
  - Fetches secrets from Key Vault and sets them as environment variables for Terraform.
- databricks-infra/dev/dev_backend.conf
  - Configures the remote backend for Terraform state in Azure Storage.
- databricks-uc-data-assets/main.tf
  - Uses the same authentication mechanism for Databricks and Azure.
- databricks-uc-external-locations/main.tf
  - Uses the same authentication mechanism for Databricks and Azure.
Databricks Workspace
- Workspace Host: Provided via databricks_workspace_host variable.
- Workspace Resource ID: Constructed from subscription, resource group, and workspace name.
- Provider Aliasing: Ensures the correct context for Databricks API calls (see the sketch below).
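In rough terms, the aliasing pattern looks like the following; the alias name, variable names, and module source path are assumptions, not the repository's exact code:

locals {
  # Assembled from subscription, resource group, and workspace name as noted above.
  workspace_resource_id = "/subscriptions/${var.az_subscription_id}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Databricks/workspaces/${var.databricks_workspace_name}"
}

provider "databricks" {
  alias                       = "workspace"
  host                        = var.databricks_workspace_host
  azure_workspace_resource_id = local.workspace_resource_id
}

module "infra_assets" {
  source = "../modules/infra-assets"
  # Explicitly hand the aliased provider to the module so Databricks API calls
  # run against the intended workspace.
  providers = {
    databricks = databricks.workspace
  }
  # ... module inputs ...
}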
Remote State
- Terraform State: Stored in an Azure Storage Account, configured via the dev_backend.conf and prd_backend.conf files in each environment folder.
Required Details for Successful Deployment
- Azure Subscription ID: For resource provisioning.
- Resource Group: Where Databricks and supporting resources reside.
- Databricks Workspace Name & Host: For API and provider configuration.
- Tenant ID: Azure Active Directory tenant for authentication.
- Access Connector Name: For managed identity storage credentials.
- Key Vault Name & Resource Group: For secret scope integration.
- Storage Account Names: For each data layer (catalog, bronze, silver, gold, landing, etc.).
- Metastore ID: For Unity Catalog operations.
- Owners and Grants: Email addresses or group names for resource ownership and permissions.
- Pipeline Service Connection: Azure DevOps service connection with sufficient permissions.
How to Deploy
1. Prerequisites
- Azure CLI installed and authenticated (az login)
- Azure DevOps project with pipeline agent pool
- Service Principal with contributor access
- Azure Key Vault with required secrets
2. Configure Environment
- Edit the relevant dev.tfvars or prd.tfvars files with your environment details.
- Ensure backend config files (dev_backend.conf, prd_backend.conf) point to the correct storage account and container.
3. Run Pipelines
- Trigger the desired pipeline in Azure DevOps (plan/apply for dev or prd).
- Pipelines will:
- Download artifacts
- Fetch secrets from Key Vault
- Run terraform init, plan, and apply for each component
4. Manual Terraform (Optional)
You can also run Terraform manually:
az login
export ARM_ACCESS_KEY=<storage account access key>
terraform init -backend-config="dev/dev_backend.conf" -reconfigure
terraform plan -var-file="dev/dev.tfvars" -out=plan/dev_plan
terraform apply "plan/dev_plan"
Additional Notes
- State Migration: Always migrate any existing state before generating or applying a plan to avoid resource conflicts or unintentional deletions.
- Modularity: Each major component (infra, data assets, external locations) is modular and can be deployed independently.
- Security: All sensitive values are managed via Azure Key Vault and not hardcoded.