Introduction
Mainframe and Midrange data is often stored in fixed-length format, where each record has a predetermined length, or in variable-length format, where each record's length may vary. The data is stored in binary form using Extended Binary Coded Decimal Interchange Code (EBCDIC) encoding, and the metadata for the EBCDIC files is stored in a copybook file. These EBCDIC-encoded files store data differently based on its data type, which is vital for optimal storage and performance on Mainframe file systems.
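For illustration, a copybook is a COBOL data definition that describes the layout of each record. A minimal, hypothetical example (the field names and sizes below are made up):

       01  CUSTOMER-RECORD.
           05  CUST-ID        PIC 9(6).
           05  CUST-NAME      PIC X(20).
           05  CUST-BALANCE   PIC S9(7)V99 COMP-3.

Here CUST-ID is a six-digit numeric field, CUST-NAME a 20-character text field, and CUST-BALANCE a signed packed-decimal (COMP-3) amount with two implied decimal places. The copybook is what tells a conversion tool how to slice and decode each record.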
However, this presents a challenge when migrating data from Mainframe or Midrange systems to distributed systems. The data, originally stored in a format specific to Mainframe or Midrange systems, is not directly readable after transfer, because distributed systems understand code pages such as American Standard Code for Information Interchange (ASCII).
To make this data readable on a distributed system, we need to perform an EBCDIC to ASCII code page conversion. This conversion can be achieved in many ways. A few of them are listed below; a minimal character-level sketch of what such a conversion does follows the list.
- Microsoft Host Integration Server, Host File Client
- Azure Logic Apps IBM Host File connector. Our detailed blog about it is here.
- Open-source (OSS) libraries
- Third-party ISV solutions
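Before diving into these options, it helps to see what a code page conversion does at the byte level. Below is a minimal Scala sketch, assuming the JVM's built-in IBM037 EBCDIC charset is available (it ships with standard JDKs). Note that this handles only plain character fields; packed-decimal and binary fields need copybook-aware tools, which is what the options above provide.

import java.nio.charset.Charset

object EbcdicToAscii {
  // IBM037 is a common EBCDIC code page; US-ASCII is the target.
  private val Ebcdic = Charset.forName("IBM037")
  private val Ascii  = Charset.forName("US-ASCII")

  // Decode EBCDIC bytes to a string, then re-encode as ASCII bytes.
  def convert(ebcdicBytes: Array[Byte]): Array[Byte] =
    new String(ebcdicBytes, Ebcdic).getBytes(Ascii)

  def main(args: Array[String]): Unit = {
    // The bytes for "HELLO" in EBCDIC code page 037.
    val sample = Array(0xC8, 0xC5, 0xD3, 0xD3, 0xD6).map(_.toByte)
    println(new String(convert(sample), Ascii)) // prints HELLO
  }
}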
Microsoft Host Integration Server
Microsoft Host Integration Server (HIS) has a component named Host File Client (HFC). This component helps convert Mainframe EBCDIC files to ASCII using a custom-developed C# solution. More details on this solution are provided in the HIS documentation.
Logic App Converter
If you prefer a cloud-native solution, you can use the IBM Host File connector in Azure Logic Apps. The detailed process is documented in this blog post.
Fabric (with Open-Source Libraries)
Microsoft Fabric is an enterprise-ready, end-to-end analytics platform. It unifies data movement, data processing, ingestion, transformation, real-time event routing, and report building. It supports these capabilities with integrated services like Data Engineering, Data Factory, Data Science, Real-Time Intelligence, Data Warehouse, and Databases.
There are many open-source solutions that can help convert mainframe data to ASCII. These can be used to convert files on Spark-based Azure services such as Fabric, Databricks, and Synapse.
This blog will focus on the OSS option.
Data Ingestion Architecture
Using OSS on Fabric
There are multiple open-source libraries that can be utilized for this data conversion. In this article, we will dive deeper into one of these solutions: Cobrix.
Cobrix is an open-source library built in Scala that leverages Spark's multithreaded, distributed processing framework. This helps convert files faster than single-threaded approaches, but because it is multithreaded, it needs a pool of compute resources to perform the conversion. Cobrix can run on Spark environments such as Azure Synapse, Databricks, and Microsoft Fabric. We will dive deeper into how to set up Cobrix on Microsoft Fabric.
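As a quick illustration of how Cobrix is driven by the copybook, the reader is configured through options on the Spark DataFrame reader. The sketch below uses Cobrix's documented record_format option (F for fixed-length, V for variable-length records, matching the record formats described in the introduction); the paths are placeholders:

// Read a variable-length EBCDIC file described by a copybook.
val df = spark
  .read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybook", "abfss://file1.cpy")  // record layout metadata
  .option("record_format", "V")             // V = variable-length, F = fixed-length
  .load("abfss://file1.dat")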
- Download the required Cobrix packages
- We will first have to download the required Cobrix packages from the right sources. As Fabric has a particular runtime dependency, please make sure you download the right Scala build for the Fabric environment that you set up. You will have to download two JARs, named cobol-parser_xx.xx.jar and spark-cobol_xxx.xx.jar; illustrative Maven coordinates are sketched below.
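For reference, Cobrix is published on Maven Central under the za.co.absa.cobrix group ID (the same namespace used by the data source name in the notebook later). The coordinates below are a sketch: the Scala 2.12 build matches current Fabric Spark runtimes, and <version> is a placeholder to be matched to your environment.

za.co.absa.cobrix:cobol-parser_2.12:<version>
za.co.absa.cobrix:spark-cobol_2.12:<version>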
- Set up the Fabric environment
- Log in to fabric.microsoft.com.
- Create a Fabric workspace.
- Create an Environment in the workspace.
- Open the Environment and click on Custom Libraries.
- Upload the two JARs that were downloaded earlier. Once you have uploaded them, your custom library setup should look something like this.
- Create a new Lakehouse. Upload the COBOL copybook file as well as the Mainframe data file in binary to a particular location in the Lakehouse. At the end of this step, your Lakehouse setup should look something like this.
- For both of these files, copy the Azure Blob File System Secure (ABFSS) path by right-clicking on the files. This path can be used to point to the files from the Spark notebook; the general shape of these paths is sketched below.
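As a rough guide, OneLake ABFSS paths in Fabric generally take the shape below; the workspace name, lakehouse name, and folder here are hypothetical placeholders:

abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/mainframe/file1.cpy
abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/mainframe/file1.dat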
- Create a new Fabric pipeline.
- This pipeline will have two components: the first will be a notebook, which calls the Cobrix framework to convert the file from EBCDIC to ASCII; the second will be a Copy activity that copies the contents of the output file created by the notebook into a SQL database on Fabric.
- Create a new notebook. Attach the environment you created earlier to this notebook.
In the notebook cell, you can use this piece of code.
//Blob access
val copyBookName = "abfss://file1.cpy"
val dataFileName = "abfss://file1.dat"
val outputFileName = "abfss://output.txt"

//Cobrix converter execution
val cobolDataframe = spark
  .read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybook", copyBookName)
  .load(dataFileName)

//Display the DataFrame to view conversion results
cobolDataframe.printSchema()
cobolDataframe.show()
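The snippet above declares outputFileName but never writes to it. A minimal sketch of the missing write step, assuming CSV output in the Lakehouse is acceptable (Spark will create a directory of part files at this path):

// Write the converted data to the Lakehouse as ASCII text (CSV).
cobolDataframe
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv(outputFileName)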
- Once you have set the configuration properly, you are all set to run the notebook. This will convert the file from EBCDIC to ASCII and store it in the Lakehouse.
- Add a Copy activity to the pipeline, with the file as the source and SQL Server as the destination.
- At this point, your pipeline should look something like this.
- Once you run this pipeline, the Mainframe EBCDIC file will be converted to ASCII and then loaded into a Fabric native SQL database table.
Third-party ISV solutions
- There are many third-party ISV solutions available for EBCDIC to ASCII conversions. Please get in touch with us so we can help you find the right solution for your requirements.
Summary
EBCDIC to ASCII conversion is a critical piece of work in the data migration and modernization journey. Being able to do this with ease and accuracy will drive the success of the data migration. With this capability enabled in Fabric, a new set of use cases opens up, such as Mainframe report generation and other predominantly data-warehouse-driven scenarios.
Feedback and suggestions
If you have feedback or suggestions for improving this data migration asset, please send an email to the Database Platform Engineering Team.