May 31, 2019

Strangling Legacy Workflows Using Airflow

The Strangler pattern is a design pattern attributed1 to the prominent writer and software engineer Martin Fowler. The approach was put into practice and described in a paper by Chris Stevenson and Andy Pols2. The main goal of this design pattern is to help migrate a legacy system that is in dire need of house cleaning or decommissioning but is difficult to replace outright. These days, the pattern is mostly discussed in the context of migrating from a monolithic architecture to microservices.

Cunningham’s Law: “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer”.

Disclaimer: Please post a comment if I am wrong about anything.

Strangler Pattern (source: Microsoft Docs)

The idea is that you start building the replacement system while maintaining the existing one, gradually moving features and functionality over until you end up with the new system. This method spreads out the time and effort required for the migration. It is also far less risky, since it doesn't demand an upfront diversion of attention; the migration work blends in with day-to-day work.

SQL Server Integration Services is Dead

If you ask any data engineer (including myself) with a few years of experience under their belt who has worked in a Microsoft stack about SQL Server Integration Services (SSIS), they would recommend alternative solutions or, worse, talk negatively about it. It is a remnant of the early business intelligence and data warehousing era, and Microsoft has largely stopped investing in it. It was good for its time, but far better solutions exist today. If you are new to data engineering, read this round-up, The Rise of the Data Engineer, by Maxime Beauchemin.

Airflow to the Rescue

Now this brings us back to the Strangler pattern. Say you want to dismantle Oozie or SSIS and move your workflows to Airflow. That can't happen overnight, so we take the lessons learned from Martin Fowler and migrate gradually. At my current job I face exactly this scenario, and the approach I have taken is to create custom Airflow Operators and Hooks for interacting with our on-premises installation of SQL Server and SSIS.

The custom plugin I have created for Airflow consists of:

SQL Server Hook:

A Kerberized hook that relies on Kerberos tickets for authentication. The SQL Server hook that ships with Airflow uses pymssql, is not Kerberized, and only works with a username and password.
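The actual hook lives in the plugin, but the core idea can be sketched in a few lines. This is a minimal illustration, assuming pyodbc with Microsoft's ODBC driver, where `Trusted_Connection=yes` makes the driver authenticate with the current Kerberos ticket (obtained via kinit or Airflow's ticket renewer) instead of a username and password; the function name is hypothetical:

```python
def mssql_kerberos_conn_str(host: str, database: str, port: int = 1433) -> str:
    """Build an ODBC connection string that relies on Kerberos tickets
    rather than a stored username/password."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={host},{port};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )

# Inside a custom Airflow hook, get_conn() would simply return
# pyodbc.connect(mssql_kerberos_conn_str(host, database)).
```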

SSIS Package Operator:

This operator is responsible for executing SSIS packages and returning their execution IDs.
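The heart of such an operator is the SSISDB catalog API: `catalog.create_execution` registers an execution and `catalog.start_execution` kicks it off. A minimal sketch of the T-SQL the operator might issue (the helper name is hypothetical, and real code should bind the names as parameters rather than interpolate them):

```python
def ssis_start_execution_sql(folder: str, project: str, package: str) -> str:
    """T-SQL that creates and starts an SSIS execution in the SSISDB
    catalog and selects the resulting execution ID."""
    return f"""
    DECLARE @execution_id BIGINT;
    EXEC [SSISDB].[catalog].[create_execution]
        @folder_name = N'{folder}',
        @project_name = N'{project}',
        @package_name = N'{package}',
        @execution_id = @execution_id OUTPUT;
    EXEC [SSISDB].[catalog].[start_execution] @execution_id;
    SELECT @execution_id AS execution_id;
    """
```

The operator's execute() would run this through the hook and push the returned execution ID to XCom, where the sensor below can pick it up.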

SSIS Package Sensor:

This sensor polls SSIS with the execution ID received from the SSIS Package Operator and reports back the result.
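The polling logic maps the status codes of the SSISDB `catalog.executions` view onto the contract of an Airflow sensor's poke() method: return True on success, False while still running, and fail hard on a terminal error. A sketch of that mapping (function name hypothetical, status codes from the SSISDB catalog):

```python
# Status codes from the SSISDB catalog.executions view.
SSIS_SUCCEEDED = 7
SSIS_TERMINAL_FAILURES = {3, 4, 6}  # canceled, failed, ended unexpectedly

def poke_ssis_status(status: int) -> bool:
    """Interpret an SSIS execution status the way a sensor's poke() must:
    True = done, False = keep polling, exception = fail the task."""
    if status == SSIS_SUCCEEDED:
        return True
    if status in SSIS_TERMINAL_FAILURES:
        raise RuntimeError(f"SSIS execution finished with status {status}")
    return False

# The sensor would fetch the status with something like:
#   SELECT status FROM [SSISDB].[catalog].[executions]
#   WHERE execution_id = ?
```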

The plugin can be found on my GitHub repository: mahdi-hosseini/ssis_validator

The Gameplan

Here is the approach I am taking:

Step 1. Migrate packages one by one to Airflow, scheduling each as a DAG that still executes in SSIS through the custom operator and sensor

Step 2. Monitor the DAGs and build confidence in correctness, timeliness, and recovery

Step 3. Refactor the DAGs, moving the logic inside the SSIS packages over to native Airflow tasks

Step 4. Perform step 2 again

Step 5. Decommission the SSIS packages

Depending on how many packages you have, you can move off any ancient workflow or scheduler engine you might be using in a reasonable time frame.

  1. StranglerFigApplication ↩︎

  2. An Agile Approach to a Legacy System ↩︎

© Mike Hosseini 2019