Background

I was brought into the DataStage product team at the beginning of the journey to modernize DataStage, IBM’s industry-leading thick-client data integration tool. The team had four main goals for the modernization effort. First, make DataStage a software-as-a-service (SaaS) offering, bringing both the interface and workloads to the cloud. Second, fully integrate the software into Cloud Pak for Data (CPD), IBM’s data and artificial intelligence (AI) platform, to enhance the platform’s capabilities. Third, improve usability and align the experience with IBM’s design language and design system; DataStage was first released in 1996, and much of the thick-client experience relied on antiquated design paradigms. Finally, lower the barrier to entry for new users while still meeting the needs of experienced thick-client users.


Key projects

During my time on the DataStage team, I had the opportunity to work on many aspects of the product. Below are a few of the most salient design projects that I led over the years.

Password-protected projects. Contact me for access.


Canvas experience

Connectors


Intro to DataStage

DataStage is a low-code, canvas-based data integration tool used for creating data pipelines. The primary persona for DataStage is the data engineer. Data engineers work closely with business analysts and data analysts to gather the requirements used to design data pipelines. When executed, data pipelines move and transform data to be used for data analytics, for developing machine learning models, and for other business-critical applications.

Data pipeline


A data pipeline’s key purpose is to extract data from one or more sources, apply data transformations, and load the data into one or more target systems.

DataStage pipeline

Pipelines in DataStage are represented as flow diagrams.

DataStage pipeline anatomy

Flows are presented as a series of nodes connected by links. Nodes fall into two main categories: connectors and transformations. Connector nodes provide ways of accessing different data sources and targets. Transformation nodes manipulate data in a multitude of ways.
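The node-and-link structure described above can be sketched as a simple graph. This is an illustrative model only, not DataStage’s actual API; the `Flow` and `Node` classes and the node names are hypothetical:

```python
# A minimal sketch of a DataStage-style flow as a graph of nodes and links.
# Class and node names are illustrative, not DataStage's real interfaces.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str  # "connector" (source/target access) or "transformation"

@dataclass
class Flow:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (upstream, downstream) name pairs

    def add_node(self, node: Node) -> Node:
        self.nodes[node.name] = node
        return node

    def link(self, upstream: Node, downstream: Node) -> None:
        self.links.append((upstream.name, downstream.name))

# Build a simple extract-transform-load flow:
flow = Flow()
src = flow.add_node(Node("customers_db", "connector"))          # source
xform = flow.add_node(Node("filter_inactive", "transformation"))  # transform
tgt = flow.add_node(Node("warehouse", "connector"))              # target
flow.link(src, xform)
flow.link(xform, tgt)
```

The links give the flow its direction: data is read at the source connector, passes through each transformation in order, and is written at the target connector.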