Skip to main content

SQL Server Integration Services (SSIS) - Automatic Data Lineage

What to Expect

Dataedo automatically imports:

  1. Packages,
  2. Tasks (as processes in Dataedo),
  3. Object-level lineage for supported task types,
  4. Column-level lineage for supported Data Flow transformations.

Prerequisites

To create SSIS lineage automatically in Dataedo:

  • Import documentation for all source and destination systems used by the package.
  • Define source and destination connections at the package level.
  • Ensure Linked Sources are matched correctly.

Data Transfer Tasks

If prerequisites are met, Dataedo creates lineage for:

  • Data Flow Task (object-level lineage for sources and destinations, and column-level lineage for supported transformations, including supported transformation-related sources such as Lookup)
  • Bulk Insert Task (destination)
  • FTP Task (file transfer flow)

For supported Data Flow transformations, at the Process level in Dataedo, full column-level lineage is created for:

  • Balanced Data Distributor
  • Conditional Split
  • Lookup
  • Multicast
  • Cache Transform
  • Export Column
  • Percentage Sampling
  • Aggregate
  • Sort
  • Union
  • Pivot
  • Row Sampling
  • Unpivot
  • Merge
  • Merge Join
  • Copy Column
  • Derived Column (including newly created columns, e.g., combining multiple input columns into one output column)

For the following transformations, column lineage is created for mapped columns, but not for newly created output columns:

  • Data Conversion
  • Script Component
  • Character Map
  • Fuzzy Grouping

The source or destination may be relational databases, files, or ODBC sources.

SSIS data flow lineage overview in Dataedo

In this view:

  • Source and Destination objects are displayed on both sides of the package-level process node.
  • The central node (Package) represents the Data Flow process between source and destination elements.
  • Column links can be expanded to see how individual fields are mapped.
  • The legend distinguishes automatic lineage from manual lineage.

How is SSIS lineage shown in Dataedo?

Dataedo's documentation of a sample SSIS data flow is shown below.

SSIS package-level lineage with sources, destinations, and process node

At package level, Dataedo groups SSIS tasks into a single process view and shows incoming and outgoing objects around that process.

Source

For database sources, Dataedo recognizes the following Data Access Mode options:

  • Table or view

  • Table name or view name from variable

  • SQL Command

  • SQL Command from variable

For each of these options, Dataedo constructs a lineage diagram that shows how data is retrieved.

SSIS Lookup elements are also added as Source objects with the Lookup subtype, and their lineage is shown in Dataedo.

SSIS Lookup shown as Source with Lookup subtype in lineage

The screenshot shows Package.Data Flow Task.Lookup in the Sources tree and the corresponding lineage node with mapped output columns.

Lineage is also constructed for file-based sources, as shown in the screenshot below.

Lineage for file-based SSIS source

In the file-based example, Dataedo links the documented file object (for example NewCustomers.txt) to the SSIS Source node and preserves column-level mappings.

For sources that use SQL queries to create them, there will be a lineage based on the parser

Lineage created from SQL query-based SSIS source

In the SQL-based example, the Column flow details panel contains the parsed query text (including aliases and joins), and upstream table nodes are linked to source columns based on parser results.

SSIS OLE DB Source script used for parser-based lineage

Process in lineage

SSIS process node in lineage diagram

The process node in Dataedo represents what is between Source and Destination in SSIS Data Flow (for example additional transformation steps). In package-level lineage, these steps are collapsed into one process object for readability.

For Data Flow transformations that use Derived Column, Dataedo now shows lineage for newly created columns (for example, when two input columns are combined into one output column).

SSIS Derived Column showing lineage for newly created output column

In the example, two input columns (Addressline1, addressline2) feed a newly created output column (Derived Column 1), and this dependency is visible directly on the column flow.

Destination

SSIS destination lineage in Dataedo

Dataedo links the SSIS Destination object to the actual target object (database table or file), based on the matched connection from the package.

End-to-end SSIS lineage from source objects through process to destinations.

FTP Tasks

SSIS FTP task lineage

For FTP tasks, Dataedo reads task properties (for example remote path and local path) and creates lineage between documented file objects when both endpoints are available in the repository.

Known Limitations

  • Column-level lineage is available for the Data Flow Task transformation types listed above.
  • Lineage based on third-party vendor elements is not fully validated.
  • Lineage is not created when variables are used as addresses.
  • Execute SQL Task lineage is not available

Troubleshooting

Lineage is created based on a matched Linked Source. Check the Dataedo desktop to make sure it is matched properly.

Linked Source mapping used by SSIS lineage in Dataedo

When validating Linked Source mapping, verify:

  • Source database points to the documented repository object (for example AdventureWorks2019@localhost),
  • SQL Dialect matches the source technology,
  • connection metadata (host/server, database/catalog, provider) is populated correctly.
Dataedo is an end-to-end data governance solution for mid-sized organizations.
Data Lineage • Data Quality • Data Catalog