Databricks Unity Catalog support
Starting from Dataedo 25.3, connecting to Databricks requires providing a SQL Warehouse name, which allows Dataedo to execute SQL queries via the Databricks API. The compute resources of this warehouse are used by the Data Profiling and Data Quality modules and to retrieve data lineage faster using system tables. To import column lineage, the following privileges are now required: USE SCHEMA on the system.access schema and SELECT on the system.access.column_lineage table.
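For reference, these privileges can be assigned with Databricks SQL grants similar to the sketch below; the principal name `dataedo_service_principal` is only a placeholder for the user, group, or service principal that Dataedo authenticates as.

```sql
-- Allow the connecting identity to use the system.access schema
-- (placeholder principal; replace with the identity Dataedo connects as)
GRANT USE SCHEMA ON SCHEMA system.access TO `dataedo_service_principal`;

-- Allow reading the column-level lineage system table
GRANT SELECT ON TABLE system.access.column_lineage TO `dataedo_service_principal`;
```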
Databricks is a cloud-based data processing platform that simplifies collaboration between data analysts, data engineers, and data scientists. It is available on Microsoft Azure, Amazon Web Services, and Google Cloud Platform.
Dataedo connects to a single Unity Catalog catalog via the API and documents objects and data lineage within the connected catalog.
Instructions on how to connect to Databricks using Dataedo can be found at: Connecting to Databricks Unity Catalog
Connector features
Data Source | Support | Schema | Lineage | Profiling | Classification | Export comments | FK tester | DDL import |
---|---|---|---|---|---|---|---|---|
Databricks Unity Catalog | Native | ✅ | ✅ | ✅ | ✅ | ❌ | NA | NA |
Read more about automatic data lineage in the Databricks automatic data lineage documentation.
Read more about profiling in the Data Profiling documentation.
Data Catalog
Dataedo will document the following objects and their respective properties from Databricks:
Object Name | Metadata | Lineage |
---|---|---|
Delta Live Tables | ✅ | ✅ |
Pipelines | Limited | ✅ |
Tables | ✅ | ✅ |
Views | ✅ | ✅ |
Columns | ✅ | ✅ |
External locations | ✅ | ✅ |
External Tables | ✅ | ✅ |
Primary keys | ✅ | |
Foreign keys | ✅ | |
Documentation is created for one selected catalog from Databricks Unity Catalog.
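As an illustration of this catalog scope, the tables and views that fall within a selected catalog can also be listed directly through that catalog's information_schema; `my_catalog` below is only a placeholder for the catalog Dataedo is connected to.

```sql
-- List tables and views in the selected catalog
-- (my_catalog is a placeholder catalog name)
SELECT table_schema,
       table_name,
       table_type   -- e.g. MANAGED, EXTERNAL, VIEW
FROM my_catalog.information_schema.tables
ORDER BY table_schema, table_name;
```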
Known Limitations
Documentation Functionality
- Connecting to multiple catalogs at once or to a regional metastore is not yet supported (it is on the roadmap).
- For pipelines, Dataedo will discover only the name, not the script.
Lineage Functionality
- Column-level lineage for external tables is created only if the data source schema (for example, a JSON file) is automatically discovered by Databricks and the column names are not changed.
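For context, the column-level lineage Dataedo imports is read from the system.access.column_lineage system table mentioned above. The sketch below shows one way to inspect it directly; the table and column names reflect the current system table schema and should be verified against your workspace, and the target table name is a placeholder.

```sql
-- Inspect recent column-level lineage events for one target table
-- (my_catalog.my_schema.my_table is a placeholder full table name)
SELECT source_table_full_name,
       source_column_name,
       target_table_full_name,
       target_column_name,
       event_time
FROM system.access.column_lineage
WHERE target_table_full_name = 'my_catalog.my_schema.my_table'
ORDER BY event_time DESC
LIMIT 100;
```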