Databricks Unity Catalog support

Caution

Starting with Dataedo 25.3, connecting to Databricks requires the name of a SQL Warehouse, which Dataedo uses to execute SQL queries via the Databricks API. The warehouse's compute resources are used by the Data Profiling and Data Quality modules, and to retrieve data lineage faster from system tables. Importing column lineage now requires the following privileges: USE SCHEMA on the system.access schema and SELECT on the system.access.column_lineage table.
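The two column lineage privileges can be expressed as Databricks SQL GRANT statements. The sketch below only builds the statement text; the function name lineage_grants and the example principal are hypothetical, and the statements would still need to be run by a user who is allowed to grant privileges on the system catalog.

```python
def lineage_grants(principal: str) -> list:
    """Build GRANT statements for the two column lineage privileges.

    `principal` is a hypothetical user, group, or service principal name.
    """
    # Databricks SQL quotes principals with backticks; double any embedded ones.
    quoted = "`" + principal.replace("`", "``") + "`"
    return [
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted};",
        f"GRANT SELECT ON TABLE system.access.column_lineage TO {quoted};",
    ]


if __name__ == "__main__":
    # Example principal is an assumption, not a value from the Dataedo docs.
    for statement in lineage_grants("dataedo-service@example.com"):
        print(statement)
```

The printed statements can be pasted into a SQL editor attached to the workspace.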

Databricks is a cloud-based data processing platform that simplifies collaboration among data analysts, data engineers, and data scientists. It is available on Microsoft Azure, Amazon Web Services, and Google Cloud Platform.

Dataedo connects to a single catalog in Unity Catalog via the API and documents the objects and data lineage within that catalog.
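To illustrate what scoping to one catalog looks like at the API level, the sketch below builds request URLs for the public Unity Catalog REST endpoints that list schemas and tables. It is an assumption that these are representative of the calls involved; the source does not state which endpoints Dataedo uses. The host name and the helper names schemas_url and tables_url are hypothetical.

```python
from urllib.parse import urlencode

# Base path of the Unity Catalog REST API in a Databricks workspace.
API_BASE = "/api/2.1/unity-catalog"


def schemas_url(host: str, catalog: str) -> str:
    """URL that lists the schemas of one catalog."""
    query = urlencode({"catalog_name": catalog})
    return f"https://{host}{API_BASE}/schemas?{query}"


def tables_url(host: str, catalog: str, schema: str) -> str:
    """URL that lists the tables of one schema within the catalog."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return f"https://{host}{API_BASE}/tables?{query}"


if __name__ == "__main__":
    # Hypothetical workspace host, catalog, and schema names.
    host = "example.cloud.databricks.com"
    print(schemas_url(host, "main"))
    print(tables_url(host, "main", "sales"))
```

Every request carries the catalog name, so objects outside the selected catalog are never enumerated.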

Instructions for connecting to Databricks with Dataedo are available in: Connecting to Databricks Unity Catalog

Connector features

Data Source | Support | Schema | Lineage | Profiling | Classification | Export comments | FK tester | DDL import
Databricks Unity Catalog: Native support; two of the listed features are marked NA.

Read more about automatic data lineage in the Databricks automatic data lineage documentation.

Read more about profiling in the Data Profiling documentation.

Data Catalog

Dataedo will document the following objects and their respective properties from Databricks:

Object Name | Metadata | Lineage
Delta Live Tables
Pipelines (Limited)
Tables
Views
Columns
External locations
External Tables
Primary keys
Foreign keys

Documentation is created for one selected catalog from Databricks Unity Catalog.

Known Limitations
