Databricks Unity Catalog support
Starting from Dataedo 25.3, connecting to Databricks requires providing a SQL Warehouse name, which allows Dataedo to execute SQL queries via the Databricks API. The compute resources of this warehouse are used by the Data Profiling and Data Quality modules and to retrieve data lineage faster using system tables. To import column lineage, the following privileges are now required: USE SCHEMA on the system.access schema and SELECT on the system.access.column_lineage table.
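For reference, these privileges can be assigned with Databricks SQL grants similar to the sketch below; the principal name `dataedo_service_principal` is only a placeholder for the user, group, or service principal that Dataedo authenticates as.

```sql
-- Allow the connecting identity to use the system.access schema
-- (placeholder principal; replace with the identity Dataedo connects as)
GRANT USE SCHEMA ON SCHEMA system.access TO `dataedo_service_principal`;

-- Allow reading the column-level lineage system table
GRANT SELECT ON TABLE system.access.column_lineage TO `dataedo_service_principal`;
```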
Databricks is a cloud-based data processing platform that simplifies collaboration between data analysts, data engineers, and data scientists. It is available on Microsoft Azure, Amazon Web Services, and Google Cloud Platform.
Dataedo connects to a single Unity Catalog catalog via the API and documents objects and data lineage within the connected catalog.
Instructions on how to connect to Databricks using Dataedo can be found at: Connecting to Databricks Unity Catalog
Connector features
Data Source | Support | Schema | Lineage | Profiling | Classification | Export comments | FK tester | DDL import |
---|---|---|---|---|---|---|---|---|
Databricks Unity Catalog | Native | ✅ | ✅ | ✅ | ✅ | ❌ | NA | NA |
Read more about automatic data lineage in the Databricks automatic data lineage documentation.
Read more about profiling in the Data Profiling documentation.
Data Catalog
Dataedo will document the following objects and their respective properties from Databricks:
Object Name | Metadata | Lineage |
---|---|---|
Delta Live Tables | ✅ | ✅ |
Pipelines | Limited | ✅ |
Tables | ✅ | ✅ |
Views | ✅ | ✅ |
Columns | ✅ | ✅ |
External locations | ✅ | ✅ |
External Tables | ✅ | ✅ |
Primary keys | ✅ | |
Foreign keys | ✅ | |
Documentation is created for one selected catalog from Databricks Unity Catalog.
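As an illustration of this catalog scope, the tables and views that fall within a selected catalog can also be listed directly through that catalog's information_schema; `my_catalog` below is only a placeholder for the catalog Dataedo is connected to.

```sql
-- List tables and views in the selected catalog
-- (my_catalog is a placeholder catalog name)
SELECT table_schema,
       table_name,
       table_type   -- e.g. MANAGED, EXTERNAL, VIEW
FROM my_catalog.information_schema.tables
ORDER BY table_schema, table_name;
```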
Known Limitations
Documentation Functionality
- Connecting to multiple catalogs at once or to a regional metastore is not yet supported (it is on the roadmap).
- For pipelines, Dataedo will discover only the name, not the script.
Lineage Functionality
- Column-level lineage for external tables is created only if the data source schema (for example, a JSON file) is automatically discovered by Databricks and the column names are not changed.
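For context, the column-level lineage Dataedo imports is read from the system.access.column_lineage system table mentioned above. The sketch below shows one way to inspect it directly; the table and column names reflect the current system table schema and should be verified against your workspace, and the target table name is a placeholder.

```sql
-- Inspect recent column-level lineage events for one target table
-- (my_catalog.my_schema.my_table is a placeholder full table name)
SELECT source_table_full_name,
       source_column_name,
       target_table_full_name,
       target_column_name,
       event_time
FROM system.access.column_lineage
WHERE target_table_full_name = 'my_catalog.my_schema.my_table'
ORDER BY event_time DESC
LIMIT 100;
```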