Databricks Unity Catalog support
Databricks is a cloud-based data processing platform that simplifies collaboration among data analysts, data engineers, and data scientists. It is available on Microsoft Azure, Amazon Web Services, and Google Cloud Platform.
Dataedo connects to a single catalog in Unity Catalog via API and documents the objects and data lineage within that catalog.
Instructions on how to connect to Databricks using Dataedo can be found at: Connecting to Databricks Unity Catalog
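To illustrate what single-catalog scope means at the API level, the sketch below builds request URLs for the Databricks Unity Catalog REST API, filtered to one catalog. It is not Dataedo's implementation. The helper names (`schemas_url`, `tables_url`, `auth_headers`), the example host, and the `main` catalog are assumptions made for illustration. The code only builds URLs and headers and sends no requests.

```python
from urllib.parse import urlencode

# Base path of the Unity Catalog REST API on a Databricks workspace.
UC_API = "/api/2.1/unity-catalog"


def schemas_url(host: str, catalog: str) -> str:
    """Build the URL that lists the schemas of one catalog (hypothetical helper)."""
    query = urlencode({"catalog_name": catalog})
    return f"{host.rstrip('/')}{UC_API}/schemas?{query}"


def tables_url(host: str, catalog: str, schema: str) -> str:
    """Build the URL that lists the tables of one schema in that catalog."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return f"{host.rstrip('/')}{UC_API}/tables?{query}"


def auth_headers(token: str) -> dict:
    """Bearer-token header, assuming a personal access token is used."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Example host and catalog are placeholders, not real values.
    host = "https://example.cloud.databricks.com"
    print(schemas_url(host, "main"))
    print(tables_url(host, "main", "sales"))
```

Because every request carries a `catalog_name` filter, objects in other catalogs of the same metastore are never listed, which matches the one-catalog-per-connection behavior described above.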
Connector features
Data Source | Support | Schema | Lineage | Profiling | Classification | Export comments | FK tester | DDL import |
---|---|---|---|---|---|---|---|---|
Databricks Unity Catalog | Native | ✅ | Column Level | ❌ | ✅ | ✅ | NA | NA |
Data Catalog
Dataedo will document the following objects and their respective properties from Databricks:
Object Name | Metadata | Lineage |
---|---|---|
Delta Live Tables | ✅ | ✅ |
Pipelines | Limited | ✅ |
Tables | ✅ | ✅ |
Views | ✅ | ✅ |
Columns | ✅ | ✅ |
External locations | ✅ | ✅ |
External Tables | ✅ | ✅ |
Primary keys | ✅ | |
Foreign keys | ✅ | |
Objects Properties Configuration & Support
Documentation is created for one selected catalog from Databricks Unity Catalog.
Known Limitations
Documentation Functionality
- Data Profiling is not available for Databricks; however, we're working on this feature for future releases.
- Connecting to multiple catalogs at once, or to a regional metastore, is not yet supported (it is on the roadmap).
- For pipelines, Dataedo will discover only the name, not the script.
Lineage Functionality
- Column-level lineage for external tables is created only if Databricks automatically discovers the schema of the data source (for example, a JSON file) and the column names are not changed.
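The practical effect of the name-matching condition can be sketched as follows. This is a simplified illustration under stated assumptions, not Dataedo's actual matching logic: the function `lineage_pairs` is hypothetical, and exact, case-sensitive name comparison is assumed.

```python
def lineage_pairs(source_fields, table_columns):
    """Pair each external-table column with the source field of the same name.

    Hypothetical helper: assumes exact, case-sensitive name matching.
    Columns that were renamed have no matching source field and get no pair.
    """
    available = set(source_fields)
    return [(name, name) for name in table_columns if name in available]


if __name__ == "__main__":
    # Fields discovered in a JSON file versus columns of the external table.
    json_fields = ["id", "amount", "ts"]
    table_columns = ["id", "amount", "event_time"]  # "ts" was renamed
    print(lineage_pairs(json_fields, table_columns))
```

In this example the renamed column `event_time` has no counterpart among the source fields, so no column-level lineage would be drawn for it, while `id` and `amount` are linked.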