Amazon Athena
Dataedo provides a native connector to Amazon Athena, a query engine on Amazon Web Services. Athena is distinctive in that it provides access to the AWS Glue Data Catalog, which holds metadata from various AWS services.
Connector features
Data catalog
Dataedo documents metadata of the following objects:
- Tables
- Views
Using the Athena connector, you can extract metadata from the data catalogs listed under Tested data catalogs.

Complex fields
Amazon Athena supports embedded complex columns, allowing you to reference and query individual elements within a column (such as fields inside structs or elements of arrays). To improve readability and usability, Dataedo represents complex columns as a hierarchical structure in the column grid, making it easy to explore and understand nested data. Each element of a complex column can be:
- Individually profiled
- Used independently in Data Quality rules
This allows you to analyze and validate specific parts of complex data structures with full precision.
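To make the hierarchical representation concrete, the sketch below expands a Hive-style type string (the `struct<...>` and `array<...>` notation Athena uses in DDL) into a tree of nested elements. It is only an illustration and not Dataedo's implementation: `ColumnNode`, `build_node`, `flatten`, and the `element` placeholder name are hypothetical, and `map` types are left unexpanded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnNode:
    """One node in the column hierarchy (hypothetical structure)."""
    name: str
    data_type: str
    children: List["ColumnNode"] = field(default_factory=list)


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on `sep`, ignoring separators nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        parts.append(tail)
    return parts


def build_node(name: str, data_type: str) -> ColumnNode:
    """Expand a type string such as struct<a:int,b:array<string>> into a tree."""
    data_type = data_type.strip()
    node = ColumnNode(name, data_type)
    lowered = data_type.lower()
    if lowered.startswith("struct<") and data_type.endswith(">"):
        inner = data_type[len("struct<"):-1]
        for member in split_top_level(inner):
            member_name, member_type = member.split(":", 1)
            node.children.append(build_node(member_name.strip(), member_type))
    elif lowered.startswith("array<") and data_type.endswith(">"):
        # Array elements have no name of their own; "element" is a placeholder.
        node.children.append(build_node("element", data_type[len("array<"):-1]))
    return node


def flatten(node: ColumnNode, prefix: str = "") -> List[str]:
    """Return dotted paths for the node and all of its descendants."""
    path = f"{prefix}.{node.name}" if prefix else node.name
    paths = [path]
    for child in node.children:
        paths.extend(flatten(child, path))
    return paths
```

Each dotted path returned by `flatten` corresponds to one element that could, in principle, be profiled or validated on its own.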

Data lineage
Dataedo supports both manual and automatic lineage for Amazon Athena. Automatic lineage is currently available for views and external tables.
- For views, Dataedo parses the definition and displays the source of each column.
- For external tables, Dataedo searches for files with matching locations to establish lineage.
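The location matching used for external tables can be pictured as a prefix comparison between a table's location and known file paths. The sketch below assumes exactly that; Dataedo's actual matching rules are not described here, and `normalize_location`, `match_files`, and the bucket names are hypothetical.

```python
from typing import Dict, List


def normalize_location(uri: str) -> str:
    """Ensure a single trailing slash so 's3://b/data' cannot match 's3://b/data2/...'."""
    return uri.strip().rstrip("/") + "/"


def match_files(table_locations: Dict[str, str], file_uris: List[str]) -> Dict[str, List[str]]:
    """Map each external table to the files stored under its location (assumed prefix match)."""
    lineage: Dict[str, List[str]] = {}
    for table, location in table_locations.items():
        prefix = normalize_location(location)
        lineage[table] = [uri for uri in file_uris if uri.startswith(prefix)]
    return lineage


if __name__ == "__main__":
    tables = {"sales": "s3://example-bucket/sales"}
    files = [
        "s3://example-bucket/sales/2024/part-0.parquet",
        "s3://example-bucket/sales_archive/2020.csv",
    ]
    print(match_files(tables, files))
```

Normalizing the trailing slash matters: without it, the `sales` table would also claim files under `sales_archive`.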

Data Profiling
Users can run data profiling for a table, view, or materialized view, then save selected data in the repository. This data is available from Desktop and Web. Profiling requires SELECT permission on the profiled object.
Data Quality
Users can check whether data in Athena tables is accurate, consistent, complete, and reliable using Data Quality functionality.
Data Quality requires SELECT permission on the tested object.

How to connect
- Access Key authentication is the default authentication method.
  - Access Key - IAM user access key ID
  - Secret Key - IAM user secret key
- IAM Role Authentication uses the AWS role assigned to the environment in which Dataedo is running. The assigned role must have permissions to access Athena and related AWS services. There are no additional fields for this authentication method.
- STS Authentication allows Dataedo to assume an IAM role using temporary credentials.
  - Access Key (optional) - IAM user access key ID
  - Secret Key (optional) - IAM user secret key
  - Role Arn - Amazon Resource Name (ARN) of the IAM role to assume
  - External Id (optional) - identifier used for enhanced security when assuming cross-account roles
All of the above authentication methods share four additional fields:
- AWS Region - AWS region in which Athena resides
- Data Catalog - Data catalog (Athena data source) you want to connect to
- Workgroup - Athena workgroup
- Database - Athena database
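
As a quick reference, the fields above can be modeled as a small settings object that reports what is still missing for a given authentication method. This is only a sketch: `AthenaConnection`, its attribute names, and the `auth_method` values are hypothetical and not part of Dataedo, and it assumes all four shared fields are mandatory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AthenaConnection:
    """Hypothetical container for the connection fields described above."""
    auth_method: str  # assumed values: "access_key", "iam_role", or "sts"
    aws_region: str
    data_catalog: str
    workgroup: str
    database: str
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    role_arn: Optional[str] = None
    external_id: Optional[str] = None  # optional, only meaningful for "sts"

    def missing_fields(self) -> List[str]:
        """List required fields that are empty for the chosen method."""
        # Assumption: the four shared fields are always required.
        required = ["aws_region", "data_catalog", "workgroup", "database"]
        if self.auth_method == "access_key":
            required += ["access_key", "secret_key"]
        elif self.auth_method == "sts":
            # Access key and secret key are optional for STS; only the role is required.
            required += ["role_arn"]
        elif self.auth_method != "iam_role":
            raise ValueError(f"Unknown auth method: {self.auth_method}")
        return [name for name in required if not getattr(self, name)]


if __name__ == "__main__":
    conn = AthenaConnection("sts", "eu-west-1", "AwsDataCatalog", "primary", "default")
    print(conn.missing_fields())
```

The example values `AwsDataCatalog` and `primary` are the default catalog and workgroup names in Athena; substitute your own.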

To configure AWS services, see the detailed connection instructions.
Connector specification
Supported versions
The Dataedo Athena connector was tested with Athena engine version 2. Athena engine version 1 is not officially supported.
Tested data catalogs
- S3 - AWS Glue Data Catalog
- Amazon DynamoDB
- Amazon DocumentDB
- Amazon CloudWatch
- Amazon CloudWatch Metrics
Imported objects
| Object | Imported as |
|---|---|
| Table | Table |
| View | View |
Tables metadata
| Metadata | Imported as |
|---|---|
| Name | Name |
| Columns | Name |
| Column name | Name |
| Column data type | Data type |
| Column position | Position |
| Column nullable | Nullable |
| Column description | Description |
| Column default value | Default value |
| Location | Location (only for external tables) |
| Definition | Script (only for external tables) |
Views metadata
| Metadata | Imported as |
|---|---|
| Name | Name |
| Script | Script |
| Columns | Name |
| Column name | Name |
| Column data type | Data type |
| Column position | Position |
| Column nullable | Nullable |
Data profiling
Dataedo supports the following data profiling in Athena:
| Profile | Support |
|---|---|
| Table row count | ✅ |
| Table sample data | ✅ |
| Column distribution (unique, non-unique, null, empty values) | ✅ |
| Min, max values | ✅ |
| Average | ✅ |
| Variance | ✅ |
| Standard deviation | ✅ |
| Min-max span | ✅ |
| Number of distinct values | ✅ |
| Top 10/100/1000 values | ✅ |
| 10 random values | ✅ |
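
For readers unfamiliar with these metrics, the sketch below computes most of them for a single numeric column in plain Python. It is illustrative only: `profile_column` is a hypothetical helper, "unique" is taken to mean values that occur exactly once, and population variance is assumed. Dataedo's exact definitions may differ.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[float]], top_n: int = 10) -> Dict[str, Any]:
    """Compute basic profile metrics for one numeric column (None marks a null)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    profile: Dict[str, Any] = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Assumption: "unique" means the value appears exactly once.
        "unique_count": sum(1 for c in counts.values() if c == 1),
        "top_values": counts.most_common(top_n),
    }
    if non_null:
        low, high = min(non_null), max(non_null)
        profile.update(
            {
                "min": low,
                "max": high,
                "span": high - low,
                "average": statistics.mean(non_null),
                # Assumption: population (not sample) variance and deviation.
                "variance": statistics.pvariance(non_null),
                "stddev": statistics.pstdev(non_null),
            }
        )
    return profile


if __name__ == "__main__":
    print(profile_column([1, 2, 2, 4, None], top_n=1))
```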
Data Lineage
| Source | Method | Status |
|---|---|---|
| Views - object level | From SQL parsing | ✅ |
| Views - column level | From SQL parsing | ✅ |
| External tables - object level | From S3 location | ✅ |
Required access level
A full Dataedo import requires the following IAM permissions:
Athena
- GetDatabase
- GetQueryExecution
- GetQueryExecutions
- GetQueryResults
- GetTable
- GetTableMetadata
- GetWorkGroup
- ListEngineVersions
- ListTableMetadata
- StartQueryExecution
- ListDatabases
- ListDataCatalogs
- ListWorkGroups
Glue
- GetDatabases
- GetPartitions
- GetTable
- GetTables
S3
- GetObject
- PutObject
- ListBucket
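
The permissions above can be assembled into a single IAM policy document. The sketch below does so in Python; the action names are copied verbatim from the lists above and prefixed with the standard `athena`, `glue`, and `s3` service prefixes. `build_policy` is a hypothetical helper, and the wildcard `Resource` is a placeholder assumption that should be narrowed to specific ARNs before real use.

```python
import json
from typing import Dict, List

# Action names as listed in this document, grouped by service prefix.
REQUIRED_ACTIONS: Dict[str, List[str]] = {
    "athena": [
        "GetDatabase", "GetQueryExecution", "GetQueryExecutions", "GetQueryResults",
        "GetTable", "GetTableMetadata", "GetWorkGroup", "ListEngineVersions",
        "ListTableMetadata", "StartQueryExecution", "ListDatabases",
        "ListDataCatalogs", "ListWorkGroups",
    ],
    "glue": ["GetDatabases", "GetPartitions", "GetTable", "GetTables"],
    "s3": ["GetObject", "PutObject", "ListBucket"],
}


def build_policy(actions_by_service: Dict[str, List[str]], resource: str = "*") -> dict:
    """Build an IAM policy document allowing the given actions.

    The default resource "*" is a placeholder, not a recommendation.
    """
    actions = [
        f"{service}:{action}"
        for service, names in actions_by_service.items()
        for action in names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resource}],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(REQUIRED_ACTIONS), indent=2))
```

Running the script prints JSON that can be pasted into the IAM policy editor and then tightened, for example by restricting the S3 actions to the bucket that holds Athena query results.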