Amazon Athena

Dataedo provides a native connector for Amazon Athena, a query engine on Amazon Web Services. What sets Athena apart is that it provides access to the AWS Glue Data Catalog, which holds metadata from various AWS services.

Connector features

Data catalog

Dataedo documents metadata of the following objects:

  • Tables
  • Views

Using the Athena connector, you can extract metadata from:

[Image: sources available through the Athena connector]

Complex fields

Amazon Athena supports embedded complex columns, allowing you to reference and query individual elements within a column (such as fields inside structs or elements of arrays). To improve readability and usability, Dataedo represents complex columns as a hierarchical structure in the column grid, making it easy to explore and understand nested data. Each element of a complex column can be:

  • Individually profiled
  • Used independently in Data Quality rules

This allows you to analyze and validate specific parts of complex data structures with full precision.
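
As an illustration of how a complex column can be unfolded into a hierarchy, here is a minimal Python sketch. It assumes the column type arrives as an Athena/Glue type string such as `struct<...>` or `array<...>`. The function names `split_top_level` and `flatten` and the `[]` path notation are hypothetical and are not part of Dataedo.

```python
from typing import List, Tuple


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split text on sep, ignoring separators nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def flatten(name: str, type_str: str, depth: int = 0) -> List[Tuple[int, str, str]]:
    """Return (depth, path, type) rows for a column and its nested elements."""
    type_str = type_str.strip()
    rows = [(depth, name, type_str)]
    if type_str.startswith("struct<") and type_str.endswith(">"):
        inner = type_str[len("struct<"):-1]
        for field in split_top_level(inner):
            field_name, field_type = field.split(":", 1)
            rows += flatten(f"{name}.{field_name.strip()}", field_type, depth + 1)
    elif type_str.startswith("array<") and type_str.endswith(">"):
        # "[]" is a made-up marker for "each element of the array"
        rows += flatten(f"{name}[]", type_str[len("array<"):-1], depth + 1)
    return rows


column_type = "struct<name:string,tags:array<string>,address:struct<city:string,zip:int>>"
for level, path, type_name in flatten("customer", column_type):
    print("  " * level + f"{path}: {type_name}")
```

Each printed row corresponds to one element that could be profiled or tested on its own; map types and quoted field names are deliberately left out of this sketch.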

Data lineage

Dataedo supports both manual and automatic lineage for Amazon Athena. Automatic lineage is currently available for views and external tables.

  • For views, Dataedo parses the definition and displays the source of each column.
  • For external tables, Dataedo searches for files with matching locations to establish lineage.
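
The idea of matching external tables to files by location can be sketched as a prefix comparison. This is only an illustration under stated assumptions: Dataedo's actual matching logic is not described here, and the names `normalize_prefix` and `match_files` as well as the sample bucket are hypothetical.

```python
from typing import Dict, List


def normalize_prefix(location: str) -> str:
    """Treat a table location as a folder prefix ending in exactly one slash."""
    return location.rstrip("/") + "/"


def match_files(table_locations: Dict[str, str], files: List[str]) -> Dict[str, List[str]]:
    """Map each external table to the files stored under its location."""
    lineage = {}
    for table, location in table_locations.items():
        prefix = normalize_prefix(location)
        lineage[table] = [path for path in files if path.startswith(prefix)]
    return lineage


tables = {
    "sales": "s3://example-bucket/sales",
    "returns": "s3://example-bucket/returns/",
}
files = [
    "s3://example-bucket/sales/2024.csv",
    "s3://example-bucket/sales_archive/old.csv",
    "s3://example-bucket/returns/r1.parquet",
]
print(match_files(tables, files))
```

Normalizing to a trailing slash keeps `sales` from also matching `sales_archive`, which a plain string-prefix check would do.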

Data Profiling

Users can run data profiling for a table, view, or materialized view, and then save selected data in the repository. The saved data is available in both Dataedo Desktop and Dataedo Web. Profiling requires the SELECT permission on the profiled object.

Data Quality

Users can check whether data in Athena tables is accurate, consistent, complete, and reliable using the Data Quality functionality. Data Quality requires the SELECT permission on the tested object.

How to connect

  1. Access Key - the default authentication method.

     • Access Key - IAM user access key ID
     • Secret Key - IAM user secret key

  2. IAM Role - uses the AWS role assigned to the environment in which Dataedo is running. The assigned role must have permissions to access Athena and related AWS services. This method has no additional fields.

  3. STS - allows Dataedo to assume an IAM role using temporary credentials.

     • Access Key (optional) - IAM user access key ID
     • Secret Key (optional) - IAM user secret key
     • Role Arn - Amazon Resource Name (ARN) of the IAM role to assume
     • External Id (optional) - identifier used for enhanced security when assuming cross-account roles

All of the above authentication methods have four additional fields:

  • AWS Region - AWS region in which Athena resides
  • Data Catalog - Data catalog (Athena data source) you want to connect to
  • Workgroup - Athena workgroup
  • Database - Athena database

To configure AWS services, see the detailed connection instructions.
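
To make the field requirements of each authentication method easier to see, the following Python sketch models the connection form and reports which fields are still empty. It is not Dataedo code: the class name `AthenaConnection`, the method labels `access_key`, `iam_role`, and `sts`, and the assumption that all four common fields must be filled in are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AthenaConnection:
    auth_method: str  # hypothetical labels: "access_key", "iam_role", "sts"
    region: str
    data_catalog: str
    workgroup: str
    database: str
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    role_arn: Optional[str] = None
    external_id: Optional[str] = None  # optional, used only with STS

    def missing_fields(self) -> List[str]:
        """List the fields that are empty but needed for the chosen method."""
        # Assumption: the four common fields are treated as required.
        required = ["region", "data_catalog", "workgroup", "database"]
        if self.auth_method == "access_key":
            required += ["access_key", "secret_key"]
        elif self.auth_method == "sts":
            required += ["role_arn"]  # access and secret keys stay optional
        elif self.auth_method != "iam_role":
            raise ValueError(f"unknown auth method: {self.auth_method}")
        return [name for name in required if not getattr(self, name)]


connection = AthenaConnection("sts", "eu-west-1", "AwsDataCatalog", "primary", "sales")
print(connection.missing_fields())
```

The region, catalog, workgroup, and database values above are sample inputs, not defaults taken from Dataedo.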

Connector specification

Supported versions

The Dataedo Athena connector was tested with Athena engine version 2. Athena engine version 1 is not officially supported.

Tested data catalogs

  • S3 - AWS Glue Data Catalog
  • Amazon DynamoDB
  • Amazon DocumentDB
  • Amazon CloudWatch
  • Amazon CloudWatch Metrics

Imported objects

Object    Imported as
Table     Table
View      View

Tables metadata

Metadata            Imported as
Name                Name
Columns             Name
   Name             Name
   Data type        Data type
   Position         Position
   Nullable         Nullable
   Description      Description
   Default value    Default value
Location            Location (only for external tables)
Definition          Script (only for external tables)

Views metadata

Metadata            Imported as
Name                Name
Script              Script
Columns             Name
   Name             Name
   Data type        Data type
   Position         Position
   Nullable         Nullable

Data profiling

Dataedo supports the following data profiling in Athena:

Profile                                                        Support
Table row count
Table sample data
Column distribution (unique, non-unique, null, empty values)
Min, max values
Average
Variance
Standard deviation
Min-max span
Number of distinct values
Top 10/100/1000 values
10 random values
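
To give a sense of what several of these metrics look like as a single Athena query, here is a hedged Python sketch that builds one aggregate SELECT for a numeric column. It does not show the queries Dataedo actually issues. The function name `profiling_query` and the result aliases are hypothetical, and the table name is assumed to be already qualified and safe to embed.

```python
def profiling_query(table: str, column: str) -> str:
    """Build one aggregate query covering several metrics for a numeric column."""
    # Double-quote the column identifier, escaping embedded quotes.
    col = '"' + column.replace('"', '""') + '"'
    return (
        "SELECT\n"
        "  COUNT(*) AS row_count,\n"
        f"  COUNT({col}) AS non_null_count,\n"
        f"  COUNT(DISTINCT {col}) AS distinct_count,\n"
        f"  MIN({col}) AS min_value,\n"
        f"  MAX({col}) AS max_value,\n"
        f"  AVG({col}) AS avg_value,\n"
        f"  VARIANCE({col}) AS variance_value,\n"
        f"  STDDEV({col}) AS stddev_value\n"
        f"FROM {table}"
    )


print(profiling_query("sales.orders", "amount"))
```

The null count follows from `row_count - non_null_count`, and the min-max span from `max_value - min_value`; running such a query needs only the SELECT permission on the profiled object.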

Data Lineage

Source                            Method              Status
Views - object level              From SQL parsing
Views - column level              From SQL parsing
External tables - object level    From S3 location

Required access level

A full Dataedo import requires the following IAM permissions:

Athena

  • GetDatabase
  • GetQueryExecution
  • GetQueryExecutions
  • GetQueryResults
  • GetTable
  • GetTableMetadata
  • GetWorkGroup
  • ListEngineVersions
  • ListTableMetadata
  • StartQueryExecution
  • ListDatabases
  • ListDataCatalogs
  • ListWorkGroups

Glue

  • GetDatabases
  • GetPartitions
  • GetTable
  • GetTables

S3

  • GetObject
  • PutObject
  • ListBucket
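
The permissions above could be combined into a single IAM policy document along the following lines. This is a sketch under assumptions: the action names are copied as listed and prefixed with the lowercase service name, so they should be checked against the AWS service authorization reference before use. `"Resource": "*"` is a placeholder that would normally be narrowed to specific workgroups, catalogs, and buckets. The helper name `build_policy` is hypothetical.

```python
import json

# Action names copied from the lists above; verify them against AWS documentation.
ACTIONS = {
    "athena": [
        "GetDatabase", "GetQueryExecution", "GetQueryExecutions", "GetQueryResults",
        "GetTable", "GetTableMetadata", "GetWorkGroup", "ListEngineVersions",
        "ListTableMetadata", "StartQueryExecution", "ListDatabases",
        "ListDataCatalogs", "ListWorkGroups",
    ],
    "glue": ["GetDatabases", "GetPartitions", "GetTable", "GetTables"],
    "s3": ["GetObject", "PutObject", "ListBucket"],
}


def build_policy(actions_by_service):
    """Return an IAM policy document that allows every listed action."""
    actions = [
        f"{service}:{name}"
        for service, names in actions_by_service.items()
        for name in names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: "*" is a placeholder; scope resources down in practice.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


print(json.dumps(build_policy(ACTIONS), indent=2))
```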