
Azure Data Lake Storage

Azure Data Lake Storage is a service designed for storing and analyzing big data files. One of the key features that distinguishes it from Azure Blob Storage is the hierarchical namespace, which organizes objects/files into a hierarchy of directories for efficient data access.

Dataedo provides a native Azure Data Lake Storage connector, which allows you to document objects/files stored in this service.

Authentication

To document Azure Data Lake Storage, you need to authenticate. Dataedo supports the following authentication options:

  • Access Key
  • Azure Active Directory - Interactive
  • Connection string
  • Public container (no authentication)
  • Shared Access Signature - Account
  • Shared Access Signature - Directory
  • Shared Access Signature - URL

Of all the options, we recommend Azure Active Directory - Interactive authentication, as it provides the best access control in the Azure cloud.

How to find storage account name in Azure

The storage account name is a unique identifier of your account across the whole Azure cloud. You can find it as follows (only the Azure Portal method is shown here, as the other methods are more advanced):

  1. Sign in to the Azure Portal.
  2. Search for storage account and select Storage accounts:
storage_account_search
  3. You will see a list of storage accounts, with the name in the first column. Copy the name of the account you want to document:
storage_accounts_list
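Before pasting the name into Dataedo, you may want to sanity-check it. The sketch below assumes Azure's storage account naming rules (3 to 24 characters, lowercase letters and digits only) and the public-cloud Data Lake Storage endpoint format https://<account>.dfs.core.windows.net; sovereign clouds use a different endpoint suffix. The function names and the account name mydatalake01 are hypothetical, and global uniqueness can only be confirmed by Azure itself.

```python
import re

# Assumed Azure rule: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if `name` matches the storage account naming rules."""
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


def dfs_endpoint(name: str) -> str:
    """Build the Data Lake Storage (DFS) endpoint URL, assuming the public Azure cloud."""
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return f"https://{name}.dfs.core.windows.net"


if __name__ == "__main__":
    # Hypothetical account name used for illustration only.
    print(dfs_endpoint("mydatalake01"))
```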

Access key

When you create an Azure Storage Account, Azure automatically generates two access keys for it. You can use either of them to authenticate to Data Lake Storage in Dataedo.

How to find Access Key in Azure Portal

  1. Search for Storage Accounts, and open the one you would like to document.
  2. In the side menu, find Access keys and open that page:
access_keys
  3. Click Show to reveal a key and copy it to Dataedo. You can use either key. The Storage account name is also shown on this page, so you can copy it from here as well:
access_keys_copy
  4. Paste the values into the Dataedo connector window:
dataedo_azure_storage_access_keys
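Dataedo uses the key on your behalf, but it helps to know why the key must be kept secret. With Shared Key authorization, each request is signed with HMAC-SHA256 over a canonical string-to-sign, using the base64-decoded account key, and the result is sent in an Authorization header of the form SharedKey <account>:<signature>. The sketch below shows only that signing step, under those assumptions; the exact string-to-sign layout (HTTP verb, selected headers, canonicalized resource) is defined in Azure's Shared Key authorization reference and is not reproduced here. This is not Dataedo's implementation, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_shared_key(account_key_b64: str, string_to_sign: str) -> str:
    """Return Base64(HMAC-SHA256(decoded_key, string_to_sign)).

    The key copied from the Access keys page is base64-encoded, so it is
    decoded to raw bytes before being used as the HMAC key.
    """
    key = base64.b64decode(account_key_b64, validate=True)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(account_name: str, account_key_b64: str, string_to_sign: str) -> str:
    """Format the Authorization header value for Shared Key authorization."""
    signature = sign_shared_key(account_key_b64, string_to_sign)
    return f"SharedKey {account_name}:{signature}"


if __name__ == "__main__":
    # Placeholder key and string-to-sign; a real string-to-sign is built
    # from the request as described in Azure's Shared Key reference.
    demo_key = base64.b64encode(b"\x00" * 64).decode("ascii")
    print(authorization_header("mydatalake01", demo_key, "GET\n...placeholder..."))
```

Anyone holding the key can produce valid signatures for any request, which is why the Azure Portal hides it behind the Show button.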

Azure Active Directory - Interactive

You can use your Azure AD credentials to access the Storage Account. The only value you need to provide is the account name, as described in the How to find storage account name in Azure section of this article. Once you click Connect in the Dataedo window, you will be asked to sign in interactively with your Azure account.

azure_ad_interactive

Azure Active Directory - Interactive authentication cannot be used to automate imports with dataedocmd, because it requires you to sign in manually before the import can begin.

Connection string

A connection string for Azure Storage contains all the information required to connect to the Storage Account, so the Dataedo connector has only one field for this type of authentication.

How to find Connection string in Azure Portal

  1. Search for Storage Accounts, and open the one you would like to document.
  2. In the side menu, find Access keys and open that page:
access_keys
  3. Click Show to reveal a connection string and copy it to Dataedo. You can use either connection string. The Storage account name is also shown on this page, so you can copy it from here as well:
account_connection_string
  4. Paste the connection string into the Dataedo connector window:
azure_storage_conn_string_dataedo
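A connection string is a semicolon-separated list of Name=Value settings such as DefaultEndpointsProtocol, AccountName, AccountKey and EndpointSuffix, which is why a single field is enough. The sketch below is a minimal, hypothetical parser (not part of Dataedo or the Azure SDK) that shows what that field carries. It splits each setting on the first = only, because base64 account keys usually end with = padding. It assumes the public-cloud suffix core.windows.net when EndpointSuffix is absent and ignores explicit endpoint settings such as BlobEndpoint that some connection strings contain.

```python
from __future__ import annotations


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split an Azure Storage connection string into its Name=Value settings."""
    settings: dict[str, str] = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        # Split on the FIRST '=' only: account keys often end with '=' padding.
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed setting: {part!r}")
        settings[name.strip()] = value.strip()
    return settings


def dfs_url_from_settings(settings: dict[str, str]) -> str:
    """Derive the DFS endpoint from the protocol, AccountName and EndpointSuffix settings."""
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")  # assumed default
    return f"{protocol}://{settings['AccountName']}.dfs.{suffix}"


if __name__ == "__main__":
    # Hypothetical connection string with a placeholder key.
    sample = ("DefaultEndpointsProtocol=https;AccountName=mydatalake01;"
              "AccountKey=PLACEHOLDER==;EndpointSuffix=core.windows.net")
    parsed = parse_connection_string(sample)
    print(parsed["AccountName"], dfs_url_from_settings(parsed))
```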

Public container

Some containers can be accessed publicly, without any authentication. In that case, Dataedo requires you to provide the storage account name and the container name. Finding the storage account name is described in the How to find storage account name in Azure section. You can find the public container name as follows:

  1. Search for Storage Accounts, and open the one you would like to document.
  2. In the side menu, find Containers and open that page:
containers_azure_portal
  3. The Containers page lists the containers available in the Storage Account. Make sure the selected container's public access level is set to Container (anonymous read access for containers and blobs):
containers_list_azure_portal
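The Container access level matters because it grants anonymous read access to both the container and its blobs, including listing the blobs. The Blob level allows reading individual blobs by URL but not listing them, so the files could not be enumerated anonymously. The sketch below checks a container name against Azure's naming rules (3 to 63 characters; lowercase letters, digits and single hyphens; starting and ending with a letter or digit) and builds the anonymous List Blobs URL (restype=container&comp=list on the Blob endpoint). Opening that URL for a correctly configured public container should return an XML listing. The helper names and sample values are hypothetical, and the endpoint assumes the public Azure cloud.

```python
import re
from urllib.parse import urlencode

# Lowercase letters/digits, optionally joined by single hyphens,
# so the name starts and ends with a letter or digit.
_CONTAINER_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` follows the container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None


def list_blobs_url(account: str, container: str) -> str:
    """Build the anonymous List Blobs URL for a public container."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    query = urlencode({"restype": "container", "comp": "list"})
    return f"https://{account}.blob.core.windows.net/{container}?{query}"


if __name__ == "__main__":
    # Hypothetical account and container names.
    print(list_blobs_url("mydatalake01", "raw-data"))
```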

Connecting to Azure Data Lake Storage

Connect

Once you have filled in the required connection details, click the Connect button.

dataedo_connect

Once connected, set the documentation title.

Select format (optional)

On the next screen, you can select the format of the files to be documented. This step is optional: if you select files in different formats, Dataedo automatically detects each file's format and uses the appropriate connector.

format_selection

Select files

On the next screen, select the files you want to document. As stated earlier, files can be in various formats. In the next step, Dataedo will try to document the files based on their extensions.

files_select

Once you have selected the files, click the Import button.
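Detection by extension can be pictured as grouping the selected paths by their file suffix. The sketch below is illustrative only: the extension-to-format table and the function names are hypothetical and are not Dataedo's list of supported formats. Files with an unrecognized extension land in an unknown group; in Dataedo, you can correct a recognized format in the Type column during the Choose Objects to import step.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension-to-format table, for illustration only.
# It is NOT Dataedo's actual list of supported formats.
EXTENSION_FORMATS = {
    ".csv": "CSV",
    ".json": "JSON",
    ".parquet": "Parquet",
    ".avro": "Avro",
    ".orc": "ORC",
}


def detect_format(path: str) -> str | None:
    """Return the format for a path based on its case-insensitive extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_FORMATS.get(suffix)


def group_by_format(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by detected format; unrecognized extensions go to 'unknown'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[detect_format(path) or "unknown"].append(path)
    return dict(groups)


if __name__ == "__main__":
    files = ["raw/sales.CSV", "raw/events.json", "curated/orders.parquet", "notes/readme"]
    for fmt, members in group_by_format(files).items():
        print(fmt, members)
```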

Choose Objects to import

In this step, you can check or uncheck objects and their columns/fields for import. To review the columns/fields of a file, select it from the list. You can also change the format that Dataedo recognized by expanding the list in the Type column.

choose_objects

Import

Once you're sure everything is correct, click the Import button to import the selected files.

Outcome

The metadata of the selected files has been loaded into a new documentation.

data_lake_storage_outcome