Creating rule instances

The Dataedo data quality feature offers more than 80 predefined rules that can be applied to columns in tables from supported connectors, helping you ensure that your data meets specific quality standards. The key components of the Dataedo data quality feature are rules and instances:

Rule: A rule defines the criteria for what constitutes "good" or "acceptable" data. For example, the "allowed values" rule checks whether column values match a predefined list you provide. These rules serve as essential guidelines for effective data validation.
Instance: An instance refers to the application of a rule to a specific column in your dataset. It represents the evaluation of data based on the parameters you define for that column.
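
To make the distinction concrete, the relationship between a rule and its instances can be modeled as two small data structures. This is only a conceptual sketch in Python: the class and field names (Rule, Instance, parameters) and the sample tables are hypothetical and do not reflect Dataedo's internal model or any API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """A reusable definition of what counts as acceptable data."""
    name: str
    description: str


@dataclass
class Instance:
    """One application of a rule to a specific column, with its own parameters."""
    rule: Rule
    table: str
    column: str
    parameters: dict = field(default_factory=dict)


allowed_values = Rule("Allowed values", "Column values must match a predefined list.")

# The same rule applied to two columns, each with different parameters.
status_check = Instance(allowed_values, "orders", "status",
                        {"values": ["new", "paid", "shipped"]})
country_check = Instance(allowed_values, "customers", "country",
                         {"values": ["PL", "DE", "US"]})
```

Both instances share one rule definition but carry their own parameters, which is why each rule has to be assigned to a column individually.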

In the Dataedo portal, you can select a rule, assign it to a column, and view the results for the corresponding instance. A single column can have multiple rules, but each rule must be assigned individually, as each may require different parameters. Additionally, you can edit or remove rules independently, providing flexibility in how they are managed.

Creating instance entry points

You can create an instance in Dataedo portal using three different methods:

Method 1: From a column

Search for the column you want to assign rules to.
Navigate to its Data Quality tab.
Click the Create rule instance button.
A window will open where you can select an available rule to apply to the column.

Method 2: From a table

Find the table you're interested in.
Navigate to its Data Quality tab.
Click the Create rule instance button.
Select a column from the table, then choose the rule you want to apply.

Method 3: From the Data Quality tab

Navigate to the Data Quality tab in the main menu.
Go to the Rule Instances tab.
Click the Create rule instance button.
A popup will guide you through selecting a data source, table, and column, followed by choosing a rule.

Once you've chosen the method to create your rule instance, you’ll move on to assigning the rule.

Assigning a rule

If you want to create a custom SQL rule, check out the Custom SQL rules guide.

Step 1: Select a rule

After selecting a column, the first step is to choose a rule. Each rule has the following attributes:

Name and description: Explains what the rule checks.
Library: The rule belongs to a specific library. In the future, you'll be able to create your custom rule library.
Applicable column types:
- All: Can be assigned to any column type.
- Text: For string-type columns only.
- Date: For date-type columns only.

Note: If there are any warnings (for example, rule compatibility issues with an older connector version or missing native support), they are shown in this step.

Step 2: Parameters and filters

Some rules require additional parameters to function, while others don't. For example:

Not null: No extra parameters are required; it simply checks if the selected column contains any null values.
Allowed values: Requires a list of valid values. The rule checks if the column data matches this list.
Value range: Requires defining a minimum and maximum value. Any data outside this range will be flagged.
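
The three example rules above can be sketched as plain Python functions that return the failing values. This is an illustration of the logic only, not how Dataedo evaluates rules. Two details are assumptions: the range is treated as inclusive, and null values are skipped by the parameterized checks so that only the not-null check reports them.

```python
def not_null_failures(values):
    """Not null: a value fails when it is missing (None). No parameters needed."""
    return [v for v in values if v is None]


def allowed_values_failures(values, allowed):
    """Allowed values: a value fails when it is not in the provided list."""
    allowed_set = set(allowed)
    # Assumption: nulls are left to the not-null check and skipped here.
    return [v for v in values if v is not None and v not in allowed_set]


def value_range_failures(values, minimum, maximum):
    """Value range: a value fails when it falls outside [minimum, maximum]."""
    # Assumption: both bounds are inclusive and nulls are skipped.
    return [v for v in values
            if v is not None and not (minimum <= v <= maximum)]


statuses = ["new", "paid", None, "lost"]
amounts = [5, 120, -3, 40]

null_failures = not_null_failures(statuses)
list_failures = allowed_values_failures(statuses, ["new", "paid", "shipped"])
range_failures = value_range_failures(amounts, 0, 100)
```

With this sample data, the not-null check flags the single missing status, the allowed-values check flags "lost", and the range check flags 120 and -3.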

After entering the required parameters, an optional Filter field will appear. This parameter allows you to limit which records are checked in the instance. For example:

You might skip checking email validity for records created before 2015, when your company started email validation.
You might want to only check invoices marked as high priority.

To apply a filter, write a condition that matches only the records you want to include. Filter syntax may vary depending on your connector. For example:

For SQL Server: Use [priority] = 1.
For MySQL: Use priority = 1.

Note: Make sure to use the correct syntax supported by your connector.
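
The effect of a filter can be tried out locally. The sketch below uses Python's built-in sqlite3 module and assumes that the filter behaves like a WHERE condition added to the check; that is an assumption for illustration, not a description of the queries Dataedo generates. The invoices table, its columns, and the run_not_null_check helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, priority INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO invoices (id, priority, email) VALUES (?, ?, ?)",
    [(1, 1, "a@example.com"), (2, 1, None), (3, 0, None), (4, 0, "d@example.com")],
)

# Hypothetical filter text in the MySQL-style form; SQL Server would use [priority] = 1.
row_filter = "priority = 1"


def run_not_null_check(connection, table, column, row_filter=None):
    """Return (tested, failed) row counts for a not-null check, optionally filtered."""
    where = f" WHERE {row_filter}" if row_filter else ""
    # Identifiers are interpolated directly, which is acceptable only for a local demo.
    query = (
        f"SELECT COUNT(*), "
        f"COALESCE(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), 0) "
        f"FROM {table}{where}"
    )
    return connection.execute(query).fetchone()


unfiltered = run_not_null_check(conn, "invoices", "email")
filtered = run_not_null_check(conn, "invoices", "email", row_filter)
```

Without the filter, all four rows are tested and two fail. With the filter, only the two high-priority invoices are tested and one fails, so the filter changes both the number of rows tested and the number of failures reported.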

The Parameters and filters step also lets you preview the queries used to check the column's data. On the right-hand side, a toggle will let you view two types of queries:

Raw rule query: Displays the rule's definition with parameter placeholders.
Instance rule query: Shows the query with the table, column names, and applied parameters and filters.
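
The difference between the two views can be illustrated with simple placeholder substitution. The sketch below uses Python's string.Template. The query text and the placeholder names ($table, $column, $min_value, $max_value, $row_filter) are hypothetical and are not Dataedo's actual rule definitions or placeholder syntax.

```python
from string import Template

# Hypothetical raw rule query: the rule's definition with parameter placeholders.
raw_rule_query = Template(
    "SELECT COUNT(*) FROM $table "
    "WHERE $column NOT BETWEEN $min_value AND $max_value AND ($row_filter)"
)

# Instance rule query: the same definition with the table, column,
# parameters, and filter of one specific instance filled in.
instance_rule_query = raw_rule_query.substitute(
    table="invoices",
    column="amount",
    min_value=0,
    max_value=10000,
    row_filter="priority = 1",
)
```

Here instance_rule_query contains no placeholders and reads as a complete statement for the invoices.amount column, which is the kind of difference the toggle lets you compare.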

Step 3: Failed Rows

By default, Dataedo collects only numeric statistics about your data quality (e.g., how many rows were tested and how many passed or failed). In this step, you can choose to save the failed rows for easier identification and resolution. If you activate this option, Dataedo saves the first 1,000 failed rows. The ID column(s) field is required so that each saved row can be uniquely identified. It is prefilled with the primary key of the selected table. If there's no primary key, you'll need to specify one or more columns as unique identifiers.

The second field in this step is Additional columns, where you can list other columns to help identify the failed rows. You can also sort the results by one or more specific columns.
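
The options in this step map naturally onto a query that returns identifying columns for failing rows, sorted and capped at 1,000. The sqlite3 sketch below illustrates that idea with a not-null check on a hypothetical customers table. The query shape, and the assumption that "first" follows the chosen sort order, are illustrative guesses rather than Dataedo's implementation.

```python
import sqlite3

MAX_FAILED_ROWS = 1000  # the limit stated above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
# 3,000 sample rows; every odd customer_id has a missing email (1,500 failures).
conn.executemany(
    "INSERT INTO customers (customer_id, name, email) VALUES (?, ?, ?)",
    [(i, f"customer {i}", None if i % 2 else f"c{i}@example.com")
     for i in range(1, 3001)],
)

id_columns = ["customer_id"]    # would be prefilled from the primary key
additional_columns = ["name"]   # extra context for whoever fixes the data
sort_columns = ["customer_id"]

select_list = ", ".join(id_columns + additional_columns)
order_list = ", ".join(sort_columns)
query = (
    f"SELECT {select_list} FROM customers "
    f"WHERE email IS NULL ORDER BY {order_list} LIMIT ?"
)
failed_rows = conn.execute(query, (MAX_FAILED_ROWS,)).fetchall()
```

Although 1,500 rows fail the check in this sample, only 1,000 are kept, each carrying its ID and the additional name column.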

Step 4: Settings

The final step is to set up the instance. Here, you can choose the state of the instance:

Active: The rule will run during every scheduled Data Quality check.
Draft: The rule will be created but will only run once it's set to active. You can change the state at any time by editing the instance.

You can also define the severity of the instance, which helps prioritize rule runs. For example, you might schedule critical rules to run daily, while lower-severity rules could run weekly. Finally, the Instance description field lets you note important details, such as when a filter is applied. This helps business users understand that the rule only checks a specific subset of data.
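
As a rough illustration of how state and severity interact with scheduling, the sketch below picks the instances that would run in a daily and a weekly check. The InstanceSettings class, the severity names ("critical", "low"), and the due_instances helper are hypothetical. Actual scheduling is configured in Dataedo, not through code like this.

```python
from dataclasses import dataclass


@dataclass
class InstanceSettings:
    name: str
    state: str       # "active" or "draft"
    severity: str    # hypothetical levels: "critical", "low"
    description: str = ""


instances = [
    InstanceSettings("email not null", "active", "critical"),
    InstanceSettings("amount in range", "active", "low",
                     "Filter applied: only high-priority invoices."),
    InstanceSettings("status allowed values", "draft", "critical"),
]


def due_instances(all_instances, severities):
    """Pick active instances whose severity is covered by a scheduled run."""
    return [i.name for i in all_instances
            if i.state == "active" and i.severity in severities]


daily_run = due_instances(instances, {"critical"})
weekly_run = due_instances(instances, {"critical", "low"})
```

The draft instance is skipped in both runs regardless of its severity, and the low-severity instance is only picked up by the weekly run.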