Reusable Rules: Parameters and Functions
This article is related to Reusable Data Quality Rules.
If you want to learn what reusable rules are and how to create them, check that article first.
Reusable Data Quality rules support a set of built-in parameters and functions that make them reusable across different connectors. These are automatically translated into the correct SQL syntax depending on the connected source.
You can also use source-specific functions, but doing so will limit the portability of your reusable rule across technologies.
Built-in parameters
These parameters can be used in the definition and will be translated to the selected data source type (for example, SQL Server or MySQL).
Parameter | Example usage in definition | Translates to example (in an instance) |
---|---|---|
table_name | SELECT * FROM {{ table_name }} | SELECT * FROM [dbo].[Customers] |
column_name | SELECT * FROM {{ table_name }} WHERE {{ column_name }} = 'test' | SELECT * FROM [dbo].[Customers] WHERE Email = 'test' |
row_filter | SELECT * FROM {{ table_name }} WHERE {{ row_filter }} | SELECT * FROM [dbo].[Customers] WHERE Country = 'USA' |
null_values_filter | SELECT * FROM {{ table_name }} WHERE {{ null_values_filter }} {{ row_filter }} | SELECT * FROM [dbo].[Customers] WHERE Email IS NOT NULL AND Country = 'USA' |
About the row_filter
Parameter
The row_filter
parameter is automatically added at the very end of your SQL query when you define a reusable rule.
This allows users, when creating a rule instance, to add an additional filter - for example, priority = 1
- so that the rule doesn’t check all rows but only the subset that matters.
There are cases where you might want to use row_filter
somewhere else in the query. For example, if you are counting rows, placing it at the end could filter out rows after the calculation instead of filtering the rows to be counted.
Example:
SELECT *
FROM {{ table_name }}
WHERE
(SELECT COUNT(*)
FROM {{ table_name }}
WHERE {{ row_filter }}) > {{ max }}
In this query, row_filter
is applied inside the subquery, so the rule counts only the filtered rows before comparing with max
.
Built-in functions
Reusable rules also support a set of cross-database functions prefixed with dataedo.
. These functions are automatically translated to the correct SQL syntax for the target database.
Function | SQL Server | MySQL | MariaDB | Oracle | Snowflake | PostgreSQL |
---|---|---|---|---|---|---|
dataedo.now() | getdate() | now() | now() | SYSDATE | CURRENT_TIMESTAMP | CURRENT_TIMESTAMP |
dataedo.current_user() | CURRENT_USER | CURRENT_USER() | CURRENT_USER() | USER | CURRENT_USER() | CURRENT_USER |
dataedo.upper(column) | UPPER(column) | UPPER(column) | UPPER(column) | UPPER(column) | UPPER(column) | UPPER(column) |
dataedo.lower(column) | LOWER(column) | LOWER(column) | LOWER(column) | LOWER(column) | LOWER(column) | LOWER(column) |
dataedo.trim(column) | LTRIM(RTRIM(column)) | TRIM(column) | TRIM(column) | TRIM(column) | TRIM(column) | TRIM(column) |
dataedo.length(column) | LEN(column) | CHAR_LENGTH(column) | CHAR_LENGTH(column) | LENGTH(column) | LEN(column) | LENGTH(column) |
dataedo.substring(column, start, length) | SUBSTRING(column, start, length) | SUBSTRING(column, start, length) | SUBSTRING(column, start, length) | SUBSTR(column, start, length) | SUBSTRING(column, start, length) | SUBSTR(column, start, length) |
dataedo.date_add(date, part, value) | DATEADD(part, value, date) | DATE_ADD(date, INTERVAL value part) | DATE_ADD(date, INTERVAL value part) | date + INTERVAL 'value' part | DATEADD(part, value, date) | date + INTERVAL 'value part' |
dataedo.empty_string(column) | NULLIF(LTRIM(RTRIM(column)), '') IS NULL | NULLIF(TRIM(column), '') IS NULL | NULLIF(TRIM(column), '') IS NULL | NULLIF(TRIM(column), '') IS NULL | NULLIF(TRIM(column), '') IS NULL | NULLIF(TRIM(column), '') IS NULL |
dataedo.valid_json(column) | ISJSON(column) | JSON_VALID(column) | JSON_VALID(column) | CASE WHEN column IS JSON THEN 1 ELSE 0 END | CASE COALESCE(CHECK_JSON(column), 'OK') WHEN 'OK' THEN 1 ELSE 0 END | CASE WHEN column IS JSON THEN 1 ELSE 0 END |
You can also use source-specific functions, but those will limit reusability of the rule across different connectors.
Functions usage examples
Here are some examples of how functions are used in SQL. The syntax is automatically adjusted to each supported database.
String functions
-- Input
dataedo.upper([Name])
dataedo.lower([Email])
dataedo.substring([Description], 1, 10)
-- SQL Server
UPPER([Name])
LOWER([Email])
SUBSTRING([Description], 1, 10)
Date functions
-- SQL Server
dataedo.date_add([CreatedDate], 'day', 7)
→ DATEADD(day, 7, [CreatedDate])
-- MySQL / MariaDB
dataedo.date_add([CreatedDate], 'day', 7)
→ DATE_ADD([CreatedDate], INTERVAL 7 DAY)
-- PostgreSQL
dataedo.date_add([CreatedDate], 'day', 7)
→ [CreatedDate] + INTERVAL '7 day'
JSON validation
-- SQL Server
dataedo.valid_json([JsonColumn])
→ ISJSON([JsonColumn])
-- MySQL / MariaDB
dataedo.valid_json([JsonColumn])
→ JSON_VALID([JsonColumn])
-- PostgreSQL
dataedo.valid_json([JsonColumn])
→ CASE WHEN [JsonColumn] IS JSON THEN 1 ELSE 0 END