Reusable Rules: Parameters and Functions

info

This article is related to Reusable Data Quality Rules.
If you want to learn what reusable rules are and how to create them, check that article first.

Reusable Data Quality rules support a set of built-in parameters and functions that make them reusable across different connectors. These are automatically translated into the correct SQL syntax depending on the connected source.

You can also use source-specific functions, but doing so will limit the portability of your reusable rule across technologies.

Built-in parameters

These parameters can be used in the definition and will be translated to the selected data source type (for example, SQL Server or MySQL).

Parameter	Example usage in definition	Translates to example (in an instance)
`table_name`	`SELECT * FROM {{ table_name }}`	`SELECT * FROM [dbo].[Customers]`
`column_name`	`SELECT * FROM {{ table_name }} WHERE {{ column_name }} = 'test'`	`SELECT * FROM [dbo].[Customers] WHERE Email = 'test'`
`row_filter`	`SELECT * FROM {{ table_name }} WHERE {{ row_filter }}`	`SELECT * FROM [dbo].[Customers] WHERE Country = 'USA'`
`null_values_filter`	`SELECT * FROM {{ table_name }} WHERE {{ null_values_filter }} {{ row_filter }}`	`SELECT * FROM [dbo].[Customers] WHERE Email IS NOT NULL AND Country = 'USA'`

About the `row_filter` Parameter

The row_filter parameter is automatically added at the very end of your SQL query when you define a reusable rule.
This allows users, when creating a rule instance, to add an additional filter - for example, priority = 1 - so that the rule doesn’t check all rows but only the subset that matters.
There are cases where you might want to use row_filter somewhere else in the query. For example, if you are counting rows, placing it at the end could filter out rows after the calculation instead of filtering the rows to be counted.

Example:

SELECT *
FROM {{ table_name }}
WHERE
  (SELECT COUNT(*)
   FROM {{ table_name }}
   WHERE {{ row_filter }}) > {{ max }}

In this query, row_filter is applied inside the subquery, so the rule counts only the filtered rows before comparing with max.

Built-in functions

Reusable rules also support a set of cross-database functions prefixed with dataedo.. These functions are automatically translated to the correct SQL syntax for the target database.

Function	SQL Server	MySQL	MariaDB	Oracle	Snowflake	PostgreSQL
`dataedo.now()`	`getdate()`	`now()`	`now()`	`SYSDATE`	`CURRENT_TIMESTAMP`	`CURRENT_TIMESTAMP`
`dataedo.current_user()`	`CURRENT_USER`	`CURRENT_USER()`	`CURRENT_USER()`	`USER`	`CURRENT_USER()`	`CURRENT_USER`
`dataedo.upper(column)`	`UPPER(column)`	`UPPER(column)`	`UPPER(column)`	`UPPER(column)`	`UPPER(column)`	`UPPER(column)`
`dataedo.lower(column)`	`LOWER(column)`	`LOWER(column)`	`LOWER(column)`	`LOWER(column)`	`LOWER(column)`	`LOWER(column)`
`dataedo.trim(column)`	`LTRIM(RTRIM(column))`	`TRIM(column)`	`TRIM(column)`	`TRIM(column)`	`TRIM(column)`	`TRIM(column)`
`dataedo.length(column)`	`LEN(column)`	`CHAR_LENGTH(column)`	`CHAR_LENGTH(column)`	`LENGTH(column)`	`LEN(column)`	`LENGTH(column)`
`dataedo.substring(column, start, length)`	`SUBSTRING(column, start, length)`	`SUBSTRING(column, start, length)`	`SUBSTRING(column, start, length)`	`SUBSTR(column, start, length)`	`SUBSTRING(column, start, length)`	`SUBSTR(column, start, length)`
`dataedo.date_add(date, part, value)`	`DATEADD(part, value, date)`	`DATE_ADD(date, INTERVAL value part)`	`DATE_ADD(date, INTERVAL value part)`	`date + INTERVAL 'value' part`	`DATEADD(part, value, date)`	`date + INTERVAL 'value part'`
`dataedo.empty_string(column)`	`NULLIF(LTRIM(RTRIM(column)), '') IS NULL`	`NULLIF(TRIM(column), '') IS NULL`	`NULLIF(TRIM(column), '') IS NULL`	`NULLIF(TRIM(column), '') IS NULL`	`NULLIF(TRIM(column), '') IS NULL`	`NULLIF(TRIM(column), '') IS NULL`
`dataedo.valid_json(column)`	`ISJSON(column)`	`JSON_VALID(column)`	`JSON_VALID(column)`	`CASE WHEN column IS JSON THEN 1 ELSE 0 END`	`CASE COALESCE(CHECK_JSON(column), 'OK') WHEN 'OK' THEN 1 ELSE 0 END`	`CASE WHEN column IS JSON THEN 1 ELSE 0 END`

info

You can also use source-specific functions, but those will limit reusability of the rule across different connectors.

Functions usage examples

Here are some examples of how functions are used in SQL. The syntax is automatically adjusted to each supported database.

String functions

-- Input
dataedo.upper([Name]) 
dataedo.lower([Email]) 
dataedo.substring([Description], 1, 10)

-- SQL Server
UPPER([Name])
LOWER([Email])
SUBSTRING([Description], 1, 10)

Date functions

-- SQL Server
dataedo.date_add([CreatedDate], 'day', 7) 
→ DATEADD(day, 7, [CreatedDate])

-- MySQL / MariaDB
dataedo.date_add([CreatedDate], 'day', 7) 
→ DATE_ADD([CreatedDate], INTERVAL 7 DAY)

-- PostgreSQL
dataedo.date_add([CreatedDate], 'day', 7) 
→ [CreatedDate] + INTERVAL '7 day'

JSON validation

-- SQL Server
dataedo.valid_json([JsonColumn]) 
→ ISJSON([JsonColumn])

-- MySQL / MariaDB
dataedo.valid_json([JsonColumn]) 
→ JSON_VALID([JsonColumn])

-- PostgreSQL
dataedo.valid_json([JsonColumn]) 
→ CASE WHEN [JsonColumn] IS JSON THEN 1 ELSE 0 END

Built-in parameters​

About the row_filter Parameter​

Built-in functions​

Functions usage examples​

String functions​

Date functions​

JSON validation​