🗃️Set validation rules
Add some data validation steps to detect errors in your datasources sooner.
When a new data file is dropped through the studio or extracted during preprocessing, it can be validated. You can check, for example, the data types of the columns, the number of rows in the dataset, the presence of required columns, or the absence of duplicated rows.
The validation rules for data files are defined in the etl_config.cson file, under the validation key.
Example
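A minimal sketch of a datasource entry that carries validation rules. Only the validation key and the rule keys come from this page; the DATA_SOURCES array and the domain, file and type fields are assumptions about the surrounding etl_config.cson structure, and all column names and values are hypothetical.

```cson
# Assumed layout: datasources listed under DATA_SOURCES (check your own file)
DATA_SOURCES: [
  {
    domain: 'sales'          # hypothetical datasource fields
    file: 'sales.csv'
    type: 'csv'
    # Rules run when the file is dropped or extracted during preprocessing
    validation: [
      {
        type: 'rows'
        expected: 12
      }
      {
        type: 'columns'
        expected: ['country', 'sales']
      }
    ]
  }
]
```

Each rule is one object in the validation list, identified by its type key.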
Rows
Check that the dataset has the expected number of rows.
Keys:
type: 'rows'
expected: (number) number of expected rows
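For instance, a rule expecting 12 rows (an arbitrary, hypothetical count) could be written as:

```cson
validation: [
  {
    type: 'rows'
    expected: 12   # hypothetical number of rows
  }
]
```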
Columns
Ensure that a given list of columns is a subset of the dataset's columns (the file may contain other columns as well).
Keys:
type: 'columns'
expected: (list(str)) columns you expect to find
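A sketch with hypothetical column names. Because the check is a subset test, a file with additional columns should still pass.

```cson
validation: [
  {
    type: 'columns'
    # Both columns must exist; other columns in the file are allowed
    expected: ['country', 'sales']
  }
]
```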
Unique values
Ensure that the list of unique values of a given column corresponds exactly to a list of expected values.
Keys:
type: 'unique_values'
expected: (list) the exact list of unique values expected in the column
params:
column: (string) column name
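A sketch assuming a hypothetical country column whose distinct values must be exactly these three, no more and no fewer:

```cson
validation: [
  {
    type: 'unique_values'
    # The distinct values of the column must match this list exactly
    expected: ['France', 'Germany', 'Spain']
    params:
      column: 'country'
  }
]
```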
No duplicates
Duplicated rows can be detected by comparing all the columns or only a subset of them.
Keys:
type: 'no_duplicates'
params:
columns: (list or string) list of columns to compare, or 'all' to compare every column
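Two hypothetical variants: the first treats rows as duplicates when they share the same country and month values, the second compares whole rows. You would normally keep only one of them.

```cson
validation: [
  {
    type: 'no_duplicates'
    params:
      columns: ['country', 'month']   # duplicates judged on these columns only
  }
  {
    type: 'no_duplicates'
    params:
      columns: 'all'                  # duplicates judged on every column
  }
]
```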
Value
Check the value of a column (a single value). If the query returns more than one row, only the first row is used.
Keys:
type: 'value'
expected: (string or number) expected value
params:
column: (string) name of the column in which to check the value
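A sketch with a hypothetical sales column and expected value. Since only the first row is used, this rule is mainly useful when the data holds a single row.

```cson
validation: [
  {
    type: 'value'
    expected: 100          # hypothetical expected value
    params:
      column: 'sales'      # only the first row's value is checked
  }
]
```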
Data type
Check column data types. Four types are possible: number, string, date, and category.
Keys:
type: 'data_type'
expected: (object) <column>: <'string', 'number', 'date', or 'category'> for each column to check
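A sketch assuming expected is an object that maps each column name to one of the four types; the column names are hypothetical.

```cson
validation: [
  {
    type: 'data_type'
    # Assumption: one entry per column, column name -> expected type
    expected:
      country: 'string'
      sales: 'number'
      month: 'date'
      segment: 'category'
  }
]
```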
Pattern
Check that string values match a defined pattern.
Keys:
type: 'pattern'
expected: (string) the pattern, as a regular expression
params: object with a columns key: the list of columns to check.
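A sketch with a hypothetical month column holding values such as 2019-03. This page does not say which regex flavour is used or whether the whole value must match, so the pattern is anchored with ^ and $ to be explicit, and it uses [0-9] rather than backslash classes to sidestep string-escaping questions.

```cson
validation: [
  {
    type: 'pattern'
    # Hypothetical format: four digits, a dash, two digits (e.g. 2019-03)
    expected: '^[0-9]{4}-[0-9]{2}$'
    params:
      columns: ['month']
  }
]
```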
Not null
Check that the given columns contain no null values.
Keys:
type: 'not_null'
params: object with a columns key: list of columns.
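A sketch with hypothetical column names: the rule should fail if either column contains a null value.

```cson
validation: [
  {
    type: 'not_null'
    params:
      columns: ['country', 'sales']
  }
]
```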
Tutorial: Product Corporation
You can download the CSV file for our tutorial.
Add validation for your datasource in etl_config.cson
Drag and drop your new etl_config.cson on the CONFIG FILES page.
Go to your DATA SOURCES page and drop your datasource.
Validation should pass. If it fails, the file is not uploaded.