🗃️Set validation rules
Add some data validation steps to detect errors in your datasources sooner.
When a new data file is dropped through the studio or extracted during preprocess, it can be validated. You can check the data types of the columns, the number of rows of a data set, the presence of some required columns, that there is no duplicated rows, etc.
The list of validation rules for data files are defined inside the etl_config.cson file under the validation key.
Example
Rows
Check the number of expected rows.
Keys:
type: ‘rows’
expected: (number) number of expected rows
Columns
Ensure that a given list of columns is a subset of the dataset’s columns
Keys:
type: ‘columns’
expected: (list(str)) columns you expected to find
Unique values
Ensure that the list of unique values of a given column corresponds exactly to a list of expected values.
Keys:
type: ‘unique_values’
expected: (list) unique values
params:
column: (string) column name
No duplicates
Duplicated rows can be assessed based on all the columns or only a subset of columns.
Keys:
type: ‘no_duplicates’
params:
columns: (list or string) list of columns to use or ‘all’
Value
Check the value of a column (one value only). If the query returned more than one row, only the first one will be used.
Keys:
type: ‘value’
expected: (string or number) expected value
params:
column: (string) in which to check the value
Data type
Check column data types. Three possible types: number, string, date, or category.
Keys:
type: ‘data_type’
expected: : <’string’, ‘number’, ‘date’, or ‘category’>
Pattern
Check if string values correspond to a defined pattern
Keys:
type: ‘pattern’
expected: pattern/regex as a string
params: object with a columns key: the list of columns to check.
Not null
Check if some columns don’t have null value.
Keys:
type: ‘not_null’
params: object with a columns key: list of columns.
Tutorial : Product Corporation
You can download the CSV file for our tutorial.
Add validation for your datasource in etl_config.cson
Drag and drop your new etl_config.cson in the
CONFIG FILES
pageGo to your ‘DATA SOURCES’ page and drop your datasource.
Validation should be ok. If not, the file is not uploaded.
Last updated