# Set validation rules

Add some data validation steps to detect errors in your datasources sooner.

When a new data file is dropped in the studio or extracted during preprocessing, it can be validated. You can check the data types of the columns, the number of rows in a dataset, the presence of required columns, that there are no duplicated rows, etc.

The validation rules for a data file are defined inside the *etl\_config.cson* file, under the *validation* key of the data source.

### Example

```
domain: "market-breakdown"
type: "excel"
file: "data-market-breakdown.xls"
sheetname: "Data"
validation: [
  type: "data_type"
  expected:
    "breakdown": "string"
    "pays": "string"
    "part": "number"
    "clients": "number"
,
  type: "pattern"
  expected: "[0-9]+"
  params:
    columns: ['metric']
]
```

### Rows

Check that the dataset contains the expected number of rows.

Keys:

* type: ‘rows’
* expected: (number) number of expected rows
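
A minimal example, assuming the dataset is expected to contain exactly 46 rows (the count here is an arbitrary placeholder):

```
type: 'rows'
expected: 46
```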

### Columns

Ensure that a given list of columns is a subset of the dataset’s columns.

Keys:

* type: ‘columns’
* expected: (list(str)) columns you expect to find

```
type: 'columns'
expected: ['metric', 'target']
```

### Unique values

Ensure that the list of unique values of a given column corresponds exactly to a list of expected values.

Keys:

* type: ‘unique\_values’
* expected: (list) unique values
* params:
  + column: (string) column name

```
type: 'unique_values'
expected: ["France", "Germany", "Spain"]
params:
    column: 'target'
```

### No duplicates

Duplicated rows can be assessed based on all the columns or only a subset of columns.

Keys:

* type: ‘no\_duplicates’
* params:
  + columns: (list or string) list of columns to use, or ‘all’

```
type: 'no_duplicates'
params:
    columns: ['metric']
```

### Value

Check the value of a column (one value only). If the query returns more than one row, only the first one is used.

Keys:

* type: ‘value’
* expected: (string or number) expected value
* params:
  + column: (string) column in which to check the value

```
type: 'value'
expected: 30
params:
    column: 'target'
```

### Data type

Check column data types. Possible types are *number*, *string*, *date*, and *category*.

Keys:

* type: ‘data\_type’
* expected: mapping of column names to one of ‘string’, ‘number’, ‘date’, or ‘category’

```
type: 'data_type'
expected:
    'metric': 'number'
    'target': 'number'
```

### Pattern

Check that string values match a defined pattern.

Keys:

* type: ‘pattern’
* expected: pattern/regex as a string
* params: object with a *columns* key: the list of columns to check.

```
type: 'pattern'
expected: '[0-9]+'
params:
    columns: ['metric']
```

### Not null

Check that the given columns don’t contain null values.

Keys:

* type: ‘not\_null’
* params: object with a *columns* key: list of columns.

```
type: 'not_null'
params:
columns: ['zone']
```

**Tutorial: Product Corporation**

> You can download the CSV file for our tutorial.
>
> [data-product-corporation.csv](https://github.com/ToucanToco/doc_example/blob/master/data_sources/data-product-corporation.csv)
>
> * Add validation for your datasource in etl\_config.cson
>
>   ```
>   DATA_SOURCES: [
>     {
>       domain: 'data-product-corpo'
>       file: 'data-product-corporation.csv'
>       skip_rows: 0
>       separator: ','
>       encoding: 'utf-8'
>       type: 'csv'
>       validation: [
>         type: 'rows'
>         expected: 46
>       ,
>         type: 'columns'
>         expected: ['brand', 'country', 'product_line']
>       ]
>     }
>   ]
>   ```
> * Drag and drop your new etl\_config.cson on the `CONFIG FILES` page
> * Go to your ‘DATA SOURCES’ page and drop your datasource.
> * Validation should pass. If it doesn't, the file won't be uploaded.
