# Set validation rules

Add some data validation steps to detect errors in your datasources sooner.

When a new data file is dropped through the studio or extracted during preprocessing, it can be validated. You can check the data types of the columns, the number of rows in a dataset, the presence of required columns, that there are no duplicated rows, etc.

The list of validation rules for data files is defined inside the *etl\_config.cson* file under the *validation* key.

### Example

```
domain: "market-breakdown"
type: "excel"
file: "data-market-breakdown.xls"
sheetname: "Data"
validation:[
  type: "data_type"
  expected:
    "breakdown": "string"
    "pays": "string"
    "part": "number"
    "clients": "number"
,
  type: "pattern"
  expected: "[0-9]+"
  params:
    columns: ['metric']
]
```

### Rows

Check that the dataset has the expected number of rows.

Keys:

* type: ‘rows’
* expected: (number) number of expected rows
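
Following the format of the other rules, a minimal example (the expected count here is illustrative):

```
type: 'rows'
expected: 46
```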

### Columns

Ensure that a given list of columns is a subset of the dataset’s columns.

Keys:

* type: ‘columns’
* expected: (list(str)) columns you expect to find

```
type: 'columns'
expected: ['metric', 'target']
```

### Unique values

Ensure that the list of unique values of a given column corresponds exactly to a list of expected values.

Keys:

* type: ‘unique\_values’
* expected: (list) unique values
* params:
  * column: (string) column name

```
type: 'unique_values'
expected: ["France", "Germany", "Spain"]
params:
    column: 'target'
```

### No duplicates

Duplicated rows can be assessed based on all the columns or only a subset of columns.

Keys:

* type: ‘no\_duplicates’
* params:
  * columns: (list or string) list of columns to use, or ‘all’

```
type: 'no_duplicates'
params:
    columns: ['metric']
```

### Value

Check the value of a column (one value only). If the dataset contains more than one row, only the first row is checked.

Keys:

* type: ‘value’
* expected: (string or number) expected value
* params:
  * column: (string) column in which to check the value

```
type: 'value'
expected: 30
params:
    column: 'target'
```

### Data type

Check column data types. Four possible types: *string*, *number*, *date*, or *category*.

Keys:

* type: ‘data\_type’
* expected: mapping of column name to type, one of ‘string’, ‘number’, ‘date’, or ‘category’

```
type: 'data_type'
expected:
    'metric': 'number'
    'target': 'number'
```

### Pattern

Check that string values match a defined pattern (regular expression).

Keys:

* type: ‘pattern’
* expected: pattern/regex as a string
* params: object with a *columns* key: the list of columns to check.

```
type: 'pattern'
expected: '[0-9]+'
params:
    columns: ['metric']
```

### Not null

Check that some columns contain no null values.

Keys:

* type: ‘not\_null’
* params: object with a *columns* key: list of columns.

```
type: 'not_null'
params:
columns: ['zone']
```

**Tutorial: Product Corporation**

> You can download the CSV file for our tutorial.
>
> [data-product-corporation.csv](https://github.com/ToucanToco/doc_example/blob/master/data_sources/data-product-corporation.csv)
>
> * Add validation for your datasource in etl\_config.cson
>
>   ```
>   DATA_SOURCES: [
>    {
>      domain: 'data-product-corpo'
>      file: 'data-product-corporation.csv'
>      skip_rows: 0
>      separator: ','
>      encoding: 'utf-8'
>      type: 'csv'
>      validation:[
>     type: 'rows'
>     expected: 46
>      ,
>     type: 'columns'
>     expected: ['brand', 'country', 'product_line']
>      ]
>    }
>   ]
>   ```
> * Drag and drop your new etl\_config.cson in the `CONFIG FILES` page
> * Go to your ‘DATA SOURCES’ page and drop your datasource.
> * Validation should pass. If it fails, the file is not uploaded.
