Toucan 3.0
YouPrep documentationHelp centerGet a demo
  • Welcome
    • 👋Welcome to Toucan
    • ⚙️Technical resources
      • ⚙️Toucan stack
      • Setup mode
        • Toucan SaaS mode
      • ⚙️Security
        • Application Security
        • Source Code Quality
        • Global Security Practices
        • Security of Docker Images
  • TUTORIALS
    • 📊Getting Started : Embedded Analytics
    • 🤓Advanced tutorials
      • Embedding a story with user attributes
        • Dynamic filter with user attributes
        • Dynamic Tables
        • Dynamic Database
        • Dynamic Host
      • Using the HTTP API connector in advanced use cases
      • Using advanced syntax for SQL queries
      • Merging filters with our tool
      • Deep customization chart (CSS)
        • Homepage customization
        • Chart customization
        • Dashboard customization
  • Data Management
    • 🧮Overview of Data In Toucan
    • 📡Datasources in Toucan
      • 🔌Managing Connectors
        • 🔌Creating, editing and deleting a connector
        • 🔌Set up OAuth2 credentials for your platform
        • 🔌Setting up a connector
          • 🔌Generic Connectors
            • 🔌Setting up an HTTP API connector
            • 🔌Setting up an ODBC Connector
          • 🔌Database and data warehouse Connectors
            • 🔌Setting up an AWS Redshift Connector
            • 🔌Setting up a Snowflake Connector
            • 🔌Setting up a PostgreSQL Connector
            • 🔌Setting up a Google Big Query Connector
            • 🔌Setting up an AWS Athena connector
            • 🔌Setting up a MySQL connector
            • 🔌Setting up a MongoDB connector
            • 🔌Setting up a Microsoft SQL Server connector
            • 🔌Setting up an Azure SQL connector
            • 🔌Setting a Databricks Connector
            • 🔌Setting up a ElasticSearch Connector
            • 🔌Setting up a Clickhouse Connector
          • 🔌Online services connectors
            • 🔌Setting up a Sharepoint Connector
            • 🔌Setting up a Google Sheets Connector
            • 🔌Setting up a Salesforce Connector
            • 🔌Setting up a Hubspot Connector
          • 🔌Setting up an AWS S3 connector
      • 📁Managing Files
        • 📁Adding, editing and deleting local files
        • 📂Using advanced file settings
        • 📁Adding and combining remote files in Toucan
    • 🔢Datasets in Toucan
      • 🔢Stored and Live Datasets
      • 💿Managing datasets
        • 🔢Creating datasets
        • 🔢Editing, Duplicating and Deleting a dataset
        • 🔢Refreshing and Publishing Datasets
        • 📈Optimize data performance
        • 🗂️Adding indexes to stored datasets
        • 👩‍💻Code mode and single mode
      • 🛑Setting permissions on dataset
      • 🗃️Maintaining Data
        • 🗃️Tagging datasets
        • 🗃️Identifying datasets dependencies
        • 🗃️Set validation rules
    • 🧑‍🍳Preparing data
      • Overview of YouPrep™
        • 🎹Column header
          • Rename column
          • Duplicate column
          • Fill null values
          • Replace values
          • Sort values
          • Convert columns data types
        • Add
          • Add text column
          • Add formula column
          • Add conditional column
        • Filter
          • Delete columns
          • Keep columns
          • Filter rows
          • Top N rows
          • ArgMax
          • ArgMin
        • Aggregate
          • Group by
          • Add total rows
          • Hierarchical rollup
          • Get unique groups/values
        • Compute
          • Compute evolution
          • Cumulated sum
          • Percentage of total
          • Rank
          • Moving average
          • Compute statistics
          • Absolute value
        • Text
          • Concatenate
          • Split column
          • Extract substring
          • To lowercase
          • To uppercase
          • Compare text columns
          • Trim spaces
          • Replace text
        • Date
          • Convert text to date
          • Convert date to text
          • Extract date information
          • Add missing dates
          • Compute duration
        • Reshape
          • Pivot
          • Unpivot
          • Waterfall
        • Combine
          • Append datasets
          • Join datasets
        • Geo
          • Geographic dissolve
          • Geographic hierarchy
          • Geographic simplify
          • Prepare geo data (with basemap)
      • YouPrep™ Native SQL
      • Hybrid pipeline
    • ➿Managing variables in Toucan
      • ➿Variables hub
      • ♈Use variables in YouPrep™
      • ➿Easy reference to variables
    • 🧞Using advanced data concepts
      • 🧞Data personnalisation with user attributes
        • Connector setup with a user attribute
        • Database selection with a user attribute
        • YouPrep data filtering with a user attribute
        • Filter data in SQL with a user attribute
      • 🧞Advanced syntax for variables
      • 🧞Data cache
  • Visualizations and Layouts
    • 📺Apps
      • 📺Managing Apps
        • ➕Creating Apps
        • 📄Duplicating Apps
        • 🖨️Publishing Apps
        • 🚮Deleting Apps
        • ✍️Editing within an App
      • 🖌️Customizing Apps
        • Customizing chart color elements
        • Customizing the app's font
        • Adding Assets
        • Adding Glossary
        • Setting up, Managing and testing custom visibilities
        • Customizing the "no data error" message
        • Creating a dynamic background based on an Filter's column
      • 🏠Home
        • Creating the Home
        • Creating Tiles
          • Tile Dynamic Value
          • Tile Leaderboard
          • Tile Line
          • Tile Scorecard
          • Tile Bullet
          • Tile Heatmap
          • Tile PDF
          • Tile Video
          • Tile Image
          • Tile Text
          • Tile HTML
          • Tile Separator
      • ✨Stories
        • Creating a Story
        • KPIs
        • Narrative
        • Crossfilter
      • 📽️Filters
        • Managing Filters
          • Creating, reusing and editing Filters
          • Applying Filters
          • Unpinning and deleting Filters
        • Type of Filters
          • Dropdown
          • Checkboxes
          • Buttons
          • Date Range
          • Hierarchical
          • Slider
        • Templating from Filters' values
        • Dependant Filters
      • 📈PDF Report
      • 🎡Datawall
      • 🏗️Dashboard Builder
        • Create a Dashboard Builder
        • Embed a Dashboard Builder
        • Dashboard export options
      • 🌟MyFavorites
    • 📊Creating Visualizations
      • 🤩Viz Gallery
        • Barchart
        • Barlinechart
        • Bubblechart
        • Bulletchart
        • Circularchart
        • Funnelchart
        • Gantt chart
        • Heatmap
        • HTML
        • Leaderboard
        • Leaderboard Centered Average
        • Linechart
        • Mapchart
          • Configure a drill
        • Mediachart
        • Radarchart
        • Tablechart
        • Timeline
        • Versuschart
        • Waterfallchart
        • Score Card
        • Stacked Barchart
      • 🧠Common Chart Configuration
      • 💅Customizing chart colors
      • 🧞‍♂️Advanced chart configuration
        • Templating from chart's dataset
        • Add units, precisions and sentiments
        • Adding Tutorials
        • Add sparklines
        • Navigate with stories
        • Group informations in your stories
        • Multiple charts in one story
        • Manage dates
        • Customize tiles' sources
        • Add stars to tiles' title
        • Manage data order in your tiles
        • Navigate with tiles
    • 👩‍💻Embedding
      • 🔐Authentication
      • 🖇️Integration
        • Generate and manage embeds
        • Customize embeds
        • Embedding a Toucan App Using iFrames
        • Passing Extra Variables to Your Toucan Embed
      • ⚙️Embed SDK
        • Embed SDK Authentication
      • ❓FAQ
    • 🙋Self-Service
      • Self-Service Dashboard
      • Self-service PDF Report
  • Collaboration
    • ⏰Creating alerts
    • 📧Managing notifications
    • ➕Enriching a story with descriptions
    • 💌Sharing content
    • 💬Adding comments to stories
  • Administration
    • Page
    • ⚙️Instance Management
      • ⚙️Managing operations in SaaS
      • ⚙️Customizing your instance (whitelabel)
    • 👥Managing Users
      • 👥Users
      • 👥Managing user groups
      • 👥Managing user properties
      • 👥Setting up permissions and visibilities
    • 🌐Managing languages in Toucan (internationalisation)
    • 📈Monitoring Engagement with User Analytics
      • 🎛️How to Filter your User Analytics?
      • 🖥️Understanding your User Analytics Dashboards
  • Additional Ressources
    • 📚External documentation
    • 🚁Support for App-builders
    • 🆕Latest releases
      • 🎁2025 Releases
      • 🎁2024 Releases
      • 🎁2023 Releases
    • 🔧Troubleshooting
      • Troubleshoot:: DataHub
      • Cross-Site Cookies
      • How to :: read the inspector error
      • How to :: troubleshoot the toucan way
Powered by GitBook
On this page
  • Example
  • Rows
  • Columns
  • Unique values
  • No duplicates
  • Value
  • Data type
  • Pattern
  • Not null

Was this helpful?

  1. Data Management
  2. Datasets in Toucan
  3. Maintaining Data

Set validation rules

Add some data validation steps to detect errors in your datasources sooner.

When a new data file is dropped through the studio or extracted during preprocess, it can be validated. You can check the data types of the columns, the number of rows of a data set, the presence of some required columns, that there is no duplicated rows, etc.

The list of validation rules for data files are defined inside the etl_config.cson file under the validation key.

Example

domain: "market-breakdown"
type: "excel"
file: "data-market-breakdown.xls"
sheetname: "Data"
validation:[
  type: "data_type"
  expected:
    "breakdown": "string"
    "pays": "string"
    "part": "number"
    "clients": "number"
,
  type: "pattern"
  expected: "[0-9]+"
  params:
    columns: ['metric']
]

Rows

Check the number of expected rows.

Keys:

  • type: ‘rows’

  • expected: (number) number of expected rows

Columns

Ensure that a given list of columns is a subset of the dataset’s columns

Keys:

  • type: ‘columns’

  • expected: (list(str)) columns you expected to find

type: 'columns'
expected: ['metric', 'target']

Unique values

Ensure that the list of unique values of a given column corresponds exactly to a list of expected values.

Keys:

  • type: ‘unique_values’

  • expected: (list) unique values

  • params:

  • column: (string) column name

type: 'unique_values'
expected: ["France", "Germany", "Spain"]
params:
    column: 'target'

No duplicates

Duplicated rows can be assessed based on all the columns or only a subset of columns.

Keys:

  • type: ‘no_duplicates’

  • params:

  • columns: (list or string) list of columns to use or ‘all’

type: 'no_duplicates'
params:
    columns: ['metric']

Value

Check the value of a column (one value only). If the query returned more than one row, only the first one will be used.

Keys:

  • type: ‘value’

  • expected: (string or number) expected value

  • params:

  • column: (string) in which to check the value

type: 'value'
expected: 30
params:
    column: 'target'

Data type

Check column data types. Three possible types: number, string, date, or category.

Keys:

  • type: ‘data_type’

  • expected: : <’string’, ‘number’, ‘date’, or ‘category’>

type: 'data_type'
expected:
    'metric': 'number'
    'target': 'number'

Pattern

Check if string values correspond to a defined pattern

Keys:

  • type: ‘pattern’

  • expected: pattern/regex as a string

  • params: object with a columns key: the list of columns to check.

type: 'pattern'
expected: '[0-9]+'
params:
    columns: ['metric']
]

Not null

Check if some columns don’t have null value.

Keys:

  • type: ‘not_null’

  • params: object with a columns key: list of columns.

type: 'not_null'
params:
    columns : ['zone']

Tutorial : Product Corporation

You can download the CSV file for our tutorial.

  • Add validation for your datasource in etl_config.cson

    DATA_SOURCES: [
     {
       domain: 'data-product-corpo'
       file: 'data-product-corporation.csv'
       skip_rows: 0
       separator: ','
       encoding: 'utf-8'
       type: 'csv'
       validation:[
      type: 'rows'
      expected: 46
       ,
      type: 'columns'
      expected: ['brand', 'country', 'product_line']
       ]
     }
    ]
  • Drag and drop your new etl_config.cson in the CONFIG FILES page

  • Go to your ‘DATA SOURCES’ page and drop your datasource.

  • Validation should be ok. If not, the file is not uploaded.

Last updated 1 year ago

Was this helpful?

🔢
🗃️
🗃️
data-product-corporation.csv