Join datasets

The Join step lets you combine two datasets listed in the dataHub: it brings columns from a second (right) dataset into the current dataset, matching rows based on corresponding columns.

Step parameters

Select a dataset to join (as a right dataset) column(string)*: The dataset whose columns are brought into the current dataset. It acts as the right side of the join.
Select a join Type dropdown(string)*: Choose "left", "left outer", or "inner".
- left: keeps every row of the current dataset; for rows with no match, the columns coming from the right dataset are filled with null values,
- left outer :
- inner: keeps only the rows of the current dataset that match a row of the right dataset.
Join based on columns: Specify one or more column pairs to compare in order to match rows between the two datasets. The first element of a pair is a column of the current dataset, and the second is the corresponding column of the right dataset. If you specify more than one pair, two rows match only when every specified pair matches (logical 'AND').
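
The matching rules described above can be illustrated outside the tool with pandas. This is only a sketch under assumptions: the sample data, the column names (`emp_id`, `dept`, `employee_id`, `department`, `salary`) and the use of `pandas.DataFrame.merge` are hypothetical stand-ins, not the Join step's actual implementation. Only the "left" and "inner" types are shown.

```python
import pandas as pd

# Hypothetical current (left) dataset.
current = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "dept": ["A", "A", "B"],
    "name": ["Ann", "Bob", "Cy"],
})

# Hypothetical right dataset to join.
right = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "department": ["A", "B", "B"],
    "salary": [100, 200, 300],
})

# Two column pairs: (emp_id, employee_id) AND (dept, department).
# A row matches only if both pairs are equal.
left_on = ["emp_id", "dept"]
right_on = ["employee_id", "department"]

# "left": every row of the current dataset is kept; Bob (2, "A") has no
# counterpart in the right dataset, so his right-side columns are null.
left_join = current.merge(right, how="left", left_on=left_on, right_on=right_on)

# "inner": only rows with a match in the right dataset are kept.
inner_join = current.merge(right, how="inner", left_on=left_on, right_on=right_on)

print(left_join)
print(inner_join)
```

In this sketch, Bob's `emp_id` exists in the right dataset but his `dept` differs, so the 'AND' rule leaves him unmatched: he is kept with a null `salary` in the left join and dropped in the inner join.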

Example

Input

Configuration

{
    "right_pipeline": "dataset_to_join",
    "type": "left",
    "on": [
        {
            "id": "emp_id"
        }
    ]
}
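
To make the configuration above more concrete, here is a rough pandas sketch of what applying it could look like. Everything beyond the configuration keys is an assumption: the helper `apply_join`, the sample rows, the `name` and `team` columns, and in particular the reading of each `"id"` entry as a column present under the same name in both datasets. It does not describe how the tool itself evaluates the step.

```python
import pandas as pd

def apply_join(current, right, config):
    """Hypothetical helper: apply a join configuration with pandas."""
    # ASSUMPTION: each "on" entry names a column shared by both datasets.
    keys = [entry["id"] for entry in config["on"]]
    # Only "left" and "inner" are mapped in this sketch.
    how = {"left": "left", "inner": "inner"}[config["type"]]
    return current.merge(right, how=how, on=keys)

config = {
    "right_pipeline": "dataset_to_join",
    "type": "left",
    "on": [{"id": "emp_id"}],
}

# Hypothetical sample data, not taken from the original example.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
dataset_to_join = pd.DataFrame({"emp_id": [1, 3], "team": ["Data", "Ops"]})

result = apply_join(employees, dataset_to_join, config)
print(result)
```

With a "left" type, all three hypothetical employees are kept, and the one without a matching `emp_id` in the right dataset gets a null `team`.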

Output

