# Group by

The Group by step allows you to group your data by one or more columns and perform calculations on other columns. This step is useful for summarizing data and creating reports.

### Step parameters

2. `Group rows by` **column(array)\***: Select one or more columns whose unique combinations of values define the groups. For example, you might group by "product" or "category".
3. `And aggregate...` **array(aggregation)\***: Define one or more aggregations to perform on your grouped data. For each aggregation, you need to specify:
   * `Columns` **column(array)\***: the columns to be aggregated (you can apply the same aggregation function to several columns at once)
   * `Function` **(string)\***: the aggregation function to be applied (`sum`, `avg`, `count`, `min`, or `max`)
4. `Keep Original Granularity` **(boolean)**: whether to keep the original granularity. If checked, the input rows are kept and the computed aggregations are added in new columns. If unchecked, the output only contains the grouped and aggregated data.
5. `Count null values like regular values` **(boolean)**: whether to include `null` values in the count. If checked, `null` values are counted as regular entries.
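To make the effect of these parameters concrete, here is a minimal Python sketch of the grouping logic. It is an illustration only, not the step's actual implementation: the `group_by` and `count_values` helpers, the sample rows, and the `column-function` name used for the column added when the original granularity is kept are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical helpers mirroring the aggregation functions listed above.
AGG_FUNCTIONS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "count": len,
    "min": min,
    "max": max,
}


def group_by(rows, on, column, function, keep_original_granularity=False):
    """Aggregate `column` with `function` for each unique combination of `on`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in on)
        groups[key].append(row[column])
    results = {key: AGG_FUNCTIONS[function](values) for key, values in groups.items()}

    if keep_original_granularity:
        # Every input row is kept; the aggregate goes into a new column.
        # The new column's name is an assumption made for this sketch.
        new_column = f"{column}-{function}"
        return [
            {**row, new_column: results[tuple(row[name] for name in on)]}
            for row in rows
        ]

    # Otherwise: one row per group, with the grouping columns and the aggregate.
    return [{**dict(zip(on, key)), column: value} for key, value in results.items()]


def count_values(values, count_nulls=False):
    """Count entries, treating None like a regular value only when asked to."""
    if count_nulls:
        return len(values)
    return sum(value is not None for value in values)


rows = [
    {"category": "A", "sales": 10},
    {"category": "A", "sales": 20},
    {"category": "B", "sales": 5},
]

print(group_by(rows, ["category"], "sales", "sum"))
print(group_by(rows, ["category"], "sales", "sum", keep_original_granularity=True))
print(count_values([1, None, 3]), count_values([1, None, 3], count_nulls=True))
```

With `keep_original_granularity=False` the sketch returns one row per category, while with `True` it returns all three input rows, each carrying the total for its group.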

### Example

**Input**

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-46dea441060a6d51c6c4213e1f3ae0d82589f6a2%2Fgroupby_output.png?alt=media" alt=""><figcaption><p>Aggregate - group by input</p></figcaption></figure>

**Configuration**

```json
{
    "on": [],
    "aggregations": [
        {
            "columns": [],
            "aggfunction": ""
        },
        {
            "columns": [],
            "aggfunction": ""
        }
    ],
    "keep_original_granularity": false
}
```

**Output**

<figure><img src="https://1809014303-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZxYYf1KpgarKMgMsDCrw%2Fuploads%2Fgit-blob-b6a3d8d716cbf457e3999ed2bcbbe516e7ed70e5%2Fgroupby_input.png?alt=media" alt=""><figcaption><p>Aggregate - group by output</p></figcaption></figure>

{% hint style="info" %}
If a column is aggregated only once, the output column replaces the aggregated column and keeps the same name.

If a column is aggregated two or more times, the output columns are named `column_name-aggfunction`.

For example, if you compute both the sum and the average of a `sales` column, the output contains one column named `sales-sum` and another named `sales-avg`.
{% endhint %}
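The naming rule in the hint above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the product's code: the `output_column_names` function and the `quantity` column are hypothetical.

```python
from collections import Counter


def output_column_names(aggregations):
    """Return the output column name for each (column, function) pair."""
    # Count how many aggregations target each column.
    usage = Counter(column for column, _ in aggregations)
    return [
        # A column aggregated once keeps its name; otherwise the function is appended.
        column if usage[column] == 1 else f"{column}-{function}"
        for column, function in aggregations
    ]


names = output_column_names([("sales", "sum"), ("sales", "avg"), ("quantity", "max")])
print(names)  # ['sales-sum', 'sales-avg', 'quantity']
```

Here `sales` is aggregated twice, so both outputs carry the function suffix, while `quantity` is aggregated once and keeps its original name.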
