Crosstab
A Crosstab is a table of statistics that summarizes the data of a more extensive table. The summary may include sums, averages or other statistics, which the Crosstab groups together in a meaningful way.
When to use
- A Crosstab can help in summarizing the data and to highlight the information.
- To create columns for each category in a row.
Quickstart
Let us start with the following data:
Student | Subject | Category | Marks |
---|---|---|---|
Alice | Physics | Science | 85 |
Alice | Chemistry | Science | 55 |
Alice | English | Language | 60 |
Alice | French | Language | 78 |
Bob | Physics | Science | 75 |
Bob | Chemistry | Science | 60 |
Bob | English | Language | 65 |
Bob | French | Language | 80 |
It is important to identify the required end result. In this example, let us ask "How many marks did each student score per category?"
To answer this question, complete the following steps in Mammoth application:
- Go to Transform > Reshape, Group & Aggregate.
- Select the Crosstab option.
- Drag and drop the
Student
column into theRows
bucket. - Drag and drop the
Category
column into theColumns
bucket. - Change the aggregation from
Count
toSum
ofMarks
. - Click APPLY.
This produces a new dataset. The dataset is created in the same folder.
The new dataset appears as shown below:
Student | Science | Language |
---|---|---|
Alice | > 140 | 138 |
Bob | 135 | 145 |
Note: We could use the Group data task to achieve the same results. It will be presented differently.
Supported Options
The following options are supported within the Crosstab task:
Rows
: The columns that produce the rows in the Crosstab. When multiple columns are chosen, the result contains possible combination of the values in selected columns.Columns
: The columns that produce the columns in the Crosstab. When you choose multiple columns, the result will have one column for each possible combination of the values in the selected columns.Aggregation
: The aggregation that needs to produced in the summary. See functions.Dataset Name
: The name of the new dataset that is created with this operation. This name is a suggestion and gets modified by the system if you have datasets with the same name already.
Apart from the options listed above, an option to tweak the working of task when data is refreshed is also provided. See what happens when pipeline reruns.
What happens when pipeline reruns
See On updates in the Branch out to Dataset documentation.
- There is a limit of 400 columns in the result dataset.
- Editing this task can result in some unexpected behavior when you change the configuration too much. It is best to delete a crosstab task and create a new task if the configuration is going very different from the previous one.
- The date columns can be summarized further within the in Rows/Columns section. You can extract specific components from date columns before summarizing.
- System places the new dataset in the same folder as the parent dataset.