In today’s post we’re going to walk through how to Pivot a data table in knime. The data table we’ll be working with today is below:
As you can see, the data contains weekly pay values by year and by department. Originally, this data was recorded at the daily level, but we aggregated up to a weekly view in this post on using the Grouby Node.
Our goal in this post will be to pivot the year, so each year’s pay value is in its own column.
To get started, search “Pivot” in your node repository. You should see the Pivoting node show up. Once you find it, you can drop onto your workflow and connect it to your source data.
Now we’re ready to configure the node. You can double click into the node and you should see the following menu:
The 3 tabs I boxed in blue are the ones we’re going to configure to get our pivot to work correctly. The yellow box provides additional flexiblity in the output of our Pivoting Node. The selection you make for Column name and Aggregation name will directly affect the output of the Pivoting node.
Below is some detail on the 3 tabs that are boxed in blue:
Now that we’ve got a better understanding of the configuration menu, let’s get to work and pivot our data table!
First up, we’re going to configure the Groups tab. We want to keep the department values as rows, so we’ll let knime know by adding the Department column into the Groups Columns window, like below:
Next, we’re going to configure the Pivots tab. All we’ve got to do here is move the column we want Pivoted into the Green Pivot Column(s) window like below.
Finally, we’ll configure the Manual Aggregation tab. We’ll have to pay extra attention here as this tab has a bit more functionality than the other two.
The values we want to display under each of the pivoted years are the Weekly Pay values. We know that each year and department has a unique Weekly Pay tied to it, this is important to keep in mind since it will help us figure out how we tell knime to display the weekly pay values under each pivoted column.
Let’s add the Weekly Pay value into the “Select” window of the Manual Aggregation tab. Your tab should match the below:
The first thing that stand out here is that there are a few options next to the newly added Weekly Pay column. These options include an Aggregation option, a Missing values option, and a Parameter option. For our intents and purposes, we only need to focus on the aggregation option.
Clicking on the Aggregation option will show you a list with all the different options available. Some options are more relevant when a particular column and row intersection have more than one value available. For example, if the HR department had more than one row/Weekly Pay value for the Year 2014, then using a sum or a mean or a first aggregation option will have a different outcome than it would if there was only one Weekly Pay under HR and 2014.
We know that our table only has one Weekly Pay value for each Department and Year combination. Given that, we can set the Aggregation option for Weekly pay to Mean or to First or to Sum and the value will in the pivot table will be the same regardless of which option we choose. I typically run with the first option, just as a habit I’ve gotten used to over time, but it doesn’t matter which you run with in this example.
The last step, which is more a luxury than a necessity, it to set the yellow boxes to keep the original column names. I think its cleaner to keep the original column names than it is to add the aggregation method into the column names. If you run multiple aggregations of the same column (for example, showing the mean and median of the value), then you can’t keep the original column name as you need to be able to distinguish one aggregation from the other.
This is what our menu should look like now:
Our node is fully configured at this point. Go ahead and click apply, ok, and then execute the node.
Now, let’s right click on the node and select the “Pivot Table” option to see the result of our work. Our Pivot table should look like this:
We’ve successfully pivoted our data table! That’s all there is to it. The most important thing to keep in mind, when using the Pivoting node, is the structure of our data. We need to know whether each intersection/combination of data has one value or multiple values available – this fact directly affects that Aggregation option we run with in the Manual Aggregation tab.
Like always, if you need anything clarified or have any additional questions, shoot me a DM on Twitter @Cest_Nick ; Don’t forget to share this post with your analyst friends.
Catch you guys on the next post!