In today’s post we’re going to go over how to filter out and remove columns from your data tables. Filtering columns is a fundamental skill for business analysts, data analysts, data scientists and everyone in between. Whether you need to drop a column that’s no longer needed or one that never really held any useful detail, today we’ll learn how to filter columns out of your data set using the KNIME Analytics Platform.
Let’s get started!
If you’ve followed our video or blog post on how to unpivot data, then you might have noticed that the row IDs were unpivoted as well. We don’t really have any valuable data in that column, so we’ll use that table and filter out the redundant RowIDs column.
To filter out columns from data tables in KNIME, we’ll use the Column Filter node. If you type Column Filter into the node repository search box, you should see the node pop up. Drag and drop the node onto your workflow and connect it to the data table that you want to filter columns from.
Once you’ve got the node connected, you can double-click it to open the configuration. The configuration screen for the Column Filter node should look like the one below.
The menu is relatively straightforward for the goal we’re looking to achieve today: filtering out the RowIDs column. The columns in the green section are the ones that will be included in the output data table, while the columns in the red section are the ones that will be filtered out of (removed from) the output data table.
For our goal we simply need to double-click on the RowIDs column so that it moves from the green section to the red section. Once that’s done, we can click Apply, then OK, and execute the node. The final output table should look like this:
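If it helps to see the idea outside of the KNIME interface, here is a minimal sketch in plain Python of what the manual include/exclude selection does conceptually. This is not KNIME code, just an analogy. The sample rows and the `filter_columns` helper are made up for illustration; only the RowIDs column name comes from our table.

```python
# Hypothetical sample rows standing in for the unpivoted table.
# Only the "RowIDs" column name comes from the post; the other
# column names and all values are assumptions for illustration.
rows = [
    {"RowIDs": "Row0", "ColumnNames": "Q1", "ColumnValues": 10},
    {"RowIDs": "Row1", "ColumnNames": "Q2", "ColumnValues": 12},
]


def filter_columns(rows, exclude):
    """Return new rows without the columns listed in `exclude`.

    Columns in `exclude` play the role of the red section;
    everything else stays in the green section.
    """
    excluded = set(exclude)
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


filtered = filter_columns(rows, exclude=["RowIDs"])
print(filtered)
```

The original rows are left untouched and a new, narrower table comes out the other end, which is the same way the node treats its input and output tables.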
That’s all there is to it! We’ve successfully filtered out a column from our data table.
While we used KNIME’s Column Filter node in a rather straightforward situation, the node does offer additional flexibility for trickier filtering. For those less-than-simple situations, you might have noticed the three bullet selections above the red and green windows. I’ve highlighted the bullets in yellow in the screenshot below.
These bullets let us filter columns through manual selection, through wildcard/regex matching, and through column type criteria. Manual selection is the straightforward one we just used. The other two are much more useful for complex column filtering. A couple of examples:
When you run a loop on your data that appends new columns to your data set, you could use the wildcard/regex option to filter out the unwanted iteration columns.
If you only want to keep numeric columns because you want to run a PCA, the type selection option would work best.
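To make those two scenarios a bit more concrete, here is a rough sketch of the same logic in plain Python. Again, this is only an analogy and not how the node is implemented. The table, the "(Iter #n)" column names, and both helper functions are assumptions made up for the example.

```python
import re
from numbers import Number

# Hypothetical table stored as column name -> list of values.
# The "(Iter #n)" suffix is an assumed naming pattern for columns
# appended by a loop; adjust the regex to your own column names.
table = {
    "customer": ["a", "b"],
    "revenue": [10.5, 20.0],
    "units": [3, 4],
    "score (Iter #1)": [0.1, 0.2],
    "score (Iter #2)": [0.3, 0.4],
}


def exclude_by_regex(table, pattern):
    """Drop every column whose name matches the regex `pattern`."""
    rx = re.compile(pattern)
    return {name: vals for name, vals in table.items() if not rx.search(name)}


def keep_numeric(table):
    """Keep only columns whose values are all numbers (bools excluded)."""
    def is_numeric(value):
        return isinstance(value, Number) and not isinstance(value, bool)

    return {
        name: vals
        for name, vals in table.items()
        if all(is_numeric(v) for v in vals)
    }


# Scenario 1: remove the iteration columns with a regex on the name.
no_iterations = exclude_by_regex(table, r"\(Iter #\d+\)$")
# Scenario 2: keep only numeric columns, e.g. before running a PCA.
numeric_only = keep_numeric(no_iterations)

print(list(no_iterations))
print(list(numeric_only))
```

The first filter looks only at column names, while the second looks only at the values' types, which mirrors the split between the wildcard/regex option and the type selection option.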
Stay tuned for more detail on these two other filtering options and how to use them in your work. I’m working on another post for that, and I’ll link it here once it’s live!
I hope this post helped you learn how to filter columns from your data in KNIME. As always, if you have any questions or need anything clarified, don’t hesitate to reach me via DM on Twitter (@cest_nick). Don’t forget to share this post with any friends or colleagues who might find it helpful!
-Nick