How to Filter Data in the Knime Analytics Platform

January 07, 2021

How to Filter Data in Knime

In today’s blog post we will walk through how to filter a data table in knime. This post is specific to the rule-based row filter node, which give us flexibility to fitler on multiple criteria. Let’s get started.

Today we’ll be working with the below pictured data table. The data table we’re working with is below, it has records of varying Auto Make & Models from different years. We have two goals today, the first is to filter the table for only SUVs and the second is to filter for only trucks from 1997.

To filter data we will need to use the Rule-based Row Filter node. Search in the Node Repository for the node, then drag and drop it onto your workspace and connect it to the data table node.

Now that we’ve got the node connected, double click so we can start to configure the rules we want to filter by. Below is what the configuration menu for the Rule-based Row Filter in Knime looks like.

The most important section here is the Expressions window because this is where we will want our formula/code that lets knime know how we want our data table filtered. I boxed the Expressions window in blue in the below image.

The section in the red box holds the columns in our data table as well as the different functions that we can use to determine how we want to filter our table. These end up in the Expressions window and let knime know exactly how we want the data filtered, that’s why the Expression window is the most important. The section in the purple box gives us the flexibility to choose whether we want matching/TRUE values to be included or excluded from the filtered table.

To accomplish our fist goal (filter the table for only SUVs ) we will want to click into the Expression window, so that we see the cursor flashing in there. Next we will double click on the “Type” column name. This double click should automatically place the Type column name in the Expression window. This functionality saves us time from typing in the actual column name and it also helps ensure that the column names we reference in the Expression window are spelled correctly. Next we will double click on the “? =?”, this will drop an = sign into the Expression window. Next, we will type “SUV” right after the = sign. The Type column is a String, so we add double quotes around the string we want to match on. Our formula now reads as the screen shot below.

The last piece is to add the following syntax after our rule: ” => TRUE”

What this does is that it assigns any row that matches our rule with a TRUE outcome. Since we have the purple box set on the default aka to “Include TRUE matches”, this means any row that matches this rule will have a TRUE outcome and only TRUE outcomes will be included in the output table. Likewise, if we selected the option that reads “Exclude TRUE matches”, then our output table would only include vehicle Types that aren’t SUVs.

Our final formula/syntax looks like the below.

This is the final rule / formula / syntax needed to filter our data tabe in Knime

Now all we need to do is click Apply, Ok, and Execute the node. You can execute the node by right-clicking and selecting execute or by hitting the F7 button when you have the node that you want to execute selected. Now that we’ve executed the node, we can take a look at the filtered table by right clicking on the node and selecting the option that reads “Filtered”.

Selecting the Filtered Data table in Knime

Below is what our successfully filtered data table looks like:

Our second goal is to filter the data table for trucks from the year 1997, we’re essentially filtering on multiple conditions. We’ll double click into the Rule-based Row Filter once again. We’ll change the current formula/rule we have in there from Type = “SUV” to Type = “Truck”. Now, we’re going to add the second condition where we tell Knime that we want to filter for year = 1997. Right in between Type = “SUV” and the => symbol we will add the following:

Filtering Data in Knime based on Multiple Conditions

Notice that for the year we don’t need quotes around the 1997 – this is because the values under the year column are intergers. Numbers/columns that are formatted as a Double don’t need quotes either.

Now that we’ve got our rule/formula set, we can click apply, ok, and execute the node. Below is what our successfully-filtered-on-two-conditions table looks like:

You can see we’re only showing the rows where the vehicle Type matches Truck and the Year is 1997.

As an added bonus, I’m going to show you a few more filtering rules and their corresponding outputs. Hopefully, these give you additional ideas to the flexibility with which you can filter data tables in Knime. The only type of filtering that this Rule-based Row Filter node won’t allow us to do is filtering based on/by dates. There is a node dedicated specifically to filtering by/on dates. You can check out the how-to for that node in the following LINK.

If you have any further questions on the Rule-based Row Filter node, if you need anything else clarified or just want to drop me a line, you can do so by DMing or tweeting me. My handle/username is @Cest_Nick ; otherwise, enjoy the rest of the examples below!

More Examples of Filtering Data Tables in Knime

Filtering Data Tables with AND/OR Conditionals

Below we combine and AND argument along with an OR argument. We want all rows matching Type = Truck AND Year 1997 OR all rows matching Make = Ford

Rule Based Row Filtering in Knime with AND + OR conditions

Below is the same table, but notice how we structured the rules/formulas a bit differently. Where above we had the rule combined all in one line, in the below we have each rule separated out as it’s own line. This is a bit easier to manage when you have multiple rules and want to easy include/exclude rules and make changes on the fly. A quick look at the Row IDs makes it easy to see that the outcomes are the same despite the rules being setup a bit differently!

Rule Based Row Filtering in Knime with multiple rules

Filtering Data Tables based on Matching Conditions in a List or Array

Below we filter the data table for only rows where the Make matches Ford or Dodge. You can scan the make column (boxed in blue) and you’ll see that all rows match either Ford or Dodge! The items we want to match on need to be wrapped in double quotes (since they’re strings), must be encapsulated in parenthesis, and need to be separated by commas.

Filtering Data based on lists in Knime

The below table has the same exact rule as the above table, but the difference is that in the below we configured the output to Exclude TRUE matches, as you can see in the red box. While our rule remained the same, we easily switched the settings so we get the inverse out of what we originally wanted – have a mentioned how flexible Knime can be? All vehicle Makes in the output now are anything but Ford or Dodge.

I hope you enjoyed this write-up!

Cheers,

Nick

Book a call to explore the different avenues the Analytic Iron team can help You extract insights from Your Data.
BOOK a Discovery cALL