Today post is a follow-up on the basic table splitting tutorial that’s available here. In that post we split our data table using one condition. In today’s follow-up we’re going to split the table using multiple conditions. Let’s get started!
Our two goals for today are to split the data table based on the below:
Notice how our goals are only slightly different to those in the the last post, we’ve only added in an additional condition/criteria to each goal. Since the Row Splitter node only allows us to split data based on one condition, we’re going to have to switch over to a more flexible node – the Rule-based Row Splitter node.
Type “Row Splitter” in the Node Repository and select the Rule-based Row Splitter node and drag and drop it into your workflow.
Once we’ve got the Rule-based Row Splitter connected, we can being to configure. This is what the configuration menu looks like.
The big difference here is that we’ll have to type in some syntax to get the node to split our data table properly. Don’t worry though, the syntax is rather straight-forward to follow.
The basic structure of the syntax is:
After Knime works each row through the syntax, it labels the row as either TRUE or as FALSE and then splits the table accordingly. The bullets beneath the expression window give you the ability to choose whether TRUE matches output in the top or the bottom output – which the menu describes as the first and second output tables.
The rest of the menu is described as follows:
The beauty of these windows and available lists is that we’re allowed to double click on them and add them into the syntax (aka Expression) that become our rules. This saves us typing time and helps us keep moving with pace when we’re in the zone! Give it a try, double click on any of the values in the 3 windows, and notice how they fill into the Expression window.
Lets get started putting together the expression that will help us split the table according to the goals we set at the beginning of the post. Goal 1 is to split Truck vehicle types that are model Ford into their own tables. Our expression should look like the below.
I added an underline to break down the different pieces of this expression. The two conditions that we want to split the table on are underlined in blue and in red. Notice all we’re saying here is that we want column $Make$ to equal Ford as well as column $Type$ to equal Truck. The $ signs tell knime that we’re referring to a column name while the double quotes around the Ford and Truck let knime know that we’re looking to match a String value. Underlined in green is the AND operand. This tells knime that we are looking for a match of criteria for both rules. Finally, we use “=> TRUE” to let knime know that when both rules are met, set the row as TRUE. The TRUE is then either fed to the top or bottom table, depending on which of the bubbles we have selected right below the Expressions window.
Now that we’ve got our expression setup, we can click on apply and OK and then execute the node. The data in the top output of our node should look like this:
We can audit the original data table, and we’ll see that Ford only has two rows that are type Truck, so our data split correctly and we’ve accomplished goal #1. The data in the bottom output of the node should hold all the data that didn’t match Type = Truck and Model = Ford.
Goal #2 is to split the data so that all SUV’s model year 2000 and newer are in their own table. Given that goal, our expression should look as follows.
The two conditions we want to match here are underlined in blue and red. The AND operand lets knime know we want to match both conditions and the => TRUE sets rows that match both conditions to TRUE. What’s slightly different in this expression from the previous is that we’re working with a column that has integer values (not a strings) within the column. When we delineate a numbers (be it an interger or a double) we do not need to add double quotes around the numbers. Double quotes are only needed for columns of String type.
The last difference here is that we also switched up the output port we want the TRUE matches to show up at (purple box). Previously we had TRUE matches to the top output of the node aka the first output table. This time around, we’re going to set the TRUE matches to go to the bottom output of the node aka the second output table. This is simply to highlight the flexibility of the node, its up to your preference which output the TRUE matches (aka the rows meeting all our criteria) end up at.
Once we hit apply, OK, and execute the node, we can take a look at the results. The data in the bottom output of the node should look like this:
This table only has SUVs that are model Year 2000 and newer. We’ve successfully split the table! The top output should have all the the other data that didn’t match the criteria we set up.
That’s the end of this post. I do have another posting with bonus material where I show how to use some of the other functions available in the Function window. I’ll link that post here once it’s done.
As always don’t forget to share this content with any of your other analyst friends. If you need anything clarified or need additional help, don’t hesitate to shoot me a DM on Twitter @Cest_Nick ; otherwise, I’ll catch you on the next post!