Today we’re going to walk through how to split a data table based on meeting a condition within a column. We’ll be working with same data from our post on filtering data, which you can find at this link.
Our two goals for today are to split the data table based on the below:
To get started we’re going to search in the Node Repository for the Row Splitter node.
Once you’ve got the node onto the workspace and connected to the data table, the next step is to double click into the node so that we can begin the configuration. Below is a screen shot of the configuration menu you’ll see once you double click into the Row Splitter node.
What you’ll see here are a few different options offered by the node. In the red box at the top is where we select the column where we want to match the criteria that will split our table into two outputs. The box in yellow, to the left has the filtering options where we tell Knime whether we want the matching rows output in the top output of the node or if we want the output in the bottom output of the node, see my artistic graphic below. The Boxes in purple are where we select the type of rule we want to split the tables on – be it string matching, using a numerical range, or specifically targeting matching missing values. I’ll explain the purple boxes in a bit more detail below.
The 3 purple boxes are where we select the type of rule we want to split the table on, below is a bit more detail:
Pattern matching can be used for splitting data tables based on string/text matching. If you want to split a table based on a particular text condition, then this is the section you will use to delineate that. You simply fill in the String/Text you want to match/split on in the blank input box thats directly beneath the “use pattern matching” bullet. You’ll see this at work when we accomplish goal 1 a little later in this post.
This section is where we delineate numerical values that we want to split the table on. The flexibility is already built in to allow for splitting for Between values, in other words splitting by a range, because the menu offers and input for a lower bound as well as an upper bound. If we don’t want to split by a range, all we need to do if provide a value in the lower bound input. This would split by below as true values and above as false values. Likewise, if we only provide an upper value to split on, then the values above or equal to are output as true while below are output as false. We’ll use this section to help us achieve our second goal later in this post.
This section is a true/false based on whether the values in the specified column are either missing or not missing.
Our first goal in this exercise is to Split the data table so that all Trucks are in their own data table and all other vehicle types are in a separate data table. To achieve this we will use the top purple box in the configuration menu since we’re trying to split based on a pattern, in this case “Truck”. The steps are as follows:
We’ll send the truck data to the top output by selecting the “Include rows by attribute value” bullet, as pictured below.
Now we can click “apply” & “OK” and then execute the node. Our results in the top output of the node should now be (as pictured below) the data table limited to only the Trucks! Likewise, the bottom output of the node should be all the other data that didn’t match Type as “Truck”.
Now that we’ve achieved goal #1, we can move on to goal #2 and split the data table so all vehicles model year 2000 and newer are in their own table and then all vehicles older than 200 are in their own table as well.
To separate by model year, we will use the middle purple box of the configuration menu. The steps are as follows:
Your configuration menu should now match the following picture.
Now we can click apply & OK and execute the node. The results in the top output of the node should now show only data where the vehicle Year is 2000 or greater! Likewise, the data in the bottom output only shows vehicles with a model Year older than 2000.
That’s all there is to splitting data tables in Knime, at least the most basic form of splitting. If you’re looking for more functionality in splitting data tables aka splitting on multiple rules/conditions/criteria, then stay tuned for the next post where we’ll go into just that! I’ll link that post here once it’s ready.
Let me know if you have any questions or need anything clarified. You can shoot me a DM on Twitter @cest_nick ; otherwise, I’ll catch you on the next one!
-Nick