Splitting a Data Table Based on a Condition/Criteria in Knime

September 13, 2022

Splitting Data Tables in Knime based on a single Condition/Criteria

Today we’re going to walk through how to split a data table based on meeting a condition within a column. We’ll be working with same data from our post on filtering data, which you can find at this link.

Our two goals for today are to split the data table based on the below:

  1. Split the Data table so that Trucks are in their own data table, apart from all other vehicle types
  2. Split the Data table so all vehicles model year 2000 and newer are in their own table

To get started we’re going to search in the Node Repository for the Row Splitter node.

Search for the Row Splitter node in the Knime Node Repository

Once you’ve got the node onto the workspace and connected to the data table, the next step is to double click into the node so that we can begin the configuration. Below is a screen shot of the configuration menu you’ll see once you double click into the Row Splitter node.

Configuring the Row Splitter Node

What you’ll see here are a few different options offered by the node. In the red box at the top is where we select the column where we want to match the criteria that will split our table into two outputs. The box in yellow, to the left has the filtering options where we tell Knime whether we want the matching rows output in the top output of the node or if we want the output in the bottom output of the node, see my artistic graphic below. The Boxes in purple are where we select the type of rule we want to split the tables on – be it string matching, using a numerical range, or specifically targeting matching missing values. I’ll explain the purple boxes in a bit more detail below.

Top vs Bottom Output of the Row Splitter Knode

The 3 purple boxes are where we select the type of rule we want to split the table on, below is a bit more detail:

  • Top Box – Pattern Matching

Pattern matching can be used for splitting data tables based on string/text matching. If you want to split a table based on a particular text condition, then this is the section you will use to delineate that. You simply fill in the String/Text you want to match/split on in the blank input box thats directly beneath the “use pattern matching” bullet. You’ll see this at work when we accomplish goal 1 a little later in this post.

  • Middle Box – Range Checking

This section is where we delineate numerical values that we want to split the table on. The flexibility is already built in to allow for splitting for Between values, in other words splitting by a range, because the menu offers and input for a lower bound as well as an upper bound. If we don’t want to split by a range, all we need to do if provide a value in the lower bound input. This would split by below as true values and above as false values. Likewise, if we only provide an upper value to split on, then the values above or equal to are output as true while below are output as false. We’ll use this section to help us achieve our second goal later in this post.

  • Bottom Box – Missing Values Matching

This section is a true/false based on whether the values in the specified column are either missing or not missing.

Our first goal in this exercise is to Split the data table so that all Trucks are in their own data table and all other vehicle types are in a separate data table. To achieve this we will use the top purple box in the configuration menu since we’re trying to split based on a pattern, in this case “Truck”. The steps are as follows:

  1. Set the Column to test: field to the column “Type”, as that is the column we want to search for the condition we’re splitting the table on
  2. Type “Truck” into the blank box under the “use pattern matching” bullet
  3. Select whether you want the matching data (aka the Truck data) to output in the top output or bottom output of the node

We’ll send the truck data to the top output by selecting the “Include rows by attribute value” bullet, as pictured below.

Now we can click “apply” & “OK” and then execute the node. Our results in the top output of the node should now be (as pictured below) the data table limited to only the Trucks! Likewise, the bottom output of the node should be all the other data that didn’t match Type as “Truck”.

Now that we’ve achieved goal #1, we can move on to goal #2 and split the data table so all vehicles model year 2000 and newer are in their own table and then all vehicles older than 200 are in their own table as well.

To separate by model year, we will use the middle purple box of the configuration menu. The steps are as follows:

  1. Select the “Use range checking bullet”
  2. Set the Column to test field to the Year column
  3. Type 2000 into the “lower bound” input box
  4. Select whether you want the matching data to output in the top output or bottom output of the node

Your configuration menu should now match the following picture.

Now we can click apply & OK and execute the node. The results in the top output of the node should now show only data where the vehicle Year is 2000 or greater! Likewise, the data in the bottom output only shows vehicles with a model Year older than 2000.

That’s all there is to splitting data tables in Knime, at least the most basic form of splitting. If you’re looking for more functionality in splitting data tables aka splitting on multiple rules/conditions/criteria, then stay tuned for the next post where we’ll go into just that! I’ll link that post here once it’s ready.

Let me know if you have any questions or need anything clarified. You can shoot me a DM on Twitter @cest_nick ; otherwise, I’ll catch you on the next one!

-Nick

Book a call to explore the different avenues the Analytic Iron team can help You extract insights from Your Data.
BOOK a Discovery cALL