How to do an Excel Vlookup in Knime

June 05, 2021

Vlookup in Knime

The Vlookup is one of the best and most fundamental formulas available in Excel, and for good reason too. The ability to join (combine) related tables is crucial when trying to bring together data from multiple source, in order to synchronize and deliver a succinct analysis. Worry no more, today I will show you how to perform a vlookup from Excel in Knime.

Enter the Join

If you’ve ever written a SQL query before, then you already know what a vlookup is. If you haven’t, then today you will learn!

The Vlookup in Excel is made up of 4 arguments:
  1. The Lookup Value
  2. The Lookup Range
  3. The index of the column you want bring over
  4. Exact or approximate match
To perform a Vlookup in Knime you will have to use a joiner knode.

The joiner node only needs 2 “arguments”, the lookup column(s) and the columns you want to bring over.

For our example, lets use the two tables pictured below:

vlookup in knime

Our goal will be to vlookup the Department Bonus Rate detail from Table 1 to Table 2.

We’ll vlookup (join) the bonus rate from the above table to this new table

To perform a vlookup we’ll need to select the Joiner node. In your node repository, search for “joiner” and then drag and drop the node into your workspace.. Connect both tables that you want to vlookup (aka join) to the Joiner node. The table that you connect to the top input will be considered the “left” side table, while the table you connect to the bottom input will be considered the “right” side table. This is important to recognize because the Joiner node will give you the flexibility to run a left join or a right join as well as an inner join or a full outer join. I’ll add more color on the join types a little later, but just keep that in the back of your head for now.

Now that you’ve got the two tables connected to the Joiner node, double click into the Joiner node to start the configuration. In the configuration you will see the following tabs.

The most important tabs here are the “Joiner Settings” and “Column Selection” tabs. The Joiner Settings tab is where we will set up the “lookup value” argument while the column selection tab is where we’ll set up the “lookup column index” argument.

In the Joiner Settings tab we’ll need to outline the column on which we want to join the two tables. You can think of this step as the providing the Lookup value argument and the beginning of the lookup range. Put differently, the lookup value and the first column of the lookup range are the values that we need to provide as the joining columns.

Note the Join Mode & Add Row Functionality of the Knime Joiner Node

In the screen shot above, in the red box there is a button that reads “Add row”. We’ll click this to give us a row where we can outline the joining columns in each of the two tables. The below image shows the configuration menu after we click Add Row. We’ll click into the dropdowns that are labeled “Row ID” and then we’ll set these both to the joining columns. The dropdown on the Left side corresponds to the left table aka the Top Input of the Joiner Node while the dropdown on the right side corresponds to the Right table aka the Bottom Input of the Joiner Node.

In our example, the two tables (both left and right side) share the Department column, so we will select “Department” under both. Now that we’ve got the joining columns delineated, we need to select the type of join or vlookup we want to run. The options under Join Mode are as follows:

  1. Inner Join: The final output here is a new table where both the left & right side tables share similar matching rows
  2. Left Join: Here you’re joining details from the right side table to the LEFT side table – think of the LEFT side table as your primary table
  3. Right Join: Here you’re joining details from the left side table to the RIGHT side table – think of the RIGHT side table as your primary table
  4. Full Outer Join: Here you’re joining both tables against each other, regardless of their being matching lookup columns or not; therefore, some columns/rows will show null/missing values

What’s glaringly obvious here is the flexibility available within the joiner node that isn’t really available in an excel vlookup. We wont get off track fully outlining that detail here though, since the goal of this post is to be a quick DIY of sorts. I will write up a separate post showing the pros of the join over a vlookup, I’ll link that post here once it’s ready, so be on the lookout.

Our original goal was to join/lookup the Departmental Bonus rate from table 1 over to table 2. Given that goal, we’re going to run with a right join. The thought here is that the table on the table on the right side is our primary table of analysis, so we’re joining details [from the Left Table] to the Right Table. Let me know if you don’t follow that!

Now that we’ve got the joining columns sorted, we can move on to selecting the columns we want to bring over aka argument 3 of a vlookup. In the screenshot below is the column selection tab. The configuration here is pretty straightforward, we’ll select which columns from which table we want to include & exclude from our output table.

Knime’s Joiner Node Allow Flexibility in column selection from both left and right tables

The tables we’re joining are small and limited in column count, so we don’t really need to limit which columns we want bring over from the left or the right table. There may be cases though, where you’re joining detail from two tables which each hold multiple columns of different detail. You might not need every single detail from both tables, so in those cases you have the ability to limit which columns come over from each table. To include and/or exclude columns that we want in the joiner output, simply uncheck the box under each Include screen that reads “Always include all columns”. Unchecking this will allow us to free move columns from the Include side to the Exclude side and vice-versa. Again, this is another case of added flexibility that a join offers which an excel vlookup doesn’t offer!

The “Always include all columns” checked box is a setting that’s checked in by default. Another default setting you’ve got to be aware of here is that the joining columns from the lower table are dropped in favor of the columns from the left table. This isn’t really a huge issue in most cases, but it’s something to keep in mind when you get more involved with varying tables. You can simply override this default setting by unchecking the box pictured below. If you uncheck this box and also leave the one above it unchecked, then you will get duplicates of the joining columns. To the left of the red box in the screenshot below are a few options on handling the naming of the duplicate columns. I like to use the “Append custom suffix” option in cases where I’m looking to calculate Product Mix or Market Share values. You can read more on that in this link.

After we’ve selected the columns we want in our final output, we can hit apply, ok, and execute the node. The results look like this:

This table has the output of performing an excel vlookup in knime!

Table 2 now has the Departmental Bonus Rate detail from Table 1. Since we didn’t exclude any columns, we also brought over the Employee Count column from the left table over to the right table – you’ve got to love that flexibility!

I hope you found this write-up useful! Let me know if you have any questions or anything seems unclear. You can reach me through twitter and DM me, my handle/username is @cest_nick

Cheers!

Book a call to explore the different avenues the Analytic Iron team can help You extract insights from Your Data.
BOOK a Discovery cALL