Good evening data friends. My name is Nick Rivera and I’m a Business Analyst by trade. I’ve been blessed enough to work on projects that range the entire spectrum of what a “business analyst” could do:
• Data wrangling
• Data cleaning
I’m lucky to have worked (and to be working) with leaders that recognize the value in data and what data can bring to business when studied, reflected upon, and used as an aid in business decision-making. These leaders have also recognized that the proper data tools can increase the productivity and value-creation of data folks. I have been fortunate enough to get access to a variety of data tools that aid my daily work:
• Microsoft SQL Server / Toad for DB2
Of all the different tools I have used, none have impacted my work & productivity like Knime have. That’s the catalyst to this story. I’m going to tell you more about Knime and why I love to use it in my daily work. By the end of this story, if you’re a business analyst, You will be a Knime convert. And if you’re not, then I will be utterly disappointed in myself. If you’re one of my friends that more business-centric than you are techinically inclined, my hope is that my use-case / example will get you thinking about how knime can be used in situations that are more dear to your business.
It’s not a groundbreaking statement to say that Excel is the standard when it comes to business and number crunching. It’s easy enough to learn the basics in a day and it’s just as flexible as it is easy to use. You can tell excel to do this to that cell and then do that to the cell over there and then add the two together and give me an outcome. You can even whip up some lines of VBA to get a more complex order of operations going. This post is not intended to be a knock-on Excel, it certainly still holds a place in my tech stack. It’s still the front-side of many of the models I’ve been involved in building. Aside from the ease of use and the flexibility that Excel offers, I believe the reason Excel is the gold-standard business calculator is because of it’s widespread adoption (which is a product of the aforementioned). Excel is used by the most entry-level analyst to the top of the food chain, the key decision makers in organizations. Like many fresh grads, my story started there – using excel to create reporting and crunch numbers.
Aside from critical thinking & reasoning, I believe SQL is the most fundamental skill a business analyst can have. Luckily, my first data-boss seemed to believe the same as he was the one who put me to the task to learn some basic SQL. I didn’t think much of it at first, “I just need to learn it because it helps me grab the data I need for my reports.” It was that period of learning SQL that really taught me how data sits in relational tables.
SQL syntax is almost the opposite of Excel, it has strict rules that must be followed for your data to shake out how ever you’d like. You can’t just tell SQL go to that cell and then this cell, everything is a table and you must follow the rules when manipulating the tables; otherwise, your syntax will error out and your analysis stalls at step 1. Following these rules forces you to understand tables and their relationships. Once you’ve got the understanding, there is nothing that can hold you back from bringing together any set of tables you need to perform an analysis.
This understanding of tables, how to grab the most detailed level of data and work it any which way up or sideways, helped me piece together a more complete background in data. At this point, with Excel and SQL (and access to the tables I wanted) I was cooking with grease.
I had the ability to use data in more fashions than simply reporting – in other words, I was no longer limited to creating reports that reflected on past performance. With the proper access I could pull any data I needed for an analysis and I could help my boss by running deep into different data rabbit-holes. “What did we find here?” If anything, “how can we use it go-forward to aid in business decision-making.” These were the type of projects my skills had me working on.
As I worked through more and more of these projects I learned some of the frustrations of using Excel for purposes outside its core scope. My biggest frustration – working with large data sets seemed to be the Achilles heel of Excel. First, Excel would take forever to run a calculation and soon after it would just shut-down. The bigger and more detailed the data set you’re working with, the less complex calculations you can run with Excel before it eventually dies out and hits you with a “Recover latest version” message. This issue was holding me back, like shifting into first with the e-brake up.
Being how I am, I would try push through. Sometimes that meant turning off auto-calculations, using more optimized formulas (index match vs vlookup), or limiting my data sets a bit. “Boss, maybe we reduce the scope of this from the last 9 weeks to the last 5?” That was not my proudest of moments, but I must own up to it!
Instead of forcing the issue, I should have reached out to other folks who possibly ran into similar problems. I should have researched the issue from a different angle – which I think I might have, but I didn’t find anything on it. I think I’ve got a decent idea on why I didn’t find an answer through my google search, but I’ll touch a bit more on that later.
I will never forget the first day I came across the Knime Analytics Platform. My boss asked that I get on a call with a co-worker from a different division within the company – we were talking through different forecasting techniques we could use in an upcoming project. My co-worker hit share screen and there it was, the first workflow I had ever seen in the Knime Analytics platform.
In all honesty, when I first saw the platform and the workflow that my co-worker walked me through, I really didn’t think much of it. “Is this just an ETL style tool?” was my only question about Knime. He said “yes, It helps with data manipulation and running different algorithms all in one go.”
My biggest take-away was on the forecasting method and caveats we talked about. I really didn’t give any more thought to Knime.
Fast forward a week later and I’m working with large data sets and once again Excel is crashing and leaving me ready to rage! During this time I was working from home, we’ll actually, I was at my gf’s house this particular day. My gf noticed my frustration and asked what was wrong. “This piece of S**t program can’t handle a lick of data! How am I going to deliver something when I can’t even run a few basics calculations across this data set?!” That’s when it hit me, I can’t try to suck the ocean through a straw. That makes no sense, I must find something else to run this work with. I was sitting there thinking, “what if I try out that Knime program? Maybe it can help with this type of work.” If it can handle forecasting, then maybe there’s a connection there?
I downloaded Knime and I was off to the races. I’m pretty sure I was yelling, “LFG!! This Knime program ran my calculations as if it was asking “that’s it? This is light work.”” I was beyond amazed. My gf asked why I was suddenly, so star gazed. “It’s this new program that I heard of about a week ago,” I answered.
Yes, this is how I explained Knime to my boss and how I continue to explain Knime in a one-word sentence whenever folks do/don’t ask me about it. Go ahead, don’t ask me about it and I will still tell you about it. I am an addict. But more seriously, that’s how highly I think of The Knime Analytics Platform.
Why Are You a Knime Addict, Nick?
Great question my friend, I’ll try to answer this as succinctly as possible.
• Processing Capability
• Data Flexibility
• Ease of Use
• Use Case Flexibility
• It’s Free to use! (for the non-server version, which is plenty powerful)
I wasn’t kidding when I said that Knime is like Excel but on Steroids. I dare you to try and run a simple vlookup, sumif, and if statement on 5 million rows of data in Excel. Excel will crash. Knime will eat those simple calculations on 5 million rows for breakfast. You can’t even have more than 1,048,576 rows of data in one Excel sheet. Knime doesn’t give a dang about row limit. There is no comparison.
• Excel files
• CSV Files
• Knime Tables
• Direct Database Connection
• your joins aren’t limited to pulling in one column at a time, like a vlookup is
• Your joins aren’t contingent on the lookup column being the left-most column of the lookup table
• You can left, inner, outer, and right join
• Data flows from left to right, so you’re not chasing formulas from one column to another or from one page to another
• You can group similar processes into components
• Nodes and Components can be subtitled for record keeping
• Annotations allow for further note taking
• Write to a Database
• Hyper files for Tableau
If you have a solid understanding of data structures (aka SQL!!), then learning Knime comes down to finding the Node that will perform the calculation or manipulation that you’re looking to achieve:
For Example, if you want to do:
• Vlookup => Joiner Node
• Sumifs => Groupby Node
• Filter Data => Rule-based Row Filter Node
• Conditional Math => Column Expressions Node
• Left / Right Functions => String Manipulation node
Once you’ve got the correct node you simply drag and drop it into your workflow and configure the node settings to your liking. That’s all there is to it.
The non-server version is completely free to download and use. You can’t beat this. You can run the most complex work and they don’t charge you a dime. You only pay if you want the premium services like the ability to completely automate a workflow or the ability to seamlessly collaborate with multiple team members on one workflow. The paid version is more-so luxury and not a necessity when working with Knime.
Ok, ok I hear you. I’m going to walk you through an example workflow that will highlight the power of this powerful platform.
A scenario you might come across if you’re an e-commerce retailer that sells through Amazon is that you’ll need to provide tracking numbers for your recent sales. Basically, you have two data sets – one from Amazon and one internal – that need to be “reconciled” and then sent back to Amazon. This seems like simple task on it’s face, but when you’re selling hundreds of products are day, then the task becomes a bit more time intensive.
A quick solve could be to delegate the task to one of your customer service agents. Although, there is the downside that you’d be tying up between an hour and two per day that your customer service agent could be using to serve customers. Below is an example of the two files we’re discussing. It’s easy to see how larger data sets become much more labor/time intensive
The above workflow does the work we’d delegate to our customer service agent. It takes both the Amazon open orders file as well as our internal company’s shipped order file and fills in the tracking number from one file to the other wherever it’s available. The workflow then outputs the filled-out Amazon sheet – dated with the relevant time stamp – so that our customer service agent can upload the tracking numbers to Amazon and keep our customers updated on their shipping. The only thing our customer service agent would have to do is place the two files in the correct folder location, click “reset” and “execute” on the workflow, and then upload the filled-out file that Knime outputs. It’s that simple.
Not only does this workflow save an hour to two a day, but this workflow also helps control for human error. Human error can appear when a tracking number is incorrectly input from one file to the other, when a tracking number is matched to the incorrect order number, or even when an order is missed and isn’t updated with a tracking number in a timely manner.
Knime helps reduce repetitive work to a simple reset & refresh of the workflow!
There’s much more, in-depth work Knime can help you accomplish, such as:
• Semi-Automating / Fully Automating Reports
• Analyzing Relationships in Data
• Clustering & Segmenting Customers/Suppliers
• Aggregating Excel Data Files
• Modeling Business Operations
I’ll be writing about these in the future and will link the posts once I’ve got them published on my site. Stay tuned for these!
Hopefully this verbose (I’d say elegant) post has convinced you to look further into Knime as a tool to add to your arsenal (London is Blue). At the very least, this post has you asking yourself, is there an application where this can bring value to my business? If that’s where you’re at, then don’t worry – I can help you find the areas of opportunity in your business where The Knime Analytics Platform can slot in as the perfect solution.
That’s the end of this post my data friends. I hope I’ve enlightened at least one of you today. Don’t forget, you can always reach out if you need to, just shoot me a DM through Twitter @Cest_Nick
Catch you on the next post!