This is a pretty big nut I’m cracking today. There are entire books dedicated to the topic. So, as always, I’d like to start with some simple terminology that keeps us on the same page without requiring a Ph.D. in data science. If you disagree with the terminology, let’s talk.  I always appreciate discussions in the thread below. But, of course, you are falling into my secret marketing trap, so beware!
Now, on to the terminology:
- Baseline – Take a measure or Key Performance Indicator (KPI) and compare it against your own organization's past performance (an internal comparison).
- Benchmark – Take a measure or Key Performance Indicator and compare it against other organizations (an external comparison).
Ok, so what do I think in general about benchmarking and baselining? Well, baselining is always necessary for performance management, and benchmarking is sometimes helpful, but only when you know its limitations.
So, for you inductive thinkers out there, I have an example that I think is pretty helpful in the world of IT Management:
- Good Measure (and KPI, more on that later) for benchmarking in the world of IT management — % of Incidents Resolved by Tier 1
- Poor Measure (and KPI) for benchmarking — % of Incidents Meeting Service Level Agreement
Why is % of Incidents Meeting SLA a poor measure for benchmarking? And what makes certain measures poor for benchmarking in general?
I’d like to use this to introduce the top 3 issues associated with benchmarking:
- Variability
- Aggregation
- Data Availability
So, the answer to the previous question highlights issue #1 with benchmarking: variability across organizations. Service Level Agreements are generally negotiated based on the needs of the business users balanced against the cost of delivering the service. As a result, SLAs should be, and are, different in most organizations. While we've seen that most (not all) large organizations have priority-based SLAs, they have different response time targets, resolution time targets, business hour definitions, and so forth. Also, a smaller but growing percentage of Global 2000 organizations are moving toward Service Level Agreements that are truly based on business and technical services, thus improving visibility and communication up the chain to executive management.
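To make that variability concrete, here is a minimal sketch (the incident data and SLA targets are invented for illustration) showing how the same incident history produces very different "% of Incidents Meeting SLA" numbers under two organizations' SLA definitions:

```python
# Illustrative only: hypothetical resolution times (in business hours)
# for the same set of priority-2 incidents.
resolution_hours = [2, 5, 9, 14, 30, 47, 80]

# Two organizations with different, equally reasonable, P2 resolution targets.
sla_targets = {"Org A": 8, "Org B": 48}

for org, target in sla_targets.items():
    met = sum(1 for h in resolution_hours if h <= target)
    pct = 100 * met / len(resolution_hours)
    print(f"{org}: {pct:.0f}% of incidents met the {target}-hour SLA")

# Org A: 29%, Org B: 86%. Identical operational reality, incomparable KPI.
```

Same incidents, same team performance, wildly different scores, which is exactly why this KPI falls apart as an external benchmark.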
Now, with % of Incidents Resolved by Tier 1, nearly every organization has a first-tier support function. That is effectively universal. And it is also universal that an Incident resolved at Tier 1 is less expensive than one escalated to deeper tiers (such as engineering or development), which is what makes this measure far more comparable across organizations.
Next issue with benchmarking (issue #2) is aggregation. What you will inevitably find with benchmarking solutions is that they present aggregate-level information with no ability to drill down to the grain (due to the volume of data and other factors like anonymity requirements), which is the level of detail necessary for analytical use cases. Example: average resolution time (from our database) is roughly 130 hours for organizations with more than 150 people in IT. But when you drill through to the next level of detail, you see that the average resolution time is 10 hours for desktop support incidents and 200+ hours for Tier 3 incidents. Drill even further, and you would see that the average resolution time for all Cisco core router incidents is 360 hours, while SAP application issues run around 280 hours (depending on the module). And if you drill down to the individual Incident, you will see some Incidents at 1,000+ hours and others much shorter (which is why the median can be very helpful, in fact). Now, let's say we were to bundle hundreds of different customers together and categorize them by vertical, segment, and IT organization size. Do you think that all categories, subcategories, product names, and assignment group names are the same across all companies? Or that we would have permission to share Incident-level detail (with assignee names) in that public database even if we could?
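Here is a small sketch, with made-up numbers loosely echoing the example above, of how the aggregate mean hides the detail you need for analysis, and why the median is more robust to those 1,000+ hour outliers:

```python
import pandas as pd

# Hypothetical incident data, invented to mirror the example above.
incidents = pd.DataFrame({
    "group": ["Desktop"] * 4 + ["Tier 3"] * 4,
    "resolve_hours": [4, 8, 12, 16, 120, 150, 210, 1100],
})

# The headline number a benchmark database shows you:
print("Overall mean:", incidents["resolve_hours"].mean())  # 202.5 hours

# The grain you actually need for analytical use cases:
print(incidents.groupby("group")["resolve_hours"].agg(["mean", "median"]))
# Desktop mean is 10 hours; Tier 3 mean is 395 hours, pulled up by the
# single 1,100-hour incident, while the Tier 3 median (180 hours) tells
# a more honest story about typical resolution time.
```

The 202.5-hour "benchmark" describes neither desktop support nor Tier 3 accurately, and without access to the underlying grain you cannot know that.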
The third issue (and by no means the final one) is data availability. This issue is particularly acute when looking at benchmarking solutions that rely on survey samples or pull data from only one type of application or data source. For example, to get accurate costing data for an IT staff member, you would potentially need to pull data from the financial system, phone system, email system, and ITSM system at a minimum. Or, how about pulling together the total cost of an IT asset?
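A minimal sketch of that point, assuming four hypothetical source systems and placeholder figures, shows how a single-source benchmark systematically under-counts cost:

```python
# All source systems and dollar figures here are hypothetical placeholders.

def fully_loaded_staff_cost(financial, phone, email, itsm):
    """Combine per-staff annual costs pulled from four separate systems."""
    return (
        financial["salary_and_benefits"]   # from the financial system
        + phone["telephony_cost"]          # from the phone system
        + email["mailbox_and_licensing"]   # from the email system
        + itsm["tooling_cost_per_seat"]    # from the ITSM system
    )

cost = fully_loaded_staff_cost(
    financial={"salary_and_benefits": 95_000},
    phone={"telephony_cost": 1_200},
    email={"mailbox_and_licensing": 800},
    itsm={"tooling_cost_per_seat": 1_500},
)
print(f"Fully loaded annual cost: ${cost:,}")  # $98,500 vs. $95,000 salary-only
```

A benchmark built from just the financial system's feed would report the salary-only number and quietly miss everything else.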
Benchmarking from my personal experience: QuickBooks Online (QBO, the SaaS version of QuickBooks) offers a benchmarking and analytics product called Fathom (it used to be free) that allows you to compare your performance against your industry peers. At Northcraft, with the free version, we were able to see that our revenue growth, margins, and cash flow operating ratio were in the top quartile of all companies in the "software" category. Yay for us. And this was a pretty good use of benchmarking, because these measures are generally accepted and measured in a similar way across most organizations thanks to generally accepted accounting principles (ITIL for accountants). However, when I want to measure cost per highly qualified lead, which helps us make necessary tactical decisions about our marketing budget, I've found the Fathom numbers completely inaccurate and of no use. Why? Because I need to combine data from our marketing system, our sales system (salesforce.com), QBO, and Google Analytics. Would Fathom take me across all of those data sets? No, and it never will. Embedded BI is different; this is where a BI platform becomes necessary for evidence-based improvement.
So, here's where the rubber meets the road. For your Key Performance Indicator goals, should you baseline or benchmark? The default answer is to baseline your own performance and look to improve from there. Improvement is improvement, regardless of what your peers are doing. That being said, it probably wouldn't hurt to pick up a few common industry measures from offerings like MetricNet, as long as the cost of the information isn't too high!
Happy analyzing!