What metrics do the experts use to measure UX effectiveness?
SUS? WUS? CUS? TPI? Come see how many of these abbreviations I invented myself in this investigation into how the experts measure UX.
It’s all well and good us sitting around in our ivory tower (technically it’s an old bus factory) yelling how great UX testing is out of the window to passers-by, but this can only get us so far.
Occasionally some of our neighbours do look up and yell back, “yeah I know! It just makes common sense to make design decisions based on actual human behaviour” but then they often make the following point…
“But how can I measure that? If I run user testing and make a change to my website that presumably improves the user experience based on our observations, how do I really know the change has worked? What UX metrics can I use to measure my success? How do I prove to my boss that the investment is worth it?”
It’s normally around the ‘metrics’ mark where we start to close the window and mumble something about “having to keep it shut because of the air con, sorry I can’t hear you.”
The trouble with UX metrics
Metrics are a difficult discussion when it comes to measuring the success, failure or shrugging indifference of your UX. Every other discipline has it made!
- You want to measure how well your blog post did: look at your traffic, see the time on page, notice how many times it has been shared, judge the quantity and quality of comments.
- You want to measure your social channels: look how many followers you have. Is there growth? Are they influential in your niche? Do they comment? Do they share? Are they entirely growth-hacker bots?
- You want to measure the quality of your Bakewell Tart: did I eat the whole damn thing? Probably, but that’s not a testament to how good it is. I’ll often eat an entire Bakewell Tart with the same ease that I take a breath. Did I demand that you make me another one? Yes! Now that’s a quality Bakewell Tart.
- You want to measure the changes made to the order of the categories in your main menu: well it definitely looks better to you! Have you run some more user testing to see if people are still struggling with it? Perhaps traffic from the homepage to those categories has improved, but there’s no guarantee it’s because of the changes.
Pfft, as if it even makes it to a plate.
As we already know, data only shows part of the story. Google Analytics can tell you what’s happening but not why it’s happening. If you’re only going by analytics, you’re essentially guessing. Sure, it can be an educated, highly informed guess – but you won’t know exactly why things are happening on your site until you see real people using it.
And this makes creating metrics around UX improvements difficult. There are other issues too, as UX designer Peter Hornsby points out:
“Part of the issue is that UX effectiveness is hard to measure. It’s a good plan to have UX-specific requirements in a project, but often these can be something that doesn’t necessarily correlate with UX effectiveness. For instance, if someone says ‘We want users to use our app more!’ then making the UX less effective will meet that metric: make it more difficult to do stuff, users take longer to do it, and spend longer on the app. UX metrics can therefore be ‘gamed’ like any other metric.”
But there are ways to measure UX. Some are better than others, and many will only apply to a certain company or industry – but they’re out there.
What UX metrics do the experts use?
To help with my investigation, I took to the mean streets of social media, to ask our cadre of experts their opinions on measuring UX.
For the purposes of this section, I’m going to pull out the metrics mentioned by our respondents and go into more detail as we go along…
Okay let’s break the first half of Georgia Rakusen’s list down a little bit:
Helpful if there’s a specific thing triggered by a UX improvement. Say for instance a web-form completion, newsletter sign-up or some other task completion. If the site change directly impacts how many people are converting in that specific task, and you can measure that accurately, then you can be *fairly* confident you made an impact.
Just remember that having a higher conversion count may also be a result of marketing efforts, so be sure to measure the conversion rate (typically Number of Sales / Number of Visits).
As nngroup suggests:
“The conversion rate measures what happens once people are at your website. Thus it’s greatly impacted by the design and it’s a key parameter to track for assessing whether your UX strategy is working.”
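To make the count-versus-rate point concrete, here is a minimal sketch of the conversion rate calculation described above (conversions divided by visits). The function name and all the figures are made up for illustration; they are not taken from any real site.

```python
def conversion_rate(conversions: int, visits: int) -> float:
    """Return conversions divided by visits, as a fraction between 0 and 1."""
    if visits <= 0:
        raise ValueError("visits must be a positive number")
    return conversions / visits


# Hypothetical figures: a marketing push brings in more visits after the change.
before = conversion_rate(conversions=120, visits=6_000)
after = conversion_rate(conversions=150, visits=10_000)

print(f"Before: {before:.2%}")  # Before: 2.00%
print(f"After:  {after:.2%}")   # After:  1.50%
```

In this invented example the raw number of conversions goes up (120 to 150) while the rate goes down, which is exactly why the count alone can flatter a design change.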
And because we like to argue both sides of the… uh… argument… here’s ecommerce whiz kid Dan Barker on why you shouldn’t necessarily trust conversion rate as the solution to all your problems. Remember that not all visitors to your webpage have the potential to convert, or that conversion rates vary wildly based on visitor type.
AOV means average order value, and this is simply your total revenue / number of checkouts. According to VWO this is a “direct indicator of what’s happening on the profits front.” If your UX efforts directly tie into increasing cross-selling or upselling, then AOV can be an indicator of whether you’ve improved things or not.
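As a rough sketch, the AOV calculation above (total revenue divided by number of checkouts) looks like this. The function name and the revenue figures are hypothetical.

```python
def average_order_value(total_revenue: float, checkouts: int) -> float:
    """Return total revenue divided by the number of completed checkouts."""
    if checkouts <= 0:
        raise ValueError("checkouts must be a positive number")
    return total_revenue / checkouts


# Hypothetical figures for the period before and after a cross-selling change.
aov_before = average_order_value(total_revenue=12_500.00, checkouts=250)
aov_after = average_order_value(total_revenue=14_300.00, checkouts=260)

print(aov_before)  # 50.0
print(aov_after)   # 55.0
```

As with conversion rate, a higher number on its own doesn’t prove the UX change caused it; it only tells you where to look next.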
Watch me go this whole section without saying how I’m going to ‘suss this out’. You’ll be so proud of me…
SUS (System Usability Scale) was also mentioned by Chris Compston, a senior consultant at ThoughtWorks:
So this ‘holistic barometer of usability effectiveness’ and ‘useful measure for complex workflows and tools’ – what does it do?
Handily enough, we have our own version of SUS. We call it WUS. Or ‘Website Usability Scale’, and it’s based on the same principles.
For every website usability test our clients run, our users complete a short questionnaire and we calculate the WUS score from that. It’s on a Likert scale, which helps us ascribe a quantitative value to qualitative opinions.
These are the questions we ask, which are responded to by clicking on an option from strongly agree to strongly disagree:
- I think that I would like to use this website frequently
- I found the website unnecessarily complex
- I thought the website was easy to use
- I think that I would need the support of somebody technical to be able to use this website
- I found the various functions in the website were well integrated
- I thought there was too much inconsistency on this website
- I imagine that most people would learn to use this website very quickly
- I found the website very awkward to use
- I felt very confident using the website
- I needed to learn a lot of things before I could get going with this website
For our clients, this can also help them prioritise which test videos to watch first, as the ones with the lowest scores will probably reveal the biggest problems.
The benefits of this measurement are that it’s very easy to administer, can be used on a small sample size and can clearly indicate whether a ‘system’ has improved or not.
There are important things to keep in mind though when using SUS or WUS. According to Usability.gov:
- The scoring system is complex
- As the scores are on a scale of 0-100, there’s a temptation to interpret them as percentages – they’re not. Don’t do this. It’s wrong. Here’s what actually happens to the scores…
- The participant’s scores for each question are converted to a new number, added together and then multiplied by 2.5 to convert the original scores of 0-40 to 0-100. Though the scores are 0-100, these are not percentages and should be considered only in terms of their percentile ranking. Based on research, a SUS score above a 68 would be considered above average and anything below 68 is below average, however the best way to interpret your results involves “normalizing” the scores to produce a percentile ranking #MathsFun
- SUS or WUS won’t tell you what’s wrong with the site – it merely classifies its ease of use, which is fine for the purpose of measuring the improvements of a specific feature or journey. User testing will tell you how to improve.
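For the curious, here is a minimal sketch of the scoring step described above. It assumes the standard SUS convention, which the text doesn’t spell out: answers run from 1 (strongly disagree) to 5 (strongly agree), odd-numbered statements (the positive ones) contribute the answer minus 1, and even-numbered statements (the negative ones) contribute 5 minus the answer. Whether WUS uses exactly this conversion is an assumption on my part, and the function name and sample answers are invented.

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's ten answers, given in questionnaire order.

    Each answer is an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("expected exactly ten responses")
    if any(answer not in range(1, 6) for answer in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for position, answer in enumerate(responses, start=1):
        if position % 2 == 1:
            total += answer - 1   # positively worded statement
        else:
            total += 5 - answer   # negatively worded statement

    # The converted answers sum to 0-40; multiplying by 2.5 stretches that to 0-100.
    return total * 2.5


# Hypothetical answers from one participant.
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0
```

Remember that the 80.0 here is not “80%”: it only means something once you compare it against other scores, such as the average of 68 mentioned above.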
Georgia Rakusen also pointed us towards TPI (Task Performance Indicator) as a potential UX metric, but then heavily caveated (read as ranted) that it is NOT a suitable method.
But first: what is TPI?
Gerry McGovern gives an extensive breakdown of the method his team developed, “to measure the impact of changes on customer experience.” With TPI you ask 10-12 ‘task questions’ that are created especially for the ‘top tasks’ you want to test (these will need to be repeatable, as they’ll be asked again when you rerun the test in 6-12 months’ time).
The number of test participants is between 13 and 18, larger than you would normally find in user testing, as McGovern believes that “reliable and stable patterns aren’t apparent” until you’re testing with this many participants, at which point results stabilize to leave a “reliable baseline TPI metric.”
For each task, the user is presented with a task question via live chat. Once they have completed the task, they answer the question. The user is then asked how confident they are in their answer.
TPI takes into account:
- Target Time: how long it should take to complete the task under best practice conditions.
- Time out: the person takes longer than 5 minutes.
- Confidence: At the end of each task, people are asked how confident they are.
- Minor wrong: the person is unsure; their answer is almost correct
- Disaster: the person has high confidence, but the wrong result
- Gives up: the person gives up on the task.
The theory is that if a task has a TPI score of 40 (out of 100), it has major issues. If you measure again in six months and nothing has been changed, the score should again result in a TPI of 40.
Georgia Rakusen has her reservations about this measurement:
“If your company is bought into qualitative research, you don’t need a number to prove your design. [With TPI] every session needs to be moderated, so often [which is] expensive and time-consuming. If a user doesn’t speak, you don’t understand the why, this leads to doubling up on research efforts to capture quant and qual. This doesn’t work for ‘Top Tasks’, which involve users making creative choices, as time on task becomes skewed. Test with 15 users and get one score. Test with another 15 users and get a different score. See the problem here? Pushing something live to an AB test and measuring success at high volumes will give us much greater certainty. FIN.”
Tree-testing and card sorting
Watch me go this whole section without saying how I’m going to lean on a sycamore and say, “yeah it’s very woody.” You’ll be so proud of me…
Here’s Ed Williamson, Web Content Manager at AAT:
Card sorting and tree-testing are great ways to regularly test your users and gain feedback from people to see if they can navigate your website, revealing the effectiveness, or lack thereof, of its organisation, structure and labels.
Here’s a handy refresher on both disciplines from our own UX researcher, Hazel Ho – as featured in her guide to navigational UX.
Users are presented with a list of items (for example, all the products in a grocery store) and asked to group them in a way that makes sense to them. Users can also choose names for the groups they’ve put together. This exercise can help build the structure for your website and category labelling – both things that will help with ease in navigation.
Commonly seen as reverse card sorting, this involves having main topics and subtopics already set up. You then ask users to find a certain item by clicking through headings and subtopics, until they see a link under which they expect it to reside. This exercise helps identify navigation issues because you can assess where users would expect to find information, and change your website accordingly.
But how can you use this as a metric?
As Ed Williamson said, this is not only an exercise to be done before launch, but something to be regularly measured as goals after launch.
You could assign some kind of quantitative value to the qualitative data you’ve observed in the tests (using one of the many rating systems above) to set a benchmark, and see if you improve upon it in further iterations. I guess this would be easier with tree-testing, as you have specific tasks whose ease and completion time you can measure.
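One way this could look in practice, purely as a sketch: record whether each participant found the right place in the tree and how long it took, then track the success rate and median time per task from one round of testing to the next. The task name, the data and the `summarise` helper are all hypothetical, and this is only one of many possible ways to turn tree-test results into a benchmark.

```python
from statistics import median

# Hypothetical results for one tree-test task:
# (task name, found the correct location?, seconds taken)
results = [
    ("find-returns-policy", True, 22.0),
    ("find-returns-policy", False, 61.0),
    ("find-returns-policy", True, 18.0),
    ("find-returns-policy", True, 30.0),
]


def summarise(task_results):
    """Return (success rate, median seconds among successful attempts)."""
    if not task_results:
        raise ValueError("no results to summarise")
    successful_times = [seconds for _, found, seconds in task_results if found]
    success_rate = len(successful_times) / len(task_results)
    # Median time is only meaningful if at least one participant succeeded.
    median_time = median(successful_times) if successful_times else None
    return success_rate, median_time


success_rate, median_time = summarise(results)
print(f"{success_rate:.0%} success, median {median_time} seconds")
```

Run the same tasks again after changing your navigation and you have two sets of numbers to compare, with the usual caveat that small samples will wobble between rounds.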
*There is currently an interesting discussion about this subject happening in our UXChat channel on Slack, so please join in to hear what the experts have to say*
I haven’t even remotely covered every possible UX metric here, because frankly that would take all week, and mainly because I’d like you to add your own opinions in the comments below.
What I am discovering, however, is that UXers have a broad range of measurements to rely on, blending user rating systems (SUS, TPI, NPS) with qualitative feedback from user testing.
I asked for responses on our Slack community channel (which you should totally join btw) and here Peter Hornsby revealed his approach:
“It depends on the system, but generally a mix: [I use] feedback that people in contact with users receive (e.g. client support teams), direct interaction with users (large-scale feedback like surveys, with questions informed by qualitative stuff like interviews), and of course direct user journey research like WhatUsersDo.”
It also depends on your own company goals, and what results your various stakeholders wish to see. On that note, let’s end with this important point to remember from Peter:
“The key for me is being clear on what is being measured and why – and challenging this if it’s not meaningful.”
Now to clear my throat, throw open the window and start bothering the neighbours again.
Main image by Igor Ovsyannykov.
He used to be the deputy editor of Econsultancy and editor of the third or fourth most popular search engine marketing related website in (some parts of) EMEA.