Site speed – does it *really* drive revenue?

The topic of site speed and revenue came up among my colleagues recently, and while I soooooo much want to believe that fractional second site speed improvements drive revenue, I have yet to find a convincing set of data that proves this hypothesis. Marissa Mayer’s famous talk at Web 2.0 in November 2006 was cited as evidence in support of this hypothesis, but I believe the Google experiment was flawed (or at least not well-reported by Marissa) because they didn’t understand the why behind their numbers. She claims that “traffic and revenue went down 20%” when they showed 30 results instead of 10, which took 0.9 seconds instead of 0.4 to load. However, if  “traffic” is a function of “search page views” and “revenue” is a function of “clicks to Sponsored results,” you could explain the difference by the following two very likely scenarios:

  1. With 10 results per page, more users are clicking on the “Next” pagination link when they don’t find anything relevant on the first page. This results in more traffic, because you get more search results page views per user.
  2. Other users who don’t click on the “Next” link might get to the bottom of the page after 10 results without finding anything relevant, so they might click on Sponsored results with greater frequency than those with 30 results per page.  This results in more revenue.

But are the 10-per-page users more satisfied than the 30-per-page users? Not necessarily. Those who got 30 results per page had to wait 0.5 seconds longer, but you could argue (or better yet, observe directly) that more of them found a natural search result to click on the first page than those who got just 10.  (This talk was one of those that did a lot more damage than it did good, IMO.  Even AJAX got bashed.)

When I was at eBay, we (User Research and Engineering) tried really hard to show a relationship between site speed and revenue at eBay. We just couldn’t do it. I’m not saying it’s not true, but I found that it’s really hard to do. In contrast, it’s really easy to get duped into thinking something’s there when it’s not. That’s the problem with quantitative behavioral metrics when taken in isolation, or worse, when the metrics you’re using (revenue) are in direct conflict with other metrics you’re trying to improve (satisfaction or usability).

Another eBay example is the metric of “time on site” as a measure of “engagement.”  eBay blows everyone else out of the water on this one, but if you stop to think about it (or just study its users), it’s easy to see why:  it offers a unique inventory that is interesting in and of itself (which is, admittedly, engaging), but it’s also the hardest e-commerce site to use, requiring more time to figure out.  Furthermore, when you get outbid, you have to start all over again.  It also has some really hard-core, loyal users, some of whom are literally addicted to the site.

It’s really great to have access to quantitative behavioral metrics and to know what kind of impact a change to your site will have on revenue.  But it’s also dangerous to rely on them as your only source of insight.  It’s even riskier to exalt a particular metric when you don’t fully understand its relationship to the metrics that *really* drive your revenue.  That’s a form of blindness that’s hard to see in yourself.

On a related note, UIE published a study on ‘perceived site speed’ and ‘actual site speed’ about 8 years ago. It was an observational study of e-commerce sites, and they used big enough numbers to run some basic statistics. They found no correlation between ‘perceived site speed’ and ‘actual site speed.’  Interestingly, the only thing they found that was related to perceived site speed was ‘observed ease of use’ (i.e., usability): the site (in this case, Amazon) that was the easiest to use was perceived to be the fastest, when in fact, it was the slowest of all of them in the study.*

Go figure.

*Note: The UIE study criticized Jakob Nielsen’s call for faster site speed, saying that usability is more important. I think Jakob would agree, actually.  But that doesn’t mean we should build slow sites.

Presenting at BayCHI: Christian’s Greatest Hits

A few hours ago, on Tuesday, January 13th, 2009, I had the privilege of presenting as the final speaker at the BayCHI monthly event at the Palo Alto Research Center (PARC) auditorium.  The topic?  Christian’s Greatest Hits from the past 10 years.

Well, they are not necessarily my hits – just important work and insights in the field of user experience research that I think any practitioner should know and understand.  In this talk, I covered the following:

  • The User Research Landscape
  • Qualitative Validity
  • User Research Classes
  • Desirability
  • True Intent Studies
  • User Experience and Strategy

A PDF version of the presentation can be found here (also listed in the Publications sections of this site).

What was great about the experience for me was having everyone come up afterward and, after saying some nice things, provide some really useful suggestions.  Among the comments I got, here are a few notable ones:

  • In the Landscape slide (number 20), the term “data mining” is not strictly a behavioral method (it could be applied to any type of data set).  (Nice catch, Garett.)
  • The term “Utility” doesn’t necessarily equate to “meeting needs”.  What’s at the core of user experience (see slide 36) needs to be better fleshed out.  Some suggestions I got:  Utility (as is);  Value;  Usefulness.
  • The term “Attitudinal” might be better replaced by “Self-Reported” as one end of the data source dimension (see slide 20).
  • Culture isn’t really dealt with in this presentation.

The wide range of meaning behind the term “Desirability” (see slides 44-46) suggests that we, as an industry, need to lock down what we mean or choose another term, like Enjoyability, Engagement, Emotional connection, Addiction (yes, this was a suggestion), Aesthetics, and the like.  In this talk, I was able to describe the usage and research of Desirability from the past and then point out how variable we are when using the term.

I will be presenting a version of this talk in a Webinar on January 29, 2009, published by Rosenfeld Media.  Should be interesting.

Desirability Studies: Measuring Aesthetic Response to Visual Designs

Introduction

Many people have been asking me to say more about “Desirability Studies,” which I recently described in Jakob Nielsen’s Alertbox article as a method “to measure aesthetic appeal.”  Desirability studies actually do more than just measure, as they can also be used to inform and even inspire different visual design directions you may be considering.   In the landscape of user research methods I described in this article, it is classified as an attitudinal study that can be qualitative or quantitative (shown below as a “hybrid” method in the middle bottom area):

The problem of subjectivity

Desirability studies are far less well known than other user research methods, despite how important visual design is to user interfaces.   Paul Howe wrote in and said, “I’ve done a lot of testing for aesthetic preferences but it usually feels like the least scientific part of all my user research.”  This “less scientific” feeling about studying visual design has several causes.  Part of it is the way visual design itself is approached, often starting from the designer’s beliefs about what will best evoke the desired response.  However, I would suggest that the approach to creating the design direction is not the only or even the biggest issue – it’s the presentation of the design direction, which is often done in a largely subjective manner.  For example, when presenting different design directions to a decision-maker, a visual designer might say something like:

“I recommend design C over A and B, because I feel it evokes the right kind of emotional response in our audience that is closer to our most important brand attributes, trust and fun.”

The problem with this type of presentation is that it frames the decision as a subjective interpretation of the design by internal constituents, and a decision-maker might feel his or her interpretation or gut feeling for the design is just as valid as the designer’s.  And it is, at least in the sense that they are both just single individuals.  Even though the designer ostensibly has years of experience knowing what types of designs evoke which types of responses, the authority of the decision-maker evens things out and, if there isn’t agreement, you often end up in a kind of stalemate that doesn’t move things forward in a positive direction.

Two problems that Desirability studies could help solve here would be:

  1. To inform the design team as to why different design directions evoke certain responses in the target audience (in order to refine the direction); and
  2. To precisely measure visual design directions against specific adjectives (such as brand attributes) to help make a final decision.

Doing this puts the subjectivity where it belongs:  as the voice of how the target audience feels about the design, not the designer or decision-maker.  This empowers the designer and decision-maker to make an informed choice.

Another question is, “How can we predict real-world behavior?”  The answer is that we can’t do this with Desirability studies, since they measure attitude rather than behavior (and we all know that what people say and what they do are often two very different things).  You’re never going to get a good read on real-world behavior from an inherently attitudinal study, but Desirability is more concerned with the initial stages of the interaction with the product, rather than ongoing ones.  The latter are more influenced by solid interaction design and, ultimately, meeting an underlying user need.  Positive (and subjective) aesthetic responses help get your target audience started using your site, but assuming you haven’t made egregiously bad choices (e.g., poor contrast or small targets), aesthetics probably won’t affect overall usability that much.

Types of Desirability Studies

There are two general classes of desirability studies: Qualitative and Quantitative.  In the qualitative version, participants are brought into a lab or conference room individually and shown different visual design directions (e.g., mood boards or high-level designs) or visually designed interfaces (e.g., high-fidelity mockups of home pages).   Below is an example of 3 different design directions from a Yahoo! Personals desirability study conducted by Jeralyn Reese and Michelle Reamy several years ago:

Participants are then given a set of index cards, each with a description written on it (usually an adjective), and asked to indicate which cards go best with each design direction.  Below is an example of some of the descriptions on these cards:

Accessible    Desirable    Gets in the way    Patronizing    Stressful
Appealing    Easy to use    Hard to use    Personal    Time-consuming
Attractive    Efficient    High quality    Predictable    Time-saving
Busy    Empowering    Inconsistent    Relevant    Too technical
Collaborative    Exciting    Intimidating    Reliable    Trustworthy
Complex    Familiar    Inviting    Rigid    Uncontrollable
Comprehensive    Fast    Motivating    Simplistic    Unconventional
Confusing    Flexible    Not valuable    Slow    Unpredictable
Connected    Fresh    Organized    Sophisticated    Usable
Consistent    Frustrating    Overbearing    Stimulating    Useful
Customizable    Fun    Overwhelming    Straight Forward    Valuable

After the participant has selected cards that go with each design, the researcher asks them why they made the selections they did.  This is the main benefit of this approach:  finding out why certain designs cause certain reactions.  This provides the design team with what they need to make an improved next version.

Because the sample sizes are small and recruitment into the study likely introduced additional biases, this is not an appropriate quantitative method.  A quantitative version of desirability studies was developed a few years after the qualitative version came out.  The idea is to represent the design directions as images embedded in a survey.  This allows the researcher to gather larger, more representative samples of the target audience, often with results that are more generalizable.
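
As a rough illustration of how the quantitative version might be analyzed, here is a minimal sketch in Python (with entirely hypothetical respondents and adjective selections, not data from the Yahoo! or eBay studies) that tabulates the share of respondents who associated each adjective with each design:

```python
from collections import Counter, defaultdict

# Hypothetical survey responses: each row is one respondent's reaction to one
# design direction, listing the adjectives they felt applied to it.
responses = [
    {"design": "A", "adjectives": ["Trustworthy", "Busy"]},
    {"design": "B", "adjectives": ["Fun", "Fresh", "Trustworthy"]},
    {"design": "C", "adjectives": ["Fun", "Trustworthy", "Inviting"]},
    {"design": "C", "adjectives": ["Fun", "Inviting"]},
    # ...many more respondents in a real study
]

# Count adjective selections and respondents per design.
adjective_counts = defaultdict(Counter)
respondents_per_design = Counter()
for r in responses:
    respondents_per_design[r["design"]] += 1
    adjective_counts[r["design"]].update(r["adjectives"])

# Report the share of respondents who associated each adjective with each design.
for design in sorted(adjective_counts):
    for adjective, count in adjective_counts[design].most_common():
        share = count / respondents_per_design[design]
        print(f"Design {design}: {adjective:12s} {share:.0%}")
```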

The Yahoo! Personals desirability study mentioned earlier used the quantitative approach discussed here.  This was important, because changing the visual design was a big decision:  it was the first time the company was willing to change the design of a Yahoo! “property” (Mail, Personals, Finance) to more closely match the visual design style of its competitors, rather than continue to use the general Yahoo! design style found across the network.  (The last of the 3 designs shown above won out.)

Presenting Desirability Results

Just doing the study isn’t enough, especially when there is contention.  One simple way to show the results is with a Venn diagram.  Assume we have three design directions called “Simple,” “Modern,” and “Fun.”  A summary of the results might look like this:
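
If you want to sketch such a Venn diagram in code, one way (assuming the third-party matplotlib-venn package, and using made-up adjective sets rather than real study results) is:

```python
import matplotlib.pyplot as plt
from matplotlib_venn import venn3  # third-party: pip install matplotlib-venn

# Hypothetical sets of adjectives most often chosen for each design direction.
simple = {"Usable", "Organized", "Predictable", "Trustworthy"}
modern = {"Fresh", "Sophisticated", "Exciting", "Trustworthy"}
fun = {"Fun", "Inviting", "Exciting", "Trustworthy"}

# Region sizes reflect how many adjectives each pair (or all three) of designs share.
venn3([simple, modern, fun], set_labels=("Simple", "Modern", "Fun"))
plt.title("Adjectives associated with each design direction")
plt.show()
```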

However, it may be important to show more methodological detail around the results, such as significance testing on the survey results.  Mike Katz, Rian Van der Merwe and Christina Hildebrand pushed the desirability study methodology further at eBay.  They used paired opposites in the survey, which allowed them to show the results like this:

In another study, they were able to show the results of significance testing in the presentation of the results:
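
For the significance testing itself, one simple approach (a sketch with hypothetical counts, not the eBay team’s actual analysis) is a chi-square test on how often a given adjective was selected for two designs:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of respondents who did / did not select "Trustworthy"
# for two competing design directions (200 respondents saw each design).
#                   selected  not selected
design_a_counts = [62, 138]
design_b_counts = [95, 105]

chi2, p_value, dof, expected = chi2_contingency([design_a_counts, design_b_counts])
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The designs differ significantly on 'Trustworthy'.")
else:
    print("No significant difference detected on 'Trustworthy'.")
```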

Published Papers on Desirability Studies

Desirability studies were first discussed in a UPA 2002 paper by Joey Benedek and Trish Miner of Microsoft, entitled “Measuring Desirability:  New methods for evaluating desirability in a usability lab setting.”  Later, at CHI 2004, Microsoft presented a case study of how they extended the qualitative method with a modified focus group discussion.  As far as I know, the quantitative version of desirability studies has yet to be published.

Conclusion

In summary, Desirability Studies are a great way of understanding the aesthetic response to the visual design directions you may be considering and ultimately measuring how strongly certain designs evoke certain responses relative to each other.  This is most useful when you are trying to make a good first impression on your target audience and invite them to interact more deeply with your site or product so they can discover whether it will meet an underlying need.  If all goes well with other aspects of the user experience, they will become loyal users.

Writing for Jakob Nielsen

I found writing an article for Jakob Nielsen’s Alertbox column on useit.com a refreshing exercise in brevity and clarity.  It was a nice departure from my natural tendency to explain every nuance in my written communication.  As I reviewed his style, I found myself cutting everything down on my own, and as editor, Jakob only made major changes to the beginning of the article, where a clear setup was crucial to get readers to dive into the further details.  I have a lot more respect now for how much effort it takes to write in the style he does.

Future Blog Entries

I have a long list of topics to write about, many of which I’ve presented, some of which have been published.  When possible, I’ll also provide the actual paper, document or link to the talk as well.

* R&D: Research and Design Working Together
* Strategic Choices: Throwing the Design Spear or throwing the Research Spear
* Flowing revenue: Interaction Design as a short-term business lever
* Qualitative Validity
* Six Principles of Successful Online Advertising
* The Rise (and Slight Descent) Of Intrusive Online Advertising
* Visual Design Strategy
* Corporate “Science”
* Thinking Aloud: Making the most of self-reported data in a behavioral method
* Battle of the qualitative: Ethnographic Field Studies vs. Focus Groups
* “Egads, Peterme!”: Subtleties behind surveys
* Costco: Returns No More
* Social Networking: What drives growth and implosion of these strange beasts
* Getting half of the experience right: Setting customer expectations
* Corporate mal-practice: Grading on a curve
* Churn and burn: Why reorgs don’t (really) work