Evaluation+Resources


 * **On Interviewing Users: To help with evaluation presentation**
 * Even though they have weaknesses, interviews are a valuable method for **exploratory** user research
 * **what users say and do are different, so not totally reliable**
 * only use in a few cases for which they generate valid data
 * BAD POINTS: asking people to remember past use or speculate on future use of a system...weak
 * human memory is fallible; people often make up stories and can't usually respond validly about what they //might do//
 * the most //valid// timeline for a user is what they are //doing right now//
 * may misremember the past and/or mis-predict the future
 * can't make speculations about what //may// work
 * //**DO not**// ask e.g. should button be red or orange? is it better to use drop down menu or radio buttons? Is it better to have 3 levels of navigation or 2? And the like...
 * **GOOD POINTS: when NOT wanting more individual page design decisions, 'cos already done user testing, can ask what thought of whole thing after using it**
 * **for general attitudes or how think about a problem**
 * any **Critical incidents**: ask when particularly difficult or when worked well...extreme cases better recalled
 * **Beware of the Query Effect: many people will make up an opinion and will talk at great length about something that really is not that important, and that they never would have thought about if you had not asked!**
 * **don't ask leading questions! Opinions will be made up to please you**
 * **Best to Combine Methods (Data Triangulation): interviews plus user testing plus questionnaire etc. The more information, the more cross-reliability and validity**
 * @http://www.useit.com/alertbox/interviews.html
 * @http://www.useit.com/alertbox/user-research-methods.html
 * @http://www.useit.com/papers/focusgroups.html
 * @http://www.useit.com/alertbox/20010805.html


 * **On keeping online surveys short**
 * **will ensure high response rates and keep results valid**
 * a low response rate, from the most committed users, can skew results due to bias,
 * does not matter what you 'learn' from a survey...if it does not represent your users
 * **average users WILL respond to a survey if it is quick and painless...so reduce number of questions**
 * **questions easy to understand**
 * **survey/questionnaire is easy to complete/operate**
 * an interactive questionnaire is a USER INTERFACE, so it should be designed on the basis of user testing and follow usability guidelines!
 * **Survey Bloat** happens when you have a diverse marketing group (US?), all of whom want customer feedback on their particular issues.
 * **Do not try to collect all information that anybody could ever want**
 * **So, ask fewer questions....only address your core needs and skip subtleties.**
 * Surveys are not good at measuring minor differences anyway...direct observation is needed for that
 * One question appears to be the main one you need: "How likely is it that you would recommend [X] to a friend or colleague?" In 11 of 14 case studies, this one question was as strong a predictor of loyalty as any longer survey.
 * **Divide and Conquer**: ask different questions of different visitors, and assign them randomly
 * @http://www.useit.com/alertbox/20040202.html
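
The "Divide and Conquer" idea above can be sketched in a few lines of Python. This is only a minimal illustration, not anything taken from Nielsen's article: the question groups, their wording, and the function name `assign_group` are all made up for the example.

```python
import random

# Hypothetical question groups; names and wording are illustrative only.
QUESTION_GROUPS = {
    "navigation": ["Could you find what you were looking for?"],
    "content": ["Was the information easy to understand?"],
    "loyalty": ["How likely is it that you would recommend us to a friend or colleague?"],
}


def assign_group(rng=random):
    """Randomly pick one short question group for a visitor.

    Each visitor answers only one group, so every survey stays short
    while the whole set of questions is still covered across visitors.
    """
    name = rng.choice(sorted(QUESTION_GROUPS))
    return name, QUESTION_GROUPS[name]


# Example: assign a group to one (hypothetical) visitor.
group_name, questions = assign_group()
print(group_name, questions)
```

Passing a seeded `random.Random` instance as `rng` makes the assignment repeatable, which can help when checking the survey set-up before it goes live.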

A bit more on questionnaires (from my Blog):

===Asking Users: Questionnaires - Preece, Rogers, & Sharp - ePortfolio Chunk #10===

March 21, 2011 - Preece, Rogers, & Sharp, Chapter 13 - focus on "Asking users: Questionnaires"

In our product evaluation, we are planning to use observation, usability testing, and a questionnaire, which, according to Preece et al. (2002) "can be used on their own or in conjunction with other methods to clarify or deepen understandings" (p. 398). Preece et al. offer a number of guidelines for designing questionnaires, many of which we have adopted, including:

 * avoiding complex multiple questions
 * making questions clear and specific
 * beginning with general questions, followed by specific questions
 * providing clear instructions
 * using consistent rating scales
 * aiming for brevity over length to encourage completion

Nielsen (2004) adds to this, suggesting that in order to ensure high response rates and avoid misleading results, you should "keep your surveys short and ensure that your questions are well written and easy to answer."

He adds, "the highest response rates come when surveys are quick and painless. And the best way to reduce users' time and suffering is to reduce the number of questions" (2004). Another good point that Nielsen makes is that an interactive questionnaire is a user interface - which in and of itself should be "designed on the basis of user testing and follow standard usability guidelines."

One of Nielsen's points that really resonated with me is the idea of "survey bloat" - particularly when working in a group, there is perhaps a tendency to try and satisfy everyone's wants - each person coming at it from a different perspective. I imagine that a design team would be similar, which serves to reinforce the importance of knowing exactly what it is you are evaluating for - you can't evaluate everything at one time! As Nielsen notes, "please resist the temptation to collect all the information that anybody could ever want. You will end up with no information (or misleading information) instead."

Related to this is the "Query Effect", also referred to by Nielsen (2010): "whenever you do ask users for their opinions, watch out for the query effect: People can make up an opinion about anything, and they'll do so if asked". The lesson for our group? Be careful what we ask, and make sure it is information that we want to have and that matters in our design.

Conduct evaluation based on:
 * //**usability goals** (i.e., effective to use, efficient to use, safe to use, has good utility, easy to learn, **easy to remember, usability**) and //
 * //**user experience goals** (**satisfying**, enjoyable, fun, entertaining, **helpful**, motivating, aesthetically pleasing, supportive of creativity, rewarding, emotionally fulfilling) //

From our Presentation Design PowerPoint:
 * **Usability testing**
 * Parallel design: test multiple design prototypes
 * Dynamic animated – graphics, voice, text
 * Still images with text
 * Test efficacy of learning with Virti-Cue mock-up
 * 3-5 Users
 * Test with/without both designs
 * Open-ended questionnaire plus compare/contrast
 * Observation of specified task performances
 * Iterative design
 * Ease and efficiency of use
 * Few errors, pleasant to use
 * Achievement of learning goal

= **Preece et al. (2002) Chapter 12 notes:** =

**Observation involves watching and listening to users. Observing users interacting with software, even casual observing, can tell you an enormous amount about what they do, the context in which they do it, how well technology supports them, and what other support is needed. (Preece et al. 2002, p. 359).**

// **Goals and questions should guide all evaluation studies. P. 360** //

**Having a goal, even a very general goal, helps to guide the observation because there is always so much going on. P. 361**

**observational data is used to see and analyze what users do and how long they spend on different aspects of the task. It also provides insights into users' affective reactions. P. 363**

Determining goals, exploring questions, and choosing techniques are necessary steps in the DECIDE framework. Practical and ethical issues also have to be identified and decisions made about how to handle them. P. 364

Framework possibility:


 * **//What// is happening? What are people doing and saying and how are they behaving? Does any of this behavior appear routine? What is their tone and body language? P. 368**
 * **//Activities.//** What are the actors doing and why? P. 368

Checklist, p. 369-370

 * State the initial study goal and questions clearly.
 * Select a framework to guide your activity in the field.
 * Decide how to record events, i.e., as notes, on audio, or on video, or using a combination of all three.
 * Be prepared to go through your notes and other records as soon as possible after each evaluation session to flesh out detail and check ambiguities with other observers or with the people being observed. This should be done routinely because human memory is unreliable. A basic rule is to do it within 24 hours, but sooner is better!
 * As you make and review your notes, try to highlight and separate personal opinion from what happens. Also clearly note anything you want to go back to. Data collection and analysis go hand in hand to a large extent in fieldwork.
 * Be prepared to refocus your study as you analyze and reflect upon what you see. Having observed for a while, you will start to identify interesting phenomena that seem relevant. Gradually you will sharpen your ideas into questions that guide further observation, either with the same group or with a new but similar group.
 * Consider working as a team. This can have several benefits; for instance, you can compare your observations. Alternatively, you can agree to focus on different people or different parts of the context. Working as a team is also likely to generate more reliable data because you can compare notes among different evaluators.
 * Consider checking your notes with an informant or members of the group to ensure that you are understanding what is happening and that you are making good interpretations.
 * Plan to look at the situation from different perspectives.
 * Collect a variety of data, if possible, such as notes, still pictures, audio and video, and artifacts as appropriate.

Interviews are one of the most important data-gathering techniques and can be structured, semi-structured, or open. So-called **//retrospective interviews//** are used after the fact to check that interpretations are correct. P. 372

The DECIDE framework suggests identifying goals and questions first before selecting techniques for the study, because the goals and questions help determine which data is collected and how it will be analyzed. P. 379

Much of the power of analyzing descriptive data lies in being able to tell a convincing story, illustrated with powerful examples that help to confirm the main points and will be credible to the development team. P. 380

=**From Preece, Rogers, & Sharp (2007)**=
From Chapter 12:
 * **"Evaluation is integral to the design process. It collects information about users' or potential users' experiences when interacting with a prototype ... in order to improve its design. It focuses on both the usability of the system, e.g. how easy it is to learn and to use, and on the users' experience when interacting with the system, e.g. how satisfying, enjoyable, or motivating the interaction is" (p. 584).**


 * **"Users want interactive products to be easy to learn, effective, efficient, safe, and satisfying to use" (p. 586).**

Why evaluate?:
 * **"Evaluation is needed to check that users can use the product and that they like it" (p. 586).**

Evaluation Approaches and Methods:
 * Usability testing - "involves measuring typical users' performance on typical tasks" (p. 591)
 * Observations, user satisfaction questionnaires, and interviews are also used to elicit users' opinions

From Chapter 13: This chapter goes over the DECIDE framework - which we earlier said we'd use - do we still want to? Basically, it is as follows:


 * **1. Determine the goals.**
 * **2. Explore the questions.**
 * **3. Choose the evaluation approach and methods.**
 * **4. Identify the practical issues.**
 * **5. Decide how to deal with ethical issues.**
 * **6. Evaluate, analyze, interpret, and present the data.**

- work through the steps iteratively - back and forth

Nielsen on parallel design:

"Although you should create a minimum of 3 different design alternatives, it's not worth the effort to design many more. 5 is probably the maximum."

**"you should alternate which version they test first, because users are only fresh on their first attempt. When they try the second or third UI that solves the same problem, people inevitably transfer their experience from using the previous version(s)."**

**"After user testing, create a single merged design, taking the best ideas from each of the parallel versions. The usability study is not a competition to identify "a winner" from the parallel designs. Each design always has some good parts and some that don't hold up to the harsh light of user testing."**
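
Nielsen's advice to alternate which version each user tests first can be sketched as a small scheduling helper. This is a rough illustration under assumptions: the design names "static" and "dynamic" stand in for our two tutorial versions, and `counterbalanced_orders` is a hypothetical helper name, not part of any library.

```python
from itertools import permutations


def counterbalanced_orders(designs, n_participants):
    """Cycle through every ordering of the designs.

    When n_participants is a multiple of the number of orderings,
    each design is tested first equally often, which spreads the
    transfer-of-experience effect evenly across the versions.
    """
    orders = list(permutations(designs))
    return [orders[i % len(orders)] for i in range(n_participants)]


# Example: four hypothetical participants, two parallel designs.
schedule = counterbalanced_orders(["static", "dynamic"], 4)
for number, order in enumerate(schedule, start=1):
    print(f"Participant {number}: {' then '.join(order)}")
```

With three to five parallel designs the number of orderings grows quickly, so with only a handful of users it may be enough to rotate which design comes first rather than covering every ordering.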

Nielsen:

"Parallel design is a project model for usability engineering where multiple designers independently of each other design suggested user interfaces. These interfaces are then merged to a unified design that can be further refined through additional iterative design."

"The design process always requires several rounds where interfaces confront users' needs and capabilities and are modified accordingly. This approach is called iterative design.[Ref. 1] Our experience indicates the need for at least two iterations, yielding three versions, before the product is good enough for release. However, three or more iterations are better."

"Each merged design element can be the best of the parallel versions' corresponding elements or a synthesis of several parallel versions' elements. The merging designer can also introduce new elements, although it is probably better to postpone the introduction of most new elements until the iterative part of the parallel process"

Instructional Video Evaluation Instrument:

http://www.joe.org/joe/1996june/a1.php

From Preece, Rogers, & Sharp (online):

**Your list of heuristics for evaluating Websites.**

|| //Match between system and real world// || Is information presented in a simple, natural and logical order? ||
|| //Consistency and standards// || **Have ambiguous phrases/actions been avoided?** ||
|| //Flexibility and efficiency of use// || **Does the system guide novice users sufficiently?** ||
|| //Aesthetic and minimalist design// || Is the system easy to remember how to use? ||
|| //Use of modes// || Is it easy to exit from each mode of use? ||
|| //Structure of information// || Is the length of a piece of text appropriate to the display size and interaction device? ||
|| //Extraordinary users// || Are equivalent alternatives provided for visual and auditory content? ||

Will we continue with our parallel design and use the two versions to evaluate 'easy to remember, usability, satisfying, and helpful', for example?

So the research question may be something like, "Which tutorial format, static, or dynamic, best meets the learning needs of users, providing high usability, with easy to remember, helpful instructions, that satisfy user needs to learn the application?"


 * **With our hypothesis perhaps being that a dynamic tutorial presentation will serve to be more memorable and helpful in learning to use our application.**

or ....

or ....

or ....

Don't know if this heuristic evaluation stuff fits or not ...

Heuristic Evaluation - Jakob Nielsen

Heuristic evaluation is a discount usability engineering method for quick, cheap, and easy evaluation of a user interface design. Heuristic evaluation is the most popular of the usability inspection methods. Heuristic evaluation is done as a systematic inspection of a user interface design for usability. The goal of heuristic evaluation is to find the usability problems in the design so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the "heuristics"). Preece et al. group heuristic evaluation under 'Analytical evaluation' (p. 592) and note that "a key feature of analytical evaluations is that users need not be present".

http://www.useit.com/papers/heuristic/
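
If we do try heuristic evaluation, one way to pull together what a small set of evaluators report is sketched below. This is only an assumed workflow for our own notes, not a procedure taken from Nielsen or Preece et al.: the evaluator names, the problems listed, and the helper `merge_findings` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical findings: evaluator -> list of (heuristic, problem description).
findings = {
    "evaluator_a": [
        ("Consistency and standards", "Two different labels for Save"),
        ("Use of modes", "No visible way to leave edit mode"),
    ],
    "evaluator_b": [
        ("Use of modes", "No visible way to leave edit mode"),
        ("Structure of information", "Help text too long for the dialog"),
    ],
}


def merge_findings(by_evaluator):
    """Group reported problems by heuristic and track who reported each one."""
    merged = defaultdict(lambda: defaultdict(set))
    for evaluator, problems in by_evaluator.items():
        for heuristic, problem in problems:
            merged[heuristic][problem].add(evaluator)
    return merged


merged = merge_findings(findings)
for heuristic, problems in merged.items():
    for problem, evaluators in problems.items():
        print(f"{heuristic}: {problem} (reported by {len(evaluators)})")
```

Problems reported by more than one evaluator are natural candidates to look at first in the next design iteration.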

More on heuristic evaluation from usability.gov - includes information from Nielsen & Gerhardt-Powals

http://www.usability.gov/methods/test_refine/heuristic.html

= Usability Testing - from http://www.usability.gov/methods/test_refine/learnusa/index.html =


==Introduction to Usability Testing==
Usability testing is a technique used to evaluate a product by testing it with representative users. In the test, these users will try to complete typical tasks while observers watch, listen and take notes. Your goal is to identify any usability problems, collect quantitative data on participants' performance (e.g., time on task, error rates), and determine participants' satisfaction with the product.

====**When to Test**====
You should test early and test often. Usability testing lets the design and development teams identify problems before they get coded (i.e., "set in concrete"). The earlier those problems are found and fixed, the less expensive the fixes are.

====**No Lab Needed**====
You DO NOT need a formal usability lab to do testing. You can do effective usability testing in any of these settings:
 * a fixed laboratory having two or three connected rooms outfitted with audio-visual equipment
 * a conference room, or the user's home or work space, with portable recording equipment
 * a conference room, or the user's home or work space, with no recording equipment, as long as someone is observing the user and taking notes
 * remotely, with the user in a different location

====**What You Learn**====
You will learn if participants are able to complete identified routine tasks successfully and how long it takes to do that. You will find out how satisfied participants are with your Web site. **Overall, you will identify changes required to improve user performance.** And you can match the performance to see if it meets your usability objectives.
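
To make "match the performance to see if it meets your usability objectives" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the session records are invented, and the two target values are placeholders we would have to set ourselves.

```python
from statistics import mean

# Hypothetical session records:
# (participant, completed_task, seconds_on_task, error_count)
sessions = [
    ("P1", True, 95, 1),
    ("P2", True, 120, 0),
    ("P3", False, 240, 4),
    ("P4", True, 80, 2),
]

# Hypothetical usability objectives (placeholders, not from the source).
TARGET_COMPLETION_RATE = 0.75
TARGET_MEAN_SECONDS = 150


def summarize(records):
    """Compute simple performance measures across all sessions."""
    completed = [r for r in records if r[1]]
    return {
        "completion_rate": len(completed) / len(records),
        "mean_seconds": mean(r[2] for r in records),
        "mean_errors": mean(r[3] for r in records),
    }


summary = summarize(sessions)
meets_objectives = (
    summary["completion_rate"] >= TARGET_COMPLETION_RATE
    and summary["mean_seconds"] <= TARGET_MEAN_SECONDS
)
print(summary, "meets objectives:", meets_objectives)
```

With only 3-5 users these numbers are rough indicators rather than statistics, so they are best read alongside the observation notes.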

==Four Things to Keep in Mind==
 * 1) **Testing the Site NOT the Users**: We try hard to ensure that participants do not think that we are testing them. We help them understand that they are helping us test the prototype or Web site.
 * 2) **Performance vs. Subjective Measures**: We measure both performance and subjective (preference) metrics. Performance measures include: success, time, errors, etc. Subjective measures include: user's self-reported satisfaction and comfort ratings. People's performance and preference do not always match. Often users will perform poorly but their subjective ratings are very high. Conversely, they may perform well but subjective ratings are very low.
 * 3) **Make Use of What You Learn: Usability testing is not just a milestone to be checked off on the project schedule. The team must consider the findings, set priorities, and change the prototype or site based on what happened in the usability test.**
 * 4) **Find the Best Solution: Most projects, including designing or revising Web sites, have to deal with constraints of time, budget, and resources. Balancing all those is one of the major challenges of most projects.**
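
Because performance and preference do not always match, it may help to flag the participants where the two disagree. The sketch below is one hedged way to do that; the results, the 1-5 satisfaction scale, and the `high`/`low` cut-offs are all assumptions made up for the example.

```python
# Hypothetical per-participant results: task success plus a
# self-reported satisfaction rating on an assumed 1-5 scale.
results = {
    "P1": {"success": True, "satisfaction": 5},
    "P2": {"success": False, "satisfaction": 5},
    "P3": {"success": True, "satisfaction": 2},
    "P4": {"success": False, "satisfaction": 1},
}


def mismatches(data, high=4, low=2):
    """Return participants whose performance and preference disagree.

    `high` and `low` are assumed cut-offs on the satisfaction scale.
    """
    flagged = {}
    for participant, r in data.items():
        if not r["success"] and r["satisfaction"] >= high:
            flagged[participant] = "performed poorly, rated high"
        elif r["success"] and r["satisfaction"] <= low:
            flagged[participant] = "performed well, rated low"
    return flagged


print(mismatches(results))
```

Flagged participants are worth a follow-up question in the open-ended questionnaire or interview, since the rating alone would hide what happened during the task.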

==Cost==
Cost depends on the size of the site, how much you need to test, how many different types of participants you anticipate having, and how formal you want the testing to be. Remember to budget for more than one usability test. Building usability into a Web site (or any product) is an iterative process. Consider these elements in budgeting for usability testing:

====**Time**====
You will need time to plan the usability test. It will take the usability specialist and the team time to get familiarized with the site and do dry runs with scenarios. Budget for the time it takes to test users and for analyzing the data, writing the report, and discussing the findings.

====**Recruiting Costs**====
Time of an in-house person or payment to a recruiting firm. Developing a user database, whether in-house or through a recruiting firm, makes recruiting less time consuming and cheaper. Also allow for the cost of paying or providing gifts for the participants.

====**Rental Costs**====
If you do not have equipment, you will have to budget for rental costs for the lab or other equipment.

Information on parallel design:

http://www.usability.gov/methods/design_site/parallel.html