Survey book of the month, October 2011
11/01/2011

Which is better: an open question or a closed one? Should you include a “don’t know” option in your closed questions? Is there a “right” order for asking questions?
If topics like these concern you, then you’ll want to read my choice for this month:
Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context by Howard Schuman and Stanley Presser (1981, reprinted in 1996)
A shortcut into the research literature
Although this book hasn’t been updated since its 1996 reprint, it continues to be much-cited. Why? Because the authors conducted a series of experiments on different ways of asking questions, and reported on all of them in this one convenient volume. They also reviewed a swathe of the relevant literature. So it’s a sort of shortcut into the research on question wording from the 1950s to the end of the 1970s, an era when much research was done that is still relevant today, but whose papers are often hard to get hold of.
In UX, we often suffer from reports of a single experiment, run in a limited context with a small, unrepresentative group of participants, that are then offered up as ‘fact’ as if they applied to everyone. If you, too, find that sort of over-large claim highly irritating, then you’ll enjoy reading this book. It’s full of examples where the authors tried to probe and replicate. Often they found that an early, compelling result didn’t replicate as they hoped – which sometimes makes their recommendations less than conclusive, but far more realistic.
One to borrow rather than buy?
I have to admit it’s not exactly a zippy read. If you’re a regular reader of the type of academic papers that quote a lot of ‘p’ values, then you’ll probably rattle along. But even so, you’ll need to exert some imagination. The examples are obviously all from an earlier era, and many of them explore political problems that are no longer part of our everyday concerns.
So I’m going to pull out some of the key findings for you here.
The order of the questions is important
The first topic they tackle in depth is question order. There are some famous experiments that manipulated question order, such as one using these two items:
- Do you think the United States should let Communist newspaper reporters from other countries come in here and send back to their papers the news as they see it? (“Communist” item)
- Do you think a Communist country like Russia should let American newspaper reporters come in and send back to America the news as they see it? (“American” item)
(I told you these examples are often from eras when concerns were different.) These questions were first used in an experiment in 1948, then replicated by the authors. In both the original and the replication, the Communist item got a much lower level of ‘yes’ answers when it was asked first than when it came after the American item.
This is a ‘question order effect’, also known as a ‘context effect’. Each question is affected by the context within which it is asked, and that context includes the previous question.
The problem with context effects is that although they undoubtedly exist, they are tricky and slippery. The authors ran various experiments to try to pin them down, but failed: they replicated some effects but not others; they found some effects that were larger than expected, and others that were smaller. They found no straightforward explanation for what might be going on.
As the authors put it in their summing up of the chapter:
“[Context effects] can be very large [and] are difficult to predict”.
The bottom line: question order is important. If you want to run the same survey again and plan to compare the results, make sure that you keep the question order the same each time.
Open questions elicit a wider range of answers, but are not as open as they seem
Closed questions are ones where the respondent has to pick from a range of specific answers, sometimes including ‘don’t know’ and ‘prefer not to answer’. Open questions have an open space for the answers and respondents can choose to provide as short or long an answer as they wish.
The chapter on open versus closed questions reports on experiments that compared the number and range of answers that each type of question can elicit. Broadly, an open question will collect a much wider selection of answers including some that you would never have guessed you’d get.
Unfortunately, open questions also pose problems for analysis, because you’ve got to read the answers and try to put them into categories yourself: and in doing that, there’s a risk of misinterpreting the respondent’s original intention.
But closed questions have their own problems, as I’m sure you’ll recognise if you’ve had the experience of trying to respond to a survey where the survey author continually forced you to choose from answers that don’t resemble the one you want to give.
Here’s how the authors sum up the issues:
“Inadvertent phrasing of the open question itself can constrain responses in unintended ways … we can see no way to discover subtle constraints of this kind except by including systematic open-closed comparisons when an investigator begins development of a new question on values and problems”
Their recommendation about how to get the balance of open and closed questions right? Iteration! In other words:
- Explore your questions in interviews with users,
- Test the questions,
- Check the results and make changes,
- Test again, and repeat until it all settles down.
And here is another point from the book that is well worth thinking about:
“since our results fail to provide strong support for the superiority of open questions, the implication may seem to be that after sufficient pilot work an investigator can rely exclusively on closed items. But we think that total elimination of open questions from survey research would be a serious mistake…. Open “why” questions can also be especially valuable as follow-ups to important closed questions, providing insight into why people answer the way they do… They are needed as well where rapidly shifting events can affect answers, or indeed over time to avoid missing new emergent categories. And of course in some situations the set of meaningful alternatives is too large or complex to present to respondents”.
If you offer the option of ‘don’t know’, some people will take it
The chapter on ‘The Assessment of No Opinion’ reports on experiments that compared a question without a ‘don’t know’ filter and the equivalent question with one.
For example here are the unfiltered and filtered ways of asking a question:
- “In general, do you think the courts in this area deal too harshly or not harshly enough with criminals?”
- “In general, do you think the courts in this area deal too harshly or not harshly enough with criminals, or don’t you have enough information about the courts to say?”
The unfiltered question got 6.8% ‘don’t know’ answers; the filtered version got 29.0% ‘not enough information to say’ answers.
So including a ‘don’t know’ filter is very likely to increase the proportion of respondents who answer ‘don’t know’. Why? Because a respondent might opt for ‘don’t know’ for all sorts of reasons, including:
- Have thought about it and have yet to make up my mind
- Haven’t thought about it
- Have an opinion but don’t want to reveal it to you
- Sort of remember having an opinion but can’t be bothered to recall it.
Does this matter? Schuman and Presser’s results show us that yes, it does matter.
If your respondents genuinely don’t have an answer, then forcing them to choose an answer
will produce unreliable results. But if they might really have an answer but don’t want to
make the effort of finding it, then offering them a ‘don’t know’ option will lead to under-reporting of the true answers.
The bottom line: If you have the resources to test your questionnaire thoroughly through all
its phases (preliminary investigative interviews, cognitive interviewing, usability testing,
and pilot testing) then you will almost certainly have accurate questions that your
respondents have answers for, and you won’t need a ‘don’t know’ option. Otherwise: keep it
in. A bit of accurate under-reporting is better than a pile of random unreliability.
Sometimes “no opinion” is a valid opinion… and sometimes it isn’t
Some years before this book was written, a famous series of experiments found that many Americans were entirely happy to volunteer an opinion on proposed legislation even though the legislation in question didn’t exist.
Schuman and Presser did not like the idea of tricking their respondents, and opted instead to ask about real bills that they anticipated few people would know much about, such as the Agricultural Trade Act of 1978 and the Monetary Control Bill of 1979.
As they put it:
“Respondents make an educated (though often wrong) guess as to what the obscure acts represent, then answer reasonably in their own terms about the constructed object”.
This issue has not gone away. In a 2011 survey in UK local government, one team found that their respondents were enthusiastic about ‘free schools’. Well, wouldn’t you be? But in fact the question was not about whether parents should pay for their children’s education. A ‘free school’ in this context refers to a particular new way of setting up a school, and concerns its governance, not its charges to parents.
Back to the book. After some considerable experimenting, the authors conclude that:
“a substantial minority of the public – in the neighborhood of 30% – will provide an opinion on a proposed law that they know nothing about *if* the question is asked without an explicit ‘don’t know’ option”.
The bottom line: You may be collecting opinions from informed people. Or you may not. Don’t base important decisions on data collected from people who didn’t know what you were talking about, but gave you an opinion anyway (maybe because you didn’t offer them a ‘don’t know’ option).
Balance your questions for more accuracy
Polling bias occurs when respondents are asked a question that implies a particular answer.
There was a wonderful exhibition of polling bias in the British TV show “Yes, Prime
Minister”. Sir Humphrey, the senior civil servant, demonstrates how you can get people to
agree or disagree with a policy – in this case, compulsory national service in the military
– by a careful selection of a series of questions. (This clip from King of the Paupers
includes the relevant section. Some people have had trouble viewing the clip, so there is a transcript at the end of this post).
Schuman and Presser’s next chapter, “Balance and imbalance in questions”, explores questions of the form: “Some people think A, others think B, what is your view?”, using examples of questions on gun control and abortion. The idea of presenting both arguments is to reduce the possibility of building bias into the question.
They found that if a question clearly implies the possibility of a negative, then adding an
opposing argument to make that explicit doesn’t make much difference.
The bottom line: if you are writing questions about attitudes, try writing each question in both directions, i.e. positive and negative. Think about whether you are pushing people in one direction or the other. Aim to be neutral.
The tendency to agree (“acquiescence bias”) is not as strong as sometimes claimed
Polling bias is an extreme form of another question-response problem, “acquiescence bias”: the tendency to agree. We saw it operating in Sir Humphrey’s humorous series of questions on the TV show, and it is one of the arguments for alternating the direction of statements (some worded positively, some negatively) when asking people about a series of aspects of something, e.g. in the System Usability Scale.
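To make this concrete, here’s a minimal sketch (mine, not from the book) of how SUS-style scoring handles those alternating statement directions in Python: the negatively-worded, even-numbered items are reverse-scored before anything is summed, so someone who simply agrees with every statement doesn’t automatically get a glowing score.

    # Minimal sketch (not from Schuman and Presser): scoring a System Usability
    # Scale questionnaire, whose ten statements alternate between positive and
    # negative wording partly to counter acquiescence bias.
    def sus_score(responses):
        # responses: ten answers, each 1 (strongly disagree) to 5 (strongly agree)
        if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
            raise ValueError("SUS expects ten answers on a 1-5 scale")
        total = 0
        for i, r in enumerate(responses, start=1):
            if i % 2 == 1:        # odd items are positively worded
                total += r - 1
            else:                 # even items are negatively worded: reverse-score
                total += 5 - r
        return total * 2.5        # rescale the 0-40 total to 0-100

    # A respondent who acquiesces and answers "agree" (4) to everything
    # ends up with a middling 50, not a rave review:
    print(sus_score([4] * 10))    # 50.0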
They call this chapter “The Acquiescence Quagmire” because, despite lots of literature on the topic going back to Likert himself, they found that the effects of acquiescence bias are much less clear-cut than they expected.
For example, they mention a study by Lenski and Leggett from 1960, which looked at these two questions:
- It is hardly fair to bring children into the world, the way things look for the future.
- Children born today have a wonderful future to look forward to.
Although the original study claimed that contradictory answers on these two questions were evidence of acquiescence bias, Schuman and Presser point out that it is quite possible to disagree with both statements. They experimented with the question
“Which in your opinion is more to blame for crime and lawlessness in this country: individuals or social conditions?”
and found that what is happening is not at all obvious.
The wording of the question is really crucial, and there are other complicating factors such as the level of education of respondents and, for some types of question in face-to-face interviews, the race of the respondent compared to the race of the interviewer.
The bottom line: if you need to explore levels of agreement with opposite opinions, then the biggest mistake you can make is to assume that your two opposite opinions are actually the full set.
Do your respondents care as much as you do, or much more?
The authors open their chapter “Passionate Attitudes: Intensity, Centrality, and Committed Action” with a quote from “A Preface to Democratic Theory” by R. A. Dahl (1956):
“What if the minority prefers its alternative much more passionately than the majority prefers a contrary alternative?”
This is an issue I’ve run into a few times myself, for example some years ago when I was working on a survey of user experience professionals on behalf of the Usability Professionals’ Association (UPA). Some members wanted UPA to introduce a usability certification, and we did indeed find that a majority of our respondents were in favour – but there was an important minority that was deeply against.
Schuman and Presser offer these three definitions to help us think through the issues:
- Intensity is the subjective strength of the attitude
- Centrality is the subjective importance of the issue to the respondent
- Committed action is doing something about it, e.g. writing to your senator.
Their examples include investigation of attitudes in the US towards gun control. They found that people who were against gun control were good at ‘committed action’, so had a greater impact even though there were fewer of them.
The precise questions that Schuman and Presser were investigating were big national
political matters, and hardly the stuff of our everyday practical concerns in user
experience.
The underlying issues, however, are very much part of what we have to grapple with. Remember Google Buzz? Most users were happy with it; a very vocal minority was enraged by its privacy policies. Their “committed action”, the intensity of their attitudes, and the centrality of the issue for them, combined to undermine the credibility of the product; Google announced that they were closing it down in October 2011.
Despite that sad story, there is often a big gap between what people say their attitude is and how much they’re prepared to act on it. As Schuman and Presser point out:
“people find it easier to say that they feel extremely strong about an issue than that they
would regard it as one of the most important issues on which to decide their vote”
Attitudes can be crystallised or wobbly
An attitude is “crystallised” if it exists before you ask about it, and it is stable. Asking
the same question another time will get the same answer. Schuman and Presser don’t have a particular term for the opposite of crystallised, so let’s say the opposite is ‘wobbly’.
Wrongly, we tend to treat all attitudes as crystallised. Schuman and Presser found that people are quite good at saying how strongly they feel about a topic. If people don’t care much, their attitude is much more likely to be wobbly.
One aspect they investigated: whether people with more education were more likely to have crystallised attitudes on issues of national political policy. The 1960s idea was that if you had a college education, you ought to be firmer in your views. Schuman and Presser found that a higher level of education had an effect on some items but not on others.
Thirty years later, and writing from a British perspective, I find this focus on education quite surprising: I wouldn’t assume that longer exposure to education necessarily makes people have clearer political opinions.
“Forbid” is not the same as “not allow”
A: “Do you think the United States should forbid public speeches against democracy?”
B: “Do you think the United States should allow public speeches against democracy?”
Elmo Roper tested these two questions in an A/B test in 1940, and found that 54% of respondents who got statement A (‘forbid’) agreed with it, but only 25% of respondents who got statement B (‘allow’) agreed with it. Turning that around, 75% said the US should “not allow” public speeches against democracy – far more than the proportion who would “forbid” them.
Schuman and Presser replicated the experiment in 1974, and twice more in 1976. They got the same effect (“forbid” is not the same as “not allow”), although by then public opinion had changed – nearly 80% of their respondents were against “forbid”, whereas about 55% were in favour of “allow”.
They got similar effects for some other topics, for example whether to forbid, or to not allow, cigarette advertising on TV. But not always. A question about “abortion” tested against one asking about “ending a pregnancy” did not produce the same effect: it seemed that those two items were seen as exactly equivalent by respondents.
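If you run this sort of split-ballot wording experiment on your own questions, you will want to know whether a gap like 54% versus 25% could plausibly be sampling noise. Here’s a minimal sketch of a two-proportion z-test in Python; the counts and sample sizes are invented purely for illustration and are not Roper’s or Schuman and Presser’s figures.

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_z(successes_a, n_a, successes_b, n_b):
        # Two-sided z-test for the difference between two independent proportions.
        p_a, p_b = successes_a / n_a, successes_b / n_b
        pooled = (successes_a + successes_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return z, p_value

    # Hypothetical split ballot: 500 people saw each version; 270 agreed with
    # "forbid" (54%) and 125 agreed with "allow" (25%).
    z, p = two_proportion_z(270, 500, 125, 500)
    print(f"z = {z:.2f}, p = {p:.2g}")  # a gap this large is far beyond chance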
1976 is a long time ago: is this lack of equivalence still a problem? Answer: yes, probably.
For example, in 2000 Bregje Holleman published “The Forbid/Allow Asymmetry: On the
Cognitive Mechanisms Underlying Wording Effects in Surveys”, reporting on another extensive series of experiments on the same problem. Like Schuman and Presser, she found that “forbid” is not the same as “not allow” – mostly. But sometimes it is. So the effect persists.
“Forbid/allow” is a tenacious topic and once it has gripped you it seems hard to let it go.
I learned about Bregje Holleman’s book from a review written by Howard Schuman – the co-author of the book I’ve been talking about here. Then it came up again for me at the European Survey Research Association conference in Lausanne in 2011, where Naomi Kamoen described one of her experiments on a similar set of questions: “How easy is a text that is not difficult? Comparing answers to positive, negative, and bipolar questions” – and her supervisor is the same Bregje Holleman who wrote the book published in 2000.
The bottom line: It all comes down to the specific wording of the actual question. For example, “satisfied” is not the same as “not dissatisfied” and definitely not the same as
“delighted”.
And finally: Context effects are a serious hazard
Schuman and Presser wrap up their book with a chapter where they reflect on their findings, and the experience of running so many survey experiments. They mostly conclude that replicating results is harder than it looks – a useful point to remember when reading research papers in general, particularly if the results seem counter-intuitive.
They also muse on the overall challenge of ‘context effects’, another way of saying that the way respondents answer questions is strongly affected by the way you ask the questions, and by the way the questions are ordered. For example, they say:
“General summary type questions are especially susceptible to context effects and should probably be avoided if the needed information can be built up from more specific questions”
Key points to take away
Here are four key things I learned from this book that you may also find helpful:
- Start with open questions and test a lot
- If you want to collect informed opinion, offer a ‘don’t know’ option
- When collecting attitudes towards statements, try to use balanced questions
- Ask for strength of opinion as well as direction of opinion
———————————————————————-
Transcript of “Yes Prime Minister” where Sir Humphrey demonstrates acquiescence bias.
Sir Humphrey: “You know what happens: nice young lady comes up to you. Obviously you want to create a good impression, you don’t want to look a fool, do you? So she starts asking you some questions: Mr. Woolley, are you worried about the number of young people without jobs?”
Bernard Woolley: “Yes”
Sir Humphrey: “Are you worried about the rise in crime among teenagers?”
Bernard Woolley: “Yes”
Sir Humphrey: “Do you think there is a lack of discipline in our Comprehensive schools?”
Bernard Woolley: “Yes”
Sir Humphrey: “Do you think young people welcome some authority and leadership in their lives?”
Bernard Woolley: “Yes”
Sir Humphrey: “Do you think they respond to a challenge?”
Bernard Woolley: “Yes”
Sir Humphrey: “Would you be in favour of reintroducing National Service?”
Bernard Woolley: “Oh…well, I suppose I might be.”
Sir Humphrey: “Yes or no?”
Bernard Woolley: “Yes”
Sir Humphrey: “Of course you would, Bernard. After all, you told me you can’t say no to that. So they don’t mention the first five questions and they publish the last one.”
Bernard Woolley: “Is that really what they do?”
Sir Humphrey: “Well, not the reputable ones, no, but there aren’t many of those. So alternatively the young lady can get the opposite result.”
Bernard Woolley: “How?”
Sir Humphrey: “Mr. Woolley, are you worried about the danger of war?”
Bernard Woolley: “Yes”
Sir Humphrey: “Are you worried about the growth of armaments?”
Bernard Woolley: “Yes”
Sir Humphrey: “Do you think there is a danger in giving young people guns and teaching them how to kill?”
Bernard Woolley: “Yes”
Sir Humphrey: “Do you think it is wrong to force people to take up arms against their will?”
Bernard Woolley: “Yes”
Sir Humphrey: “Would you oppose the reintroduction of National Service?”
Bernard Woolley: “Yes”
Sir Humphrey: “There you are, you see, Bernard. The perfect balanced sample.”