Sample Chapter: Conversations with Things

This is a sample chapter from Diana Deibel & Rebecca Evanhoe’s book Conversations with Things. 2021, Rosenfeld Media.

Chapter 1: Why Conversation Design?

    Rebecca: Hey, folks. Does anyone out there know how to make voice experiences more accessible?
    Diana: Yes, I heard a great talk about that a month ago. This is something I’m trying to learn more about, too. I’d be happy to share my notes—want me to send them to you?
    Rebecca: I’d love to see your notes! Thanks for being so helpful.

This conversation, held over a community Slack channel in 2018, is the origin of our friendship and this book. At the time, we were both about six years into careers in the tech industry. We finally felt like we knew what we were doing—enough to start noticing where the technology was short-changing people, and enough to start getting opinionated about the conversational interfaces we worked on.

Terms Defined: Conversational Interfaces

When you talk to technology and it answers back—whether it’s speaking or typing—that’s a conversational interface. A voice assistant like Siri? Conversational interface. A fridge that says you’re low on milk when you ask? That’s one, too. An interactive virtual reality game where you can talk with characters to advance the game? Yep. It’s a broad term that encompasses nonhuman things that do their best to listen, speak, and chat similarly to the way humans do.

Since we met, conversational technology has only become more ubiquitous: chatbots waving hi from the corners of websites and apps, smart speakers hanging out on countertops, people walking around talking to their watches and glasses. But we still see that the industry’s approach tends to be technology-centered rather than human-centered, which is strange, considering conversational interfaces are supposed to be modeled after human communication. To us, this disconnect is a huge reason why these technologies aren’t living up to the hype. But we remain fascinated with this work, and optimistic about its potential. Because you’ve picked up this book, it seems like you’re interested in conversational interfaces, too. Keep reading, and you’ll learn everything you need to get started, including how to be a critical, ethical, inclusive thinker.

Let’s begin with a look at what makes conversational interfaces unusual—remarkable, even. First of all, conversational interfaces include lots of kinds of technology; there’s a ton of variety. (Figure 1.1 gives a snapshot of devices that fall under this umbrella.)

Figure 1.1
Pick a conversation partner.

Conversational interactions are different from, say, typing a question into Google; a search engine uses words, too, but it’s not a conversational exchange. Here are some hallmarks of true conversational interfaces:

  • Language (words) is the primary input and output.
  • The nature of the interaction is a back-and-forth conversation.
  • The person’s input is similar to how they’d say it to another person.
  • The system’s output is meant to mimic natural language—to answer on human terms.

Conversational interfaces are powerful because people are language super-users. People learn language intuitively and use it all day, every day, in both speaking and reading. That’s why these interfaces can be so effective: when people experience their technology talking to them, they click right into this easy mode. It’s a deeply innate way to navigate interactions.

Coming to Terms

Conversation design is interdisciplinary, so its practitioners use a lot of jargon coming from different fields, and this jargon isn’t standardized. We’re word nerds, so for this book, we scrutinized what terms people used and where those terms came from. This book uses the term conversational interface broadly, to refer to talking technology, including both spoken and typed interactions. For aural interactions, we use these terms:

  • Voice user interface, or VUI (pronounced voo-ey, rhymes with chewy): A general category of interactions that use out-loud speech as an input or output.
  • Voice assistant: A VUI system that’s meant to help you with daily life at home, at work, in the car, or everywhere. (These are your Alexas, your Siris, or your Googles.)
  • Interactive voice response, or IVR: Older, computer-automated phone systems where users listen to pre-recorded or synthetic voices, and respond by pressing numbers or saying simple voice commands.
  • Text to speech (TTS): Technology that takes text (letters, words, numbers, sentences) and speaks it aloud in a synthetic voice.

For text-based interactions (which necessarily involve a screen), we use these terms:

  • Chatbot: An interactive system where the conversation is typed (instead of spoken). Some chatbots use clickable buttons, too.
  • Multimodal: Systems that use more than one sensory input or output. (For example, a combination of voice and visuals.)

Conversational interfaces have other key uses, too:

  • Convenient multitasking: Walking through your front door with two bags of groceries, you say “Alexa, I’m home,” and voilà, your lights turn on and your playlist starts. In the kitchen, hands covered in cookie dough, you can holler “Hey Google, set a timer for twelve minutes.” You’re driving, and you say “Hey Siri, text Chloé that I’m five minutes late” without taking your eyes off the road or your hands off the wheel.
  • Information “spearfishing”: Navigating apps and websites can involve searching, scanning, clicking, and scrolling. A well-designed bot can cut through the muck and deliver concise bits of information: “What’s my credit card balance?” “When was Bea Arthur born?” When a user nabs the info they want in one quick jab, it’s more frictionless than any web search.
  • Hands-free public spaces: When the 2020 pandemic started, people’s aversion to touching an ATM or vending machine skyrocketed. Voice interactions can create a less germy future where people can speak with interfaces, rather than tapping sticky screens or pushing grubby crosswalk buttons.
  • Judgment-free help: Research shows that in some situations, people feel more comfortable spilling the beans when they know they’re talking to a “fake” person—a virtual therapist, for example. Shame can be a powerful silencer. During conversations where people often feel judged, with topics on drug use or talking about debt, a neutral speaking partner can ease the stress.
  • Accessibility: For people in the Deaf community, or hard-of-hearing folks, a chatbot can be a much smoother way to get customer support, for example. And using voice makes so many things easier (ordering take-out, calling friends, checking the news) for people who are blind, sight-impaired, or have limited mobility for any reason.
  • Infinite patience: Voice assistants don’t mind being woken up at 3 a.m. Chatbots don’t mind if you wait twenty minutes before responding to them. You can ask a bot the same question over and over—it won’t mind.

Conversational interfaces can accomplish things that screens alone can’t. When they’re designed well, they tap into human instincts and emotions, and they feel personal and familiar like no other form of technology. And building a conversational product is a hard, interdisciplinary puzzle—who wouldn’t want to solve a puzzle like that?

Coming to Terms

Speaking of bots in general, that little syllable has been used since the 1960s to denote “a type of robot or automated device,” according to the Oxford English Dictionary, thank you very much. We use the shorthand “bot,” or even “thing,” to refer to conversational interfaces or devices.

What about this sloppy meatball: artificial intelligence? We blame the media, corporate talking heads, and the public imagination for this one devolving into near meaninglessness. We’ll take a stab at a definition that works for this book:

  • Artificial intelligence: Algorithmic systems that try to “think,” speak, or behave like people can.

Sometimes this book uses conversational AIs to refer to more advanced systems that get closer to mimicking human intelligence.

Finally, the most important word we need to address: user.

If you’re in design, you’re probably acclimated to an odd convention: refer to the people who are interacting with the technology (the app, the website, the printer, the smart fridge) as users. It’s right there in the name: it’s the U in UX. There’s been well-founded pushback on the term in recent years. Criticism coming from grassroots UXers, as well as tech bigheads like Jack Dorsey, calls it out as dehumanizing, creating abstraction instead of highlighting the humanity of the people the field is trying to center.

These are valid criticisms. This book employs user because in certain places, the term people felt too general, and we wanted to specifically connote someone using the technology being discussed. When the industry clicks on a better term, we’ll be all in.

Conversation Designers to the Rescue

Conversation design falls under the umbrella of user experience (UX) design, so it’s both human-centered and data-driven—just with a tight focus on talking. Conversation designers are the practitioners of this craft, and they aim to help people and bots have good conversations, starting with what people need and how they use language to express those needs. They think in terms of scripts and flows and user journeys. (Figure 1.2 shows a literal sketch of a conversation design brainstorm. Beware!)

In simple terms, conversation designers usually do these things:

  • Research to understand how people talk and what their needs are.
  • Design personalities for bots.
  • Write responses that the bot will say.
  • Study different ways that users ask for things or express the same idea.
  • Craft diagrams, charts, or sketches of how conversations flow.
  • Create prototypes to test how people react to different personalities, voices, and scenarios.
  • Advocate for accessibility and inclusive design.
  • Collaborate with the people around them.

Figure 1.2
A page from a conversation designer’s notebook.

Conversation design has interdisciplinary roots. Its techniques stem from research on how people ingest, comprehend, and produce language—which means conversation designers often come from diverse backgrounds like linguistics, sociology, psychology, neurology, and more. (And yes, it can take inspiration from the arts, like screenwriting, acting, poetry, and improvisation.)

If you’re trying to find a conversation designer for your team, or wondering how you fit into the conversation design landscape, know that people with a wide and diverse set of backgrounds have this job. Greg Bennett, linguist and conversation designer, says that including these diverse perspectives is a strength, especially for language-driven products, “Because your lens on the world is going to be slightly different than mine, and your lens on how to use language will be slightly different, which reveals something that I can’t see. That’s the best part about it.”

No matter where conversation designers come from, it’s a crucial role, because conversational interfaces are a strange, ever-surprising form of technology. To get them right requires expertise, and without it, a lot of voice and chat interactions end up pretty unhelpful and frustrating. See Figure 1.3 for a sampling of tweets explaining what can happen when conversation design is left out.

Figure 1.3
When Rebecca tweeted “What can go wrong when voice or chat projects don’t have a dedicated conversation designer?” these three folks nailed it: Brooke Hawkins, conversation designer; Lauren Golembiewski, CEO and co-founder of Voxable; and Roger Kibbe, voice and conversational AI technologist.

Conversation design isn’t easy. First, its users are still learning to trust conversational tech. They worry that voice assistants are “always listening.” They’ve been burned before by an obtuse chatbot. They’re traumatized from years of bad computerized phone systems. So designers face an uphill battle trying to build user trust.

Combine that with the fact that people really, really notice when bots can’t hold up their end of the conversation. Most people are such natural language machines that any anomalies are obvious and jarring: that’s why a crappy, stilted conversation feels so wrong. Everyone is a harsh critic, with the highest of expectations for the interface.

Content Warning

Throughout this book, you’ll encounter an unfortunate truth: Because these technologies imitate people (and are created by people), they can be biased and harmful just as people can.

A common theme in technological bias is racism. Ruha Benjamin, author of Race After Technology, sums up this potential for any technology: “So can robots—and by extension, other technologies—be racist? Of course they can. Robots, designed in a world drenched in racism, will find it nearly impossible to stay dry.”1

Conversational AIs have a complicated relationship with femininity, too. They are often criticized for “sexist overtones, diminishing of women in traditional feminized roles, and inability to assertively rebuke sexual advances,” as authors Yolande Strengers and Jenny Kennedy wrote in their book The Smart Wife.2 This book gives several examples of where racial and gender bias rear their heads.

But these aren’t the only forms of oppression a bot can put out there: they are just the ones with the most research thus far. Conversation designers need to understand intersectionality: “the complex, cumulative way in which the effects of multiple forms of discrimination (such as racism, sexism, and classism) combine, overlap, or intersect especially in the experiences of marginalized individuals or groups,” according to Merriam-Webster. Lots of factors impact how people experience oppression and privilege, like sexual orientation and identity, disability, age, body size, and more.

This book calls attention to bias throughout. It’s a complicated topic, but understanding where it surfaces and how it impacts people is integral to human-centered design.

1 Ruha Benjamin, Race After Technology (Cambridge: Polity, 2019), 62.
2 Yolande Strengers and Jenny Kennedy, The Smart Wife (Cambridge: The MIT Press, 2020), 11.

From a business perspective, companies often misunderstand, underestimate, or simply ignore the need for conversation design.

These are commonly held viewpoints that may lead to trouble:

  • Underestimating the complexity and role of language: “The bot’s essentially an FAQ page.”
  • Treating the project as a purely technological endeavor: “All we need are developers.”
  • Approaching production as if the project were screen-based: “We’ve got a UX team already.”
  • Miscalculating the benchmark for MVP (minimum viable product): “Let’s get something out fast to test and learn.”

These viewpoints have repercussions. Rebecca did her fair share of “chatbot doctoring”—being brought in as a consultant to save an ailing bot. More often than not, when she took a look under the hood, the whole bot had to be discarded, from soup to nuts, because of those assumptions.

That said, it’s totally normal that users and businesses are still getting their sea legs with conversational interfaces; the technology is still hitting its stride. And, by their very nature, conversations are hard to design because language is complex. How to design them well is exactly what this book will teach you, starting with the differences between human and mechanical conversations in the next chapter.

The Last Word

Of course, conversation design is unique. Think about it: You’re creating a product that’s modeled after the human mind and its ability to interpret and respond to language. That’s a daunting task.

With good design and process, amazing conversational experiences are possible. Your chatbot or voice experience can be great right out of the gate. You could launch the world’s most elegant talking dishwasher, or make a virtual debate coach. Your talking car could teach a million teens to drive! Your mental health bot could improve lives.

This is why being a conversation designer is fascinating: you get to think big about the complexity of language, the wildness of human behavior, and the inner workings of technology. It’s weird and it’s fun and it’s hard. Never forget, though, that the ultimate goal of a conversational interface is for it to be good—that is to say, easy to talk to, on human terms.
