Audio Accessibility with Svetlana Kouznetsova


  Audio accessibility is concerned with making information provided audibly available to people who are deaf and hard of hearing. We see examples of audio accessibility in captions and live captioning. As with all forms of accessibility, there is a spectrum defined by features that influence the quality of the experience. At one end of the spectrum, a text version of the spoken content is provided and is somewhat accurate. At the other end, the text closely matches the audio, with accuracy, sound description, and punctuation helping to provide an equivalent experience. To provide a quality user experience, we must make use of all the features that go into an accessible and enjoyable experience of audio content for people who are deaf or hard of hearing.

    In this podcast we hear from Svetlana Kouznetsova. Sveta is a user experience designer and appreciates the value of providing a good experience. She brings this perspective to her work as an audio accessibility consultant. She joins Sarah Horton for this episode of A Podcast for Everyone to answer these questions:

    • What is the current state of audio accessibility?
    • What are different features that influence user experience with regard to audio accessibility?
    • Does speech-to-text technology help in creating accessible audio experiences?
    • What should we be thinking about with speech-based interfaces?
    • How can we better promote audio accessibility?

    Sveta is deaf, so we did the interview on Skype using video and chat. That way we could see each other’s expressions and reactions to the text-based conversation. It was Sveta’s idea to conduct the interview in this format and it worked very well. Also, at the end of the interview we had the transcript, included below (note the transcript is a little different since it came from a text conversation—for example, it has smileys). To make the interview accessible to podcast listeners, Elaine Matthias from Rosenfeld Media and Sarah connected on Skype to read and record the interview. It was a great process, and a wonderful way to share Sveta’s insights with both readers and listeners.

    Transcript available · Download file (mp3, duration: 17:31, 10.8MB)

    Svetlana Kouznetsova is a user experience designer, accessibility specialist, and captioning consultant at SVKNYC Web Consulting Services. She also provides audio accessibility services and resources through the Audio Accessibility website, including information about deafness and hearing loss and best practices for accessible media.

    A Podcast for Everyone is brought to you by UIE, Rosenfeld Media, The Paciello Group, and O’Reilly.

    Subscribe on iTunes · Follow @awebforeveryone · Podcast RSS feed (xml)


    Sarah Horton: Hi, I’m Sarah Horton, and I’m co-author with Whitney Quesenbery of A Web for Everyone from Rosenfeld Media.

    I’m here today with Svetlana Kouznetsova. Sveta is a user experience designer who knows the value of successful and enjoyable user experiences and focuses, above all else, on usable and accessible design. She is also an audio accessibility consultant, helping companies produce accessible video and audio experiences. Sveta is very active in the user experience, design, and accessibility communities, advocating for audio accessibility and advancing access to media for people who are deaf or hard of hearing.

    As an aside, Sveta is deaf, so we did the interview as a text conversation using Skype. We both prefer talking face-to-face, so we turned on video to see each other’s expressions, smiles, and laughter without relying on emoticons. After the interview, I read my part of the transcript and Elaine Matthias from Rosenfeld Media read Sveta’s part, and we recorded the reading to use as the audio podcast. It’s a kind of reverse process: typically we record audio and transcribe it to text for accessibility. It was Sveta’s idea to text and then voice the interview, and it worked really well. And it’s wonderful to be able to share her insights with listeners and readers alike.

    Sveta, many thanks for joining us.

    Svetlana Kouznetsova: Nice “meeting” you.

    Sarah: First of all, let me say how happy I am that we figured out how to make this work. I really wasn’t sure how a podcast, which is primarily an audio experience, would work, and it was great thinking it through with you, exploring other options, and learning from you about other ways to communicate. I’m grateful for those insights. Would you share with our listeners some of the ways you use technology to communicate?

    Sveta: Okay. Email is my primary way to communicate if people want to reach me. I also use texting when needed—mostly to send brief messages.

    When communicating with people online in real time, I usually use instant messaging features like Skype, Gtalk, AIM, etc.

    I also use video when Skyping with people so that we can see each other’s facial expressions and body language. It’s similar to how people on the phone listen to each other’s voices and hear their intonation.

    Sarah: Nice, makes sense, and works well! It’s great to see you and talk to you.

    Sveta: Likewise. 🙂

    Sarah: So, how did you get started working in user experience?

    Sveta: I was originally trained as a graphic designer, but I also liked doing websites, and I did a lot of web design work at my first job. I got interested in coding and got a degree in Internet Technology. While in graduate school, I took some business classes—marketing and management.

    I liked marketing classes a lot and had fun doing customer research, but I did not like the idea of asking customers to buy things; I wanted products to be more usable for them. Later I found out that this is part of user experience, which is similar to marketing, but the difference is that it focuses on improving the experience for users.

    Also, when working on websites, I did sketches and wireframes and loved it. I had no idea that this was also part of user experience and information architecture.

    At another job I was collaborating with a developer who encouraged me to make coding cleaner, and from there I somehow found out about web accessibility and user experience.

    I learned more about it from reading online information and attending events and conferences.

    Sarah: When you talk about user experience and marketing, is that like “experience marketing”—where in providing an excellent and enjoyable experience you end up getting people to buy things?

    Sveta: Yes, I believe that user experience and marketing can go hand in hand and are interdependent.

    I think that marketing is about attracting customers and user experience is about keeping them.

    Sarah: Yes, I like that part, too. It’s not always easy to convince companies that UX will help with the bottom line, though. Have you found that to be the case?

    Sveta: It’s hard. And many businesses think that UX = visual design and coding.

    Sarah: It’s a tough nut to crack, for sure. Let’s talk about audio accessibility; what’s your sense of where we are? Are people thinking about this? Are you finding that companies are receptive to working toward audio accessibility?

    Sveta: It’s something I still need to educate more people about. Even when talking about accessibility in general, many think it’s about coding and writing alternative descriptions for images, and they focus more on people with visual and mobility difficulties and less often on people with hearing and cognitive difficulties.

    Many think that it’s enough to just wear hearing aids or turn up the volume to listen to audio, which is not necessarily true. Sadly, hearing loss is very stigmatized by society, so it’s not discussed much. For example, plenty of people take eye exams, but how many would take a hearing test?

    More people are willing to wear eyeglasses, but many would try to conceal their hearing aids or not wear them at all. Most people who are deaf and hard of hearing would not ask for access; only those involved in advocacy work do. So people who are not familiar with deafness get the impression that if few or no people ask for access to aural information, there’s no demand for it.

    When I ask people to caption video, provide transcripts for audio, or provide real-time captioning at events, I’m often told that I’m the only person asking for it. They do not realize that I speak for about 50 million deaf and hard of hearing people in the USA. Also, captioning benefits many more people than those who are deaf: for example, foreign language learners, remedial readers, people who have a hard time understanding foreign accents, and people who happen to be in noisy or quiet situations.

    Sarah: It’s true that discussions about accessibility often come down to numbers: how many people will really benefit from this? That’s unfortunate since, as you point out, so many people benefit. With captions, the benefits seem more widely understood and felt than with some other accessibility features, like alt text.

    What I encounter a lot is a resource argument against captions, because it’s something extra.

    Some aspects of accessibility require people, such as designers and developers, to do things differently: using a different design or interaction pattern, like a disclosure widget instead of a tooltip to display supplementary information, or adding attributes to code so information is available to assistive technology. But captions are different. They require people to do something more, and usually they cost money. In my experience it can be a hard sell, making a case for spending additional time and resources on captions. How do you approach that challenge in your work? What arguments can we use to make a compelling case?

    Sveta: It would be no different from spending money on making buildings accessible for wheelchair users, for example. You need to add ramps and elevators. They are as universal as captioning in the sense that they benefit more people than those who are in wheelchairs—like parents with baby strollers or workers pushing carts.

    It is also no different from spending on other things like editing audio and video—it also costs extra money.

    It is also a better investment than spending more money on lawsuits.

    Lawsuits would not only cause businesses to lose money, but also give them a bad reputation. Providing accessibility is the right thing to do. Like ramps, captions benefit more people than just those with disabilities.

    Many businesses do not realize that people with disabilities make up the largest minority, with significant spending power. They represent a $1 trillion market in the USA and a $4 trillion market worldwide; the latter is about the same size as China.

    Businesses would also gain more customers among the families, friends, and coworkers of people with disabilities: an additional 2 billion people worldwide with a disposable income of $8 trillion.

    So it’s a pretty significant number of potential customers that many businesses ignore.

    Sarah: Yes, there is a strong business case. Also, when it comes to getting videos transcribed, I’ve found the cost is not that significant. I think we all wring our hands about how expensive it is, but when it comes right down to it, it’s pretty small compared with other costs you mention, especially shooting and editing.

    Part of the difficulty is embedding captioning smoothly into the overall production process. Some people hope technology is the solution.

    I remember getting very excited back in 2006 reading an article about IBM’s “superhuman speech recognition,” which set a goal and timeline for recognizing speech as well as humans do. At the time I was working on a lecture capture project where we were trying to automate transcription of recorded lectures. Since then we’ve gotten Siri, which works pretty well, and YouTube auto captions, which don’t work so well. How realistic is it to look to speech-to-text technology for help with creating accessible audio-based experiences?

    Sveta: That’s the issue. It’s important not just to provide a speech-to-text translation, but also to make it good quality. No matter how much speech recognition has advanced lately, machines are still not as good as humans. Even people who use speech recognition to provide real-time captioning, called voice writers or re-speakers, are not as good as steno captioners. For example, the BBC uses voice writers for real-time captioning, and that frustrates many deaf Brits because the captions have so many errors.

    And a couple of weeks ago, an event organizer who went against my advice to hire steno captioners provided me with voice writers. Those voice writers had years of experience, and yet they made more errors than skilled steno captioners do.

    If human voice writers using speech recognition cannot provide smooth real-time captioning, machines on their own certainly cannot do better.

    Sarah: Can you explain what a voice writer is?

    Sveta: There are two types of real-time captioners. One is a steno captioner, who uses a steno machine as a court reporter does. The other is a captioner who uses speech recognition software, such as Dragon, to voice the words instead of typing them.

    I’m not a captioner, so I cannot explain it in detail. From what I have seen, they speak into a microphone and make words appear on screen. To reduce mistakes, they need to practice a lot and add words to the software’s vocabulary. Steno captioners also practice, adding specific strokes to their vocabulary.

    Sarah: Ah, I think I get it.

    Sveta: Another thing about automatic speech recognition is that machines are not able to add proper punctuation, speaker identification, and sound description; only humans can do that. Proper punctuation is as important in transcription as voice intonation is in human speech.

    Also, speech recognition does not handle foreign accents and background noise well.

    Research shows that an error rate of more than 3% makes printed material harder to read and understand. That’s why real-time captioners are expected to type at least 220 words per minute with at least 98 to 99% accuracy.

    For these reasons, you need to train a machine to recognize your voice. You cannot just turn on speech recognition and start captioning. From what I understand, a voice writer needs as much practice as a steno captioner to provide smooth real-time captioning with few errors.

    Voice writing may be better for transcribing recorded audio and video, because there is time to clean up the text afterward. For real-time captioning, however, I would recommend steno captioners.

    Sarah: Speaking of voice and dictation, what about speech-based interfaces, like Siri? What should we be thinking about to make sure those tools are accessible?

    Sveta: I think that Siri may be good for voice commands or functions other than real-time captioning. I did try it myself; sometimes it transcribes speech well, sometimes not. The main issue is the time lag before speech is transcribed.

    Siri may be fun for informal conversation—some people tried to use that with me. However, it’s not good for real time captioning.

    Sarah: Thinking about using speech to interact with features, do we need to be sure to have alternative interaction features, like with Siri you can speak a search term or enter it as text. And as long as the text option is available there won’t be barriers for people who can’t speak?

    Sveta: Yes.

    Sarah: Great. I see more interfaces, particularly apps, that use speech, and I’m not sure we are all thinking about having that redundancy.

    Sveta: Even though I can speak, I have a Russian accent and a deaf voice, so speech recognition would not understand me. It’s hard enough for some humans to understand my speech, let alone machines.

    Sarah: 🙂

    So the last thing is, if you have one bit of advice to offer someone on a product team who needs to advocate for audio accessibility, what would it be? What one argument or rationale could we use to persuade people to commit across the board to, for example, CART (Communication Access Realtime Translation) for events or transcribing podcasts?

    One note on that: when Whitney and I started doing podcasts with UIE, we didn’t need to convince anyone. They were already transcribing all their media, which was pretty awesome!

    So how do we get others to do that?

    Sveta: I really appreciate that you and Whitney try to make sure that aural information is accessible via quality transcription.

    It may be surprising, but even some people who advocate for accessibility do not practice what they preach, because they do not think about making their audio accessible.

    And when posting podcasts or videos, they say, “Transcript and captions to come soon.” This is not good practice. Those of us who are deaf and hard of hearing are often told, “I’ll tell you later,” when we ask people to repeat what they said or what was discussed in group conversations, and “captions to come soon” sends the same message. So it is best to post audio and video online only after they have been made accessible.

    One example is the recent CSUN conference on accessibility: they posted videos online without captions! They told us to be patient and wait. Why should deaf people be patient and wait? Why the rush for hearing people to listen to audio and video? If we are told to wait, hearing people can wait, too.

    To answer the last question, I would say that captioning and transcription are universal access, not something that needs to be requested in advance.

    Hearing loss is very stigmatized and many deaf and hard of hearing people would not ask for it. There are also many people who are late deafened and trying to cope with their hearing loss. So captioning would benefit everyone.

    Generally, I would say that accessibility and user experience should not be separate. Things need to be usable and accessible to everyone, whether or not they have a disability. What benefits people with disabilities also benefits others.

    Last but not least, it’s important to provide quality transcription and captioning. It’s more than just converting speech to text. It also depends on the type of audio: transcription for a podcast would be different from transcription for video and live events.

    Sarah: So, don’t wait for someone to ask for captions or transcripts—just do them, do them well, and everyone benefits.

    Sveta: Yes. Just like you don’t wait for someone to ask for ramps and elevators—they benefit everyone.

    Sarah: Right! This has been very helpful and informative. Thanks very much, Sveta!

    Sveta: My pleasure! 🙂

    Sarah: This has been Svetlana Kouznetsova sharing insights on what we can do to design accessible audio experiences, and work toward building a web for everyone.

    Thanks, also, to Elaine Matthias at Rosenfeld Media for voicing Sveta’s part of the text interview for the audio version of the podcast.

    And many thanks to you for listening, and to our sponsors—UIE, Rosenfeld Media, The Paciello Group, and O’Reilly—for making this podcast possible. Follow us @awebforeveryone on Twitter. That’s @awebforeveryone. Until next time!

    One Response to “Audio Accessibility with Svetlana Kouznetsova”

    1. Jana Gunter

      Yes, thanks, steno is the only real, true, clean captioning available for realtime events… Glad to see someone understands this fact!
      Steno takes enormous skill and aptitude and years of honing.