An edited version of this interview appears in Chapter 9 of A Web for Everyone.
Larry Goldberg was Director of the Carl and Ruth Shapiro Family National Center for Accessible Media (NCAM) at WGBH Boston, one of the most accessibility-aware media companies in the world. He is now Director of Community Engagement. In addition to producing award-winning, captioned, and described television and web programs, WGBH hosts NCAM, a research and development group focused on ensuring equity in media access. Larry oversees NCAM, where his dedication to developing technologies, policies, and practices to support accessible media has been instrumental in mainstreaming captions, video description, and other innovative technologies.
We asked Larry what we could learn from the process of bringing captioning to television that will help us mainstream accessible media on the web.
Integrated technology as the tipping point
Captioned television is everywhere: in bars, airports, gyms—wherever hearing is difficult, and we need to see what is said. But that wasn’t always the case. There was a time when captions were an add-on, delivered using separate technology in the form of a set-top box purchased by deaf and hard-of-hearing television viewers. The tipping point for captions came when the capability for displaying captions was built into standard television sets—by means of an act of Congress. With the technology in place, the challenge became to produce captions for all television programming.
Getting there wasn’t easy, as Larry can attest, having been around through much of the process. The first step was to dispel the notion that captions were costly and benefitted only a small number of viewers. “You don’t want to forget the primary purpose—that deaf people needed captions—but when it became obvious that captions helped comprehension and late-night television watching, and when the TV production community saw that they could integrate captioning into the production process without a lot of time and expense, they said, ‘Fine, go ahead.’”
Like captions for television, most web media players can have caption display capacity built in. With Flash and QuickTime and Windows Media, you can add a caption into any video; however, in most cases, captions are not required. In the United States, captioning recently became required under the new 21st Century Communications and Video Accessibility Act (CVAA), but only for previously broadcast video, not for user-generated or web-only video.
Perhaps the “CC” button on the YouTube player will play a role similar to what built-in captioning technology did for television, by compelling web media producers to provide a caption track in response to viewer expectations.
Becoming part of the process
Process integration was key to mainstreaming captions. Captioning services like those at WGBH had to get faster and more efficient, and integrate seamlessly into the media production workflow. “We had to work fast so we didn’t hold up delivery deadlines.” This meant overnight shifts, but also creating better tools so the captioners could work more quickly. It also meant coming up with workflows that would integrate into production. “Once captioning became a line-item in budgets, and an expected check-point in the production flow, it became an accepted way of doing things.” Expectations for captioned television in bars and health clubs also helped.
When a TV producer who may never have met a deaf person goes to the gym every day and sees captions, they just accept it. Or they look and go, “Hey! Why isn’t that show captioned? There’s an interview on—I want to know what he’s saying!” Or they’re at a bar and there’s a game on, and they say, “What just happened? What’s that call? Hey! Could somebody turn on the captions?” These wider circles of usage certainly help.
Once people stopped asking the question, “How many people are going to see these captions,” and captioning services became fast and cost-effective, captioning became part of the process of producing and distributing television programs.
Enhancing media with accessible features
With web-based digital technology, the broad benefits of accessibility features are even greater than with television. “In the earliest days, even in QuickTime 1.0, the benefits of searchability were fairly obvious,” offering the ability to find key words in a video by searching a synchronized text track. “Captions became a universal design enhancement that was feeding the world of search.”
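To make the searchability point concrete, here is a minimal sketch of finding key words in a synchronized text track. It is an illustration only, not how QuickTime or any particular player implements search: the `Cue` structure, the `find_keyword` function, and the sample captions are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue: start and end times in seconds, plus its text."""
    start: float
    end: float
    text: str


def find_keyword(cues, keyword):
    """Return the start time of every cue whose text contains the keyword.

    Matching is case-insensitive and uses plain substring search.
    """
    needle = keyword.casefold()
    return [cue.start for cue in cues if needle in cue.text.casefold()]


# Hypothetical caption track for a short video.
cues = [
    Cue(0.0, 3.5, "Welcome to the lecture on accessible media."),
    Cue(3.5, 7.0, "Captions help everyone follow along."),
    Cue(7.0, 11.0, "Search engines can index captions too."),
]

# Each hit is a point in the video a viewer could jump to.
hits = find_keyword(cues, "captions")
```

Because every cue carries a timestamp, a text match doubles as a seek position in the video, which is what makes a caption track useful for search as well as for reading.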
There is evidence that the presence of captions increases the attention to and time spent with video. “We believe captions are driving viewership and ‘stickiness.’” And text has myriad benefits over other media when it comes to sharing.
You put time into creating a video, even if it’s a throwaway, even if it’s only going to be online for half a day. If there’s value to it and you want people to see it, then creating a text enhancement is going to help—for cut-and-paste, for sharing. Sharing video is kind of hard, especially since different devices have different support. But sharing text is pervasive. So if you have a text file of your media, whether it starts as audio or video, it’s much more readily shared. And you can tell people about it in all your social media tools by pulling pieces of text out, posting or tweeting the text, and driving people to your media.
Some companies are starting to exploit accessibility features for other purposes, such as popping up advertisements based on what is said within a piece of media. “We will see a lot more targeted advertising in video,” Larry predicts.
Making text from audio
“It’s the transcribing aspect that takes time,” and speech-to-text software is only partially helpful. One example is YouTube’s automatic speech transcription, “which is frequently only partially accurate.” Most media companies outsource transcription and captioning because the expertise needed is not typically part of a media production team. Some, like Netflix, are even now experimenting with crowd-sourcing their caption work. Services like the WGBH Media Access Group make it easy to outsource caption creation, and the prices for transcribing and captioning have come way down. Plus, there’s more to good captions than simply transcribing audio to text.
High-quality captions are crafted to make the captions more readable. YouTube and other auto-captioning tools won’t do that: things like breaking the sentence in the right place, and removing captions during long pauses. Our captioners do everything in one step: they transcribe, time, place, and add extra stylistic aspects. So far, we have found that using speech transcription as a first step does not save us time because our captioners are trained to do all the enhancements in the first pass.
But there are instances when outsourcing may not be necessary. If you start your media production process with a transcript or teleprompter text, it can become the basis for captions. Services like YouTube’s auto-timing work fairly well for synchronizing a prepared and accurate transcript with video.
Partnering with transcription software
Speech transcription software can be a help, but only with clear audio. “You can’t just take random, noisy, multi-speaker audio and expect high quality automatic transcription.” But, with care, it’s possible to transcribe a clean recording of clearly spoken audio with enough accuracy using speech-to-text software like Dragon NaturallySpeaking. As an example, Larry cites the Liberated Learning Consortium, an IBM research project in which professors record lectures using high-quality mics and trained software to produce accessible lecture materials.
I know we want the tools to shape themselves to us and not us shape ourselves to the tools, but… if you talk a little bit more robotically and you enunciate properly you can actually get a decent transcript using automatic speech recognition tools.
Adding captioning to the web media production workflow
As for who should be responsible for integrating captions, Larry suggests it’s all part of post-production: editing the media and digitizing for different platforms. “The people who know video and editing tools get this,” as adding captions is similar to titling video, adding credits, and adding other forms of metadata.
Some organizations offer services to their constituencies to support the practice of accessible media. Several California colleges and universities offer an online service that manages the captioning workflow. Faculty submit a lecture recording, for example, and the service manages the transcription, captioning, and publishing, typically outsourcing at least the transcription part of the process. With low-cost transcription services available, the overall cost for the service becomes quite manageable.
Looking ahead for accessible media
The research and development aspect of Larry’s NCAM work looks at new technologies, “making sure that everyone can use whatever new, essential, or cool thing that’s coming up that will have an effect on people’s lives—at home, in school, and in the office and community. Can we make sure it’s a level playing field?” The other aspect is finding ways to exploit those technologies for accessibility.
For example, the HTML5 media architecture offers capabilities for specialized content. “With HTML5, you can link to different types of synchronized streams within the same webpage.” For an instructional video containing information written on a board, the same information could open in a new window as text. Or the information could be inserted into the video as a text track, and viewers could pause the video, listen to the synthesized text, and resume playing the video, making the content accessible to people who are blind or visually impaired.
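As a rough sketch of what such a text track might look like, the example below writes timed text in WebVTT, the text track format commonly used with the HTML5 `<track>` element. The helper names, the sample board notes, and the file name mentioned afterward are assumptions made for illustration; nothing here is taken from NCAM's own tools.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues) -> str:
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical text for what an instructor writes on the board.
board_notes = [
    (12.0, 20.0, "On the board: area of a circle = pi times r squared"),
    (75.5, 82.0, "On the board: worked example with r = 3"),
]

vtt_text = to_webvtt(board_notes)
```

Saved under a name such as `board-notes.vtt` (hypothetical), the output could be attached to a video with a `<track>` element whose `kind` is `descriptions` or `captions`. How a given browser or screen reader presents each kind of track varies, so treat this as a starting point rather than a complete solution.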
Given the demonstrated value-added nature of captions and other accessibility features, Larry predicts that “as more captions come online as part of the new requirements, others not covered by the rules will too begin providing captions because they see the value.”