Web Captioning Overview

What are Captions?

Captions are text versions of the spoken word. They make the content of web audio and video accessible to those who do not have access to the audio. Though captioning is primarily intended for those who cannot hear the audio, it has also been found to help those who can hear the audio content as well as those who may not be fluent in the language in which the audio is presented.

Common web accessibility guidelines indicate that captions should be:

  • Synchronized - the text content should appear at approximately the same time that audio would be available
  • Equivalent - content provided in captions should be equivalent to that of the spoken word
  • Accessible - caption content should be readily accessible and available to those who need it
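The "synchronized" requirement above can be pictured as a list of timed cues, each with a start time, an end time, and some text. The following is only a minimal sketch under that assumption; the `Cue` class and `active_caption` function are hypothetical names used for illustration and do not belong to any media player or captioning tool.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds; the caption is hidden from this moment on
    text: str


def active_caption(cues: Sequence[Cue], t: float) -> Optional[str]:
    """Return the caption text that should be visible at time t, if any.

    Assumes the cues are sorted by start time and do not overlap.
    """
    starts = [cue.start for cue in cues]
    i = bisect_right(starts, t) - 1  # index of the last cue starting at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i].text
    return None


# Illustrative timings only; they are not taken from a real broadcast.
cues = [
    Cue(0.0, 3.5, "The curtain rises on the greatest"),
    Cue(3.5, 6.0, "military experiment ever undertaken."),
]
```

A player built along these lines would call `active_caption` with the current playback time on each update and show whatever text comes back, or nothing when the result is `None`.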

On the web, synchronized, equivalent captions should be provided any time audio content is present. This obviously pertains to audio and video played through multimedia players such as QuickTime, RealPlayer, or Windows Media Player, but it can also pertain to technologies such as Flash, Shockwave, or Java when audio content is part of the multimedia presentation.

Closed vs. Open Captions

Captions as typically seen on television
Screenshot of black and white news footage showing battleships in the distance. Captions display on the image which read - the curtain rises on the greatest military experiment ever undertaken.

Most people are familiar with closed captioning, a technique of displaying the captioned text only when it is desired. All television sets with screen sizes of 13 inches and larger must contain the hardware to display captions. Closed captioning of most pre-recorded television programs is now a legal requirement in the United States. Television closed captioning is used by millions of individuals who are deaf or hard of hearing; millions more use it in the classroom or in noisy environments—like bars, restaurants, and airports. As the average age of the population increases, so does the number of people with hearing impairments. According to US government figures, one person in five has some functional hearing limitation. Because of the growing need for access to captions, many live broadcasts (such as news and sports events) and DVD and VHS programs now include closed captioning.

Closed captions for television are very limited in their formatting, because the caption look, feel, and location are determined by the caption decoder built into the television set. You can get more information about television captioning at Captioning FAQ.

Captions as seen on DVD
Screenshot from movie The Grinch. A girl holding christmas boxes. Captions read - Can't you feel it? - Merry Christmas!

Open captions include the same text as closed captions, but the captions are a permanent part of the picture and typically cannot be turned off. This is much like watching subtitles on foreign-language films. DVDs use a form of subtitling to display captions. Open captions are not decoded by the television set, but are part of the video information. This technique allows for more control over caption location, size, color, font, and timing.

Captions as seen in Windows Media Player
Screenshot of Windows Media Player showing how captions are displayed below the video

For web video, captions can be open, closed, or both. A common technique for open captioning is adding the caption text directly to the video itself. This requires a video editing or encoding program that allows you to overlay titles onto the video. The captions are visible to anybody viewing the video clip and cannot be turned off. This gives you total control over the way the captions appear, but it can be very time-consuming and expensive to produce. The more common way of captioning audio and video on the web is to use functionality within the multimedia players to display the captions along with or on top of the video or audio.

Transcripts

Transcripts are also an important part of making web multimedia content accessible. Transcripts allow anyone who cannot access content from web audio or video to read a text version instead. Transcripts do not have to be verbatim accounts of the spoken word in a video. They can contain additional descriptions, explanations, or comments that may be beneficial. Transcripts allow deaf-blind users to get content through the use of refreshable Braille and other devices. For most web video, both captions and a text transcript should be provided. For content that is audio only, a transcript will usually suffice.

Transcripts provide a textual version of the content that can be accessed by anyone. They also allow the content of your multimedia to be searchable. Screen reader users may also prefer the transcript over listening to the audio of the web multimedia. Most proficient screen reader users set their assistive technology to read at a rate much faster than most humans speak. This allows the screen reader user to access the transcript of the video and get the same content in less time than listening to the actual audio content.
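As a rough illustration of how text makes multimedia searchable, the sketch below joins caption cues into a plain transcript and looks up a phrase to find where it is spoken. The cue format (pairs of start time in seconds and text) and the function names `build_transcript` and `find_cues` are assumptions made for this example, not part of any captioning standard.

```python
def build_transcript(cues):
    """Join (start_seconds, text) caption cues into one plain-text transcript."""
    return " ".join(text.strip() for _, text in cues)


def find_cues(cues, phrase):
    """Return the start times of cues whose text contains phrase (case-insensitive)."""
    needle = phrase.lower()
    return [start for start, text in cues if needle in text.lower()]


# Illustrative cues; the timings are invented for the example.
cues = [
    (0.0, "Can't you feel it?"),
    (2.0, "Merry Christmas!"),
]

transcript = build_transcript(cues)
matches = find_cues(cues, "christmas")
```

The same transcript text can be placed on the page next to the media, where search engines and screen readers can reach it without playing the audio.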

Captioning Terms and Technologies

On the web, the primary multimedia technologies are Microsoft's Windows Media Player, Apple's QuickTime, RealNetworks' RealPlayer, and Macromedia Flash. Unfortunately, there is no fully standardized mechanism for captioning across these technologies. Each media player handles captions differently. Below are some common technologies and terms that apply to captioning within the various media players.

SMIL (Synchronized Multimedia Integration Language)
A standards-based language used by QuickTime and RealPlayer to control the layout and presentation of visual and audible items. SMIL controls the display, positioning, and timing of captions and audio/video multimedia. The captions themselves are stored in a Text Track file if you’re using QuickTime or a RealText file if you’re using RealPlayer.
SAMI (Synchronized Accessible Media Interchange)
Microsoft’s technique for adding captions for Windows Media Player. A SAMI file contains the text to be displayed within the captions and information that synchronizes individual caption displays to the multimedia presentation.
Text Track
QuickTime uses a Text Track file to store the caption text and synchronization (timing) information.
RealText
RealPlayer uses a RealText file to store the caption text and synchronization (timing) information.
MAGpie
Developed by the CPB/WGBH National Center for Accessible Media (NCAM), MAGpie is a free tool for creating caption files that can be utilized by media players.
Hi-Caption
HiSoftware's Hi-Caption allows creation of captions for media players.
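To make the SAMI entry above more concrete, here is a minimal sketch that writes caption text and start times into a SAMI-style file. It assumes the commonly published SAMI layout (a `SYNC` element with a `Start` time in milliseconds wrapping a `P` element whose class names the caption language); the function name `to_sami`, the class name `ENUSCC`, and the styling values are illustrative choices, so check the result against Microsoft's SAMI documentation before relying on it.

```python
from html import escape


def to_sami(cues, title="Captions"):
    """Render (start_ms, text) cues as a minimal SAMI-style document.

    Assumption: each cue stays on screen until the next SYNC block begins.
    """
    lines = [
        "<SAMI>",
        "<HEAD>",
        f"<TITLE>{escape(title, quote=False)}</TITLE>",
        '<STYLE TYPE="text/css"><!--',
        "P { font-family: Arial; color: white; background-color: black; }",
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }",
        "--></STYLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for start_ms, text in cues:
        # Start is the offset from the beginning of the media, in milliseconds.
        lines.append(f"<SYNC Start={int(start_ms)}>")
        lines.append(f"<P Class=ENUSCC>{escape(text, quote=False)}</P>")
        lines.append("</SYNC>")
    lines += ["</BODY>", "</SAMI>"]
    return "\n".join(lines)


# Illustrative cues; the timings are invented for the example.
sami_text = to_sami([
    (0, "Can't you feel it?"),
    (2000, "Merry Christmas & Happy New Year!"),
])
```

The returned string could then be saved with a `.smi` extension alongside the media file. The Text Track and RealText formats store the same two pieces of information (text and timing) with their own syntax, which is why tools such as MAGpie can export one set of captions to several players.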

WebAIM is an initiative of:
Center for Persons with Disabilities (CPD) Utah State University