Audio and Video
Audio Only (Pre-Recorded)
Success Criterion (SC) 1.2.1, Level A (Required)
If you publish any pre-recorded audio-only content (for example, podcasts, interviews, voice messages, or pre-recorded announcements), you must provide a text alternative that presents the same information as the audio. What this means in practice:
Option 1: Provide a full transcript
A text transcript that includes all spoken words and important non-speech audio (e.g., [music], [laughter], [applause]).
The transcript should be:
- Accurate
- Complete
- Easily accessible (linked near the audio player)
Most organizations prefer a plain-text transcript because it is easier to implement, works well with screen readers, is searchable, printable, and reusable, and carries fewer compliance and audit risks.
Option 2: A time-synced media alternative
This could be a web page or document that conveys the same information as the audio.
Common examples:
- Script displayed on screen and highlighted in sync with playback
- An interactive transcript that scrolls or highlights as the audio plays
- A sign-language video synchronized to the audio (less common, but valid)
There are no additional Level AA requirements specific to pre-recorded audio-only content, so meeting SC 1.2.1 satisfies both Level A and Level AA for audio-only content.
Video Only (Pre-Recorded)
Video-only media refers to digital content, such as video without sound or silent animations, that typically provides information using video, animation, and on-screen text. So that people who are blind or have low vision can access the information, content creators must provide an alternative that presents equivalent information for pre-recorded video-only content.
To make video-only digital content accessible, provide:
- A full text description of the video content, and/or
- A descriptive audio track (audio description) that explains all visual-only information
A text description of video-only content works like a transcript: it explains all meaningful visual information. An audio track narrates what is happening visually (sometimes called audio description).
Background Videos
- Muted by default, if the background video has audio (Level A)
- Users can pause or stop moving content (Level A)
- No flashing that could trigger seizures (Level A)
- Text over video must meet contrast requirements (Level AA)
- Motion should respect reduced-motion preferences (Level AA)
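The contrast requirement for text over video can be checked numerically. The sketch below computes the WCAG 2.1 contrast ratio between a text color and a background color and compares it with the Level AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are illustrative, not part of any library. Because a video background changes from frame to frame, it assumes you have already sampled a representative (ideally worst-case) background color from behind the text.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1.0 (none) up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: tuple[int, int, int], bg: tuple[int, int, int],
             large_text: bool = False) -> bool:
    """True if the pair meets the Level AA minimum (4.5:1, or 3:1 for large text)."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa((255, 255, 255), (0, 0, 0))` checks white text on a black frame region. In practice, a semi-transparent overlay behind the text is a common way to keep the sampled background color predictable.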
Video with Audio (Pre-Recorded)
Captions are provided (SC 1.2.2, Level A).
- Captions must accurately convey speech and relevant non-speech audio (like sound effects or music cues critical to understanding).
- Captions should be synchronized with the audio.
- Auto-generated captions must be reviewed and corrected by a person before publishing, with the goal of 100% accuracy.
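One way to support the review of auto-generated captions is to compare the machine output with a human-corrected transcript and list every place where the wording differs. The sketch below does this with Python's standard `difflib` module. The helper names are hypothetical, and it assumes both versions are available as plain text with timestamps already removed.

```python
import difflib
import re


def words(text: str) -> list[str]:
    """Lowercase the text and drop punctuation so only wording differences are flagged."""
    return re.findall(r"[a-z0-9']+", text.lower())


def caption_differences(auto_text: str, reviewed_text: str) -> list[tuple[str, str, str]]:
    """Return (change type, auto-caption words, reviewed words) for each mismatch."""
    a, b = words(auto_text), words(reviewed_text)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    auto = "Welcome to the pod cast about web access ability"
    reviewed = "Welcome to the podcast about web accessibility."
    for change in caption_differences(auto, reviewed):
        print(change)
```

An empty result means the wording matches. It does not confirm that timing, speaker identification, or non-speech cues are correct, so those still need a human check.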
Closed Captions vs Open Captions
- Closed captions (CC): text versions of the audio in a video that users can turn on or off. They are designed primarily for people who are Deaf or hard of hearing, but they also help others, such as viewers in noisy rooms, viewers watching muted videos, and language learners.
- Open captions: caption text that is always displayed on screen, with no viewer action required.
Best practice:
- Use the same language as the audio.
- Include speaker identification and sound effects when relevant.
- Ensure captions are readable, ideally 16–24 pt font with high contrast.
Audio Descriptions (For Visual Content)
- Provide audio descriptions for video content if visual information is essential to understanding. (Level A)
- Audio descriptions narrate key visual information during natural pauses in dialogue. (Level A)
- Full audio description is required if visual content conveys important information that is not in the audio. (Level AA)
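Users must be able to control audio. This requirement applies only when audio plays automatically.
If audio plays automatically and lasts more than 3 seconds, users must be able to pause or stop it, or adjust its volume. This helps users who could otherwise be hindered by unexpected sound, including screen reader users and people with hearing impairments or cognitive difficulties. (Level A)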
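The audio control rule above reduces to a small decision: a control is required only when audio starts on its own and runs longer than 3 seconds. The sketch below encodes that check for use in a content audit. The `AudioClip` fields are hypothetical names chosen for illustration and do not come from any real media API.

```python
from dataclasses import dataclass

AUTOPLAY_LIMIT_SECONDS = 3.0  # threshold stated in the rule above


@dataclass
class AudioClip:
    autoplays: bool
    duration_seconds: float
    has_pause_or_stop: bool = False       # user can pause or stop the audio
    has_independent_volume: bool = False  # volume adjustable apart from system volume


def needs_audio_control(clip: AudioClip) -> bool:
    """The rule applies only to audio that autoplays for more than 3 seconds."""
    return clip.autoplays and clip.duration_seconds > AUTOPLAY_LIMIT_SECONDS


def passes_audio_control(clip: AudioClip) -> bool:
    """Pass if the rule does not apply, or if a suitable control is offered."""
    if not needs_audio_control(clip):
        return True
    return clip.has_pause_or_stop or clip.has_independent_volume
```

A clip that the user starts by pressing play passes regardless of length, because the rule covers only automatic playback.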
Live Captioning
Live captions (also called real-time captions) are provided for all live audio or video events, showing what speakers are saying in real time. Live captions convert spoken words into text using automated captioning, human captioners, or both.
They are used for:
- Live webinars
- Live streams
- Conferences
- Broadcasts
- Virtual meetings (Zoom, Webex, Microsoft Teams, Google Meet, Skype, etc.)
Most major virtual meeting tools now offer built-in automated live captions. As an accessibility best practice, and to meet WCAG 2.1 Level AA (SC 1.2.4), enable live captions by default for all virtual meetings where feasible.