How to conduct accessibility testing: a practical guide with AI prompts
Accessibility testing evaluates whether a digital product can be used effectively by people with disabilities, including those who rely on assistive technologies such as screen readers, keyboard navigation, switch devices, and screen magnifiers. The method combines automated scans against the Web Content Accessibility Guidelines (WCAG) with manual expert review and testing sessions with real users who have disabilities, producing a prioritized list of barriers that prevent equal access. According to the WebAIM Million 2025 report, 94.8% of the top one million website home pages have automatically detectable WCAG failures, making accessibility testing one of the most urgently needed evaluative methods in UX research.
What question does it answer?
- Can a person who is blind and uses a screen reader complete core tasks on this product (registration, purchase, navigation) without encountering blockers?
- Does the keyboard-only navigation order follow a logical sequence, or does focus jump unpredictably between elements?
- Do all interactive elements (buttons, links, form fields, menus) have sufficient labels, contrast ratios, and focus indicators to meet WCAG 2.1 AA criteria? (A sketch of the contrast-ratio calculation follows this list.)
- Which specific WCAG success criteria does the product currently fail, and what is the severity of each failure?
- After remediation of identified barriers, do the fixes actually work in practice for users with different disabilities?
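The contrast question above is the most mechanical of these, and the underlying math is worth seeing once. Below is a minimal sketch of the WCAG 2.x contrast-ratio calculation that tools such as axe and the Colour Contrast Analyser apply; the 4.5:1 threshold shown is the AA requirement for normal-size text, and the example colors are illustrative.

```typescript
// WCAG 2.x contrast ratio between two sRGB colors (e.g. text vs. background).
// Illustrative sketch; real tools also handle alpha blending and large-text thresholds.
type RGB = { r: number; g: number; b: number }; // channels 0-255

// Relative luminance as defined by WCAG 2.x.
function relativeLuminance({ r, g, b }: RGB): number {
  const linear = (channel: number) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio = (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1.
function contrastRatio(fg: RGB, bg: RGB): number {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: grey text #767676 on white sits right at the AA limit (ratio ~ 4.5).
const ratio = contrastRatio({ r: 118, g: 118, b: 118 }, { r: 255, g: 255, b: 255 });
console.log(ratio.toFixed(2), ratio >= 4.5 ? "passes AA for normal text" : "fails AA for normal text");
```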
When to use
- Before launching a new product or major feature, to identify and fix accessibility barriers while changes are still cheap to make — retrofitting accessibility after launch costs significantly more.
- When preparing for legal compliance requirements such as the Americans with Disabilities Act (ADA), the European Accessibility Act (EAA), or Section 508 in the United States.
- After a redesign or platform migration, to verify that the new implementation has not introduced regressions — a common problem when moving between frameworks or design systems.
- On a recurring schedule (quarterly or biannually) as part of an ongoing accessibility program, because websites change continuously and new barriers appear with every code deployment.
- When user support data indicates that people with disabilities are reporting problems or abandoning specific flows at higher rates than the general population.
Not the right method when the research question is about emotional appeal, brand perception, or general usability preferences — desirability studies, preference testing, or standard usability testing are better choices for those. Accessibility testing focuses specifically on whether people with disabilities can access and use the product, not on whether they enjoy it. That said, accessibility testing should not exist in isolation: an accessible but unusable product serves no one. Pair accessibility testing with usability testing to cover both dimensions.
What you get (deliverables)
- WCAG compliance audit report: a detailed document listing every WCAG success criterion tested, whether it passed or failed, the severity of each failure (critical, major, minor), and the specific elements affected.
- Prioritized issue backlog: a ranked list of accessibility barriers organized by severity and effort, ready to be imported into Jira, GitHub Issues, or another project management tool.
- Assistive technology compatibility matrix: a table showing which screen readers (JAWS, NVDA, VoiceOver), browsers, and devices were tested and which issues are specific to particular AT combinations.
- User testing session recordings and notes: video recordings (with consent) and annotated notes from sessions with participants who have disabilities, showing real struggles with specific elements.
- Remediation guidance: for each identified barrier, a description of what needs to change (code fix, design change, or content edit) with code examples where applicable.
- Baseline score for longitudinal tracking: a measurable snapshot (e.g., percentage of WCAG criteria passed, number of critical issues) that serves as a reference point for future audits.
Participants and duration
- Participants for user testing: 3-5 participants per disability category (visual, motor, cognitive, hearing). A typical study includes 8-15 participants across categories. Automated scans and manual expert reviews do not require participants.
- Session length: Automated scans run in minutes. Manual expert review takes 2-5 days depending on site complexity. User testing sessions run 45-60 minutes per participant.
- Setup time: 1-2 days to define scope, select tools, prepare test scripts, and recruit participants (recruitment may require 1-2 weeks through specialized services).
- Analysis time: 1-3 days for automated + manual findings. User testing analysis adds 2-3 days.
- Total timeline: 2-4 weeks for a full accessibility testing cycle. A lightweight automated-only scan can be done in a single day.
How to conduct accessibility testing (step-by-step)
1. Define scope, standard, and success criteria
Determine which pages, user flows, and components will be tested. For a first audit, focus on the most visited pages and critical user journeys (e.g., homepage, login, checkout, search). Choose the compliance standard — WCAG 2.1 AA is the most widely adopted and legally referenced. Establish what “passing” means before testing begins so the team has a shared target.
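One lightweight way to keep all three testing layers aligned is to capture the scope as a small machine-readable config that the automated scan, the expert review, and the user-testing script all reference. The sketch below is illustrative only; the field names, URLs, and thresholds are assumptions, not the schema of any particular tool.

```typescript
// Illustrative audit-scope definition shared by the automated, expert, and user-testing layers.
// Field names, URLs, and thresholds are examples, not a real tool's schema.
interface AuditScope {
  standard: "WCAG21AA" | "WCAG22AA"; // compliance target agreed before testing starts
  pages: string[]; // most-visited pages and entry points to critical journeys
  flows: { name: string; steps: string[] }[]; // end-to-end journeys to test in every layer
  passThreshold: { maxCritical: number; maxMajor: number }; // the shared definition of "passing"
}

const scope: AuditScope = {
  standard: "WCAG21AA",
  pages: ["/", "/login", "/search", "/checkout"],
  flows: [
    { name: "purchase", steps: ["/product/123", "/cart", "/checkout", "/confirmation"] },
  ],
  passThreshold: { maxCritical: 0, maxMajor: 5 },
};
```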
2. Run automated scans
Use automated tools (axe DevTools, WAVE, Lighthouse, Pa11y, or ARC Toolkit) to scan the selected pages. These tools catch approximately 30-40% of WCAG issues — primarily missing alt text, insufficient color contrast, missing form labels, empty links, and missing document language. Run scans across multiple pages and interactive states (menus open, modals visible, error states). Automated scans are fast but cannot detect issues that require human judgment.
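The same axe-core engine behind the browser extensions can also be scripted, which makes it easier to cover many pages and interactive states consistently. The sketch below uses the @axe-core/playwright package; the page URLs and menu selector are placeholders, and the exact import style and options should be checked against the version you install.

```typescript
// Scripted axe-core scan across several pages, including one interactive state.
// Sketch using @axe-core/playwright; verify the import style and options against current docs.
import { chromium } from "playwright";
import AxeBuilder from "@axe-core/playwright";

const urls = ["https://example.com/", "https://example.com/checkout"];

async function scan() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  for (const url of urls) {
    await page.goto(url);
    // Example of scanning an interactive state: try to open the main menu first.
    // The selector is a placeholder for your own navigation trigger.
    await page.click("#menu-button", { timeout: 2000 }).catch(() => {});

    const results = await new AxeBuilder({ page })
      .withTags(["wcag2a", "wcag2aa"]) // limit results to WCAG 2.x A/AA rules
      .analyze();

    for (const violation of results.violations) {
      console.log(url, violation.id, violation.impact, `${violation.nodes.length} element(s)`);
    }
  }

  await browser.close();
}

scan();
```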
3. Conduct manual expert review
A trained accessibility specialist manually tests the pages using keyboard-only navigation, screen readers (NVDA on Windows, VoiceOver on macOS/iOS, TalkBack on Android), and screen magnification. The expert checks what automated tools miss: logical reading order, a meaningful heading hierarchy, correct ARIA attributes, custom widget behavior, and whether error messages are announced to assistive technology. This step surfaces most of the remaining 60-70% of issues.
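Parts of the manual review can be supported with small scripts that surface structure for the reviewer, even though the judgment about whether that structure makes sense stays human. The sketch below lists the heading hierarchy and the focusable elements in DOM order; drop the TypeScript type argument to paste it into a browser console as plain JavaScript.

```typescript
// Review aid: print the heading hierarchy and the focusable elements in DOM order.
// It only surfaces structure; deciding whether the order is logical stays with the reviewer.

// 1. Heading hierarchy - flag jumps such as h2 followed directly by h4.
const headings = Array.from(document.querySelectorAll("h1, h2, h3, h4, h5, h6"));
headings.forEach((heading, i) => {
  const level = Number(heading.tagName[1]);
  const previous = i > 0 ? Number(headings[i - 1].tagName[1]) : level;
  const flag = level - previous > 1 ? "  <-- skipped heading level" : "";
  console.log(`${"  ".repeat(level - 1)}${heading.tagName}: ${heading.textContent?.trim()}${flag}`);
});

// 2. Focusable elements in DOM order (approximates tab order when no positive tabindex is used).
const focusable = document.querySelectorAll<HTMLElement>(
  "a[href], button, input, select, textarea, [tabindex]:not([tabindex='-1'])"
);
focusable.forEach((el, i) => {
  const label = el.getAttribute("aria-label") ?? el.textContent?.trim() ?? "";
  console.log(i + 1, el.tagName, label);
});
```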
4. Recruit participants with disabilities and prepare test sessions
Recruit 8-15 participants across disability categories: screen reader users (blind or low vision), keyboard-only users (motor disabilities), people with cognitive or learning disabilities, and deaf or hard-of-hearing users (for multimedia content). Use specialized recruitment services (Fable, AbilityNet, local disability organizations). Prepare task-based scenarios that mirror real goals. Ensure all consent forms and test materials are themselves accessible.
5. Conduct user testing sessions
Run moderated sessions (remote preferred, since participants use their own devices and assistive technology setups). During sessions: do not speak over a screen reader’s output, allow participants to attempt tasks in their own way before offering guidance, and document not just success/failure but the path taken. Record sessions (with consent) for the development team.
6. Consolidate findings and prioritize
Merge results from all three testing layers (automated, manual expert, user testing). De-duplicate issues that appear in multiple sources. Assign each issue a priority level based on user impact: critical (blocks task completion entirely), major (task completable but with significant difficulty), minor (inconvenience that does not block use). Map each issue to the WCAG success criterion it violates.
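Consolidation is easier when all three layers are normalized into the same record shape before de-duplication. The sketch below shows one possible shape and merge; the field names and severity ranking are illustrative, not a standard schema.

```typescript
// Illustrative issue record used to merge findings from the three testing layers.
type Source = "automated" | "expert-review" | "user-testing";
type Severity = "critical" | "major" | "minor";

interface AccessibilityIssue {
  wcagCriterion: string; // e.g. "1.3.1 Info and Relationships"
  severity: Severity;    // based on user impact, not on tool output
  selector: string;      // affected element(s)
  sources: Source[];     // which layers reported the issue
  description: string;
}

// De-duplicate by WCAG criterion + element, keeping track of every layer that reported the issue,
// then rank the result: critical first, then major, then minor.
function consolidate(issues: AccessibilityIssue[]): AccessibilityIssue[] {
  const byKey = new Map<string, AccessibilityIssue>();
  for (const issue of issues) {
    const key = `${issue.wcagCriterion}|${issue.selector}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.sources = [...new Set([...existing.sources, ...issue.sources])];
    } else {
      byKey.set(key, { ...issue, sources: [...issue.sources] });
    }
  }
  const rank: Record<Severity, number> = { critical: 0, major: 1, minor: 2 };
  return [...byKey.values()].sort((a, b) => rank[a.severity] - rank[b.severity]);
}
```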
7. Write the report and remediation plan
Structure the report for multiple audiences: an executive summary showing the overall compliance picture and business risk; a detailed findings section with screenshots, WCAG references, and code-level remediation guidance; and a prioritized backlog ready for sprint planning. Include short video clips of users encountering critical barriers.
8. Remediate, re-test, and monitor
After the development team addresses the highest-priority issues, run a targeted re-test. Set up ongoing monitoring with automated scanning tools that run on a schedule (weekly or with each deployment) to catch regressions early.
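The regression scan can be a short script wired into the deployment pipeline so it runs automatically. The sketch below uses the open-source pa11y package listed under Tools; the URLs are placeholders, the options should be verified against the version you install, and failing the build on any error-level issue is just one possible policy.

```typescript
// CI regression scan with pa11y: fail the build if any error-level issues are found.
// URLs are placeholders; verify the options against the pa11y version you install.
import pa11y from "pa11y";

const urls = ["https://example.com/", "https://example.com/checkout"];

async function monitor() {
  let errorCount = 0;
  for (const url of urls) {
    const result = await pa11y(url, { standard: "WCAG2AA" });
    for (const issue of result.issues) {
      console.log(url, issue.type, issue.code, issue.selector);
      if (issue.type === "error") errorCount++;
    }
  }
  if (errorCount > 0) {
    console.error(`${errorCount} accessibility error(s) found - failing the build.`);
    process.exit(1); // a non-zero exit code fails the CI job
  }
}

monitor();
```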
How AI changes this method
AI compatibility: partial — AI-powered tools can automate detection of a wider range of WCAG violations than traditional rule-based scanners, and LLMs can help generate alt text, assess reading level, and draft remediation guidance. However, real user testing with people who have disabilities remains irreplaceable.
What AI can do
- Expand automated scanning coverage: AI-powered accessibility tools (axe AI, accessiBe’s AI engine, UserWay) can detect issues beyond traditional rule-based checks — for example, identifying images that need alt text versus decorative images, or flagging color combinations that fail contrast under specific lighting conditions.
- Generate and evaluate alt text: given a set of images from a website, an LLM can generate draft alt text that a human reviewer then edits for accuracy and context, reducing the time needed to write alt text for sites with hundreds of images.
- Assess reading level and cognitive accessibility: AI can analyze page content for reading complexity (Flesch-Kincaid, SMOG), flag jargon, and suggest simpler alternatives — directly supporting WCAG success criterion 3.1.5 (Reading Level). A sketch of the underlying grade-level formula follows this list.
- Draft remediation guidance: when an automated scan produces a list of WCAG failures, an LLM can generate specific code fixes for common issues, which developers then verify and implement.
- Summarize user testing findings: after accessibility user testing sessions, an LLM can process transcripts and notes to cluster issues by WCAG criterion, generate a severity-ranked issue list, and draft the executive summary section of the audit report.
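The reading-level bullet above rests on a simple formula that is easy to see in code. Below is a sketch of the Flesch-Kincaid grade-level calculation with a crude vowel-group syllable heuristic; production readability tools count syllables more carefully, and the LLM adds the judgment about jargon and simpler alternatives that the formula cannot provide.

```typescript
// Flesch-Kincaid grade level: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
// The syllable count below is a rough vowel-group heuristic for English text.
function countSyllables(word: string): number {
  const vowelGroups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, vowelGroups ? vowelGroups.length : 0);
}

function fleschKincaidGrade(text: string): number {
  const sentences = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const words = text.split(/\s+/).filter(Boolean);
  const syllables = words.reduce((sum, word) => sum + countSyllables(word), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (syllables / words.length) - 15.59;
}

// WCAG 3.1.5 asks for supplemental content or a simpler version when text demands
// more than a lower-secondary reading level (roughly grade 9 on this scale).
const grade = fleschKincaidGrade("Submit the form to finalize your enrollment in the programme.");
console.log(`Approximate grade level: ${grade.toFixed(1)}`);
```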
What requires a human researcher
- Testing with real assistive technology users: no AI can replicate the experience of a blind person navigating with JAWS, a person with tremors using switch access, or someone with ADHD attempting to follow a multi-step form.
- Judging whether content is actually accessible in context: AI can check that alt text exists, but a human must judge whether “image of a chart” is useful or whether the chart’s data should be provided as a table.
- Evaluating custom interactive components: dropdown menus, date pickers, accordions, and drag-and-drop interfaces often behave unpredictably with assistive technology and require manual testing across AT + browser combinations.
- Moderating sessions with participants who have disabilities: the researcher must adapt in real time — pausing when a screen reader is speaking, offering alternative task paths, and reading non-verbal cues.
AI-enhanced workflow
Before AI, writing alt text for a large site — say, 500 product images — meant a content team member manually describing each image, typically taking 2-3 full working days. With an LLM, a researcher can batch-generate draft alt text in under an hour, then spend the remaining time reviewing and editing for accuracy. This shifts the bottleneck from creation to quality assurance, cutting total effort by roughly 70%.
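A batch workflow like this is a short script. The sketch below uses the OpenAI Node SDK's chat completions call with image input; the model name, prompt wording, and image URLs are assumptions to adjust, and any vision-capable model would work. Drafts land in a file that a human reviewer edits before anything ships.

```typescript
// Batch-draft alt text for human review - drafts are starting points, not final copy.
// Model name, prompt, and URLs are assumptions; swap in whichever vision-capable model you use.
import OpenAI from "openai";
import { writeFileSync } from "node:fs";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const imageUrls = [
  "https://example.com/img/product-1.jpg",
  "https://example.com/img/product-2.jpg",
];

async function draftAltText() {
  const drafts: Record<string, string> = {};

  for (const url of imageUrls) {
    const response = await client.chat.completions.create({
      model: "gpt-4o", // assumption: use whichever vision-capable model your team has approved
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Write concise alt text (under 125 characters) describing this product image for a screen reader user. Do not start with 'image of'.",
            },
            { type: "image_url", image_url: { url } },
          ],
        },
      ],
    });
    drafts[url] = response.choices[0].message.content ?? "";
  }

  // A human reviewer edits this file for accuracy and context before anything is published.
  writeFileSync("alt-text-drafts.json", JSON.stringify(drafts, null, 2));
}

draftAltText();
```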
Automated scanning tools traditionally caught only about 30-40% of WCAG issues. AI-augmented scanners (like axe AI) push this closer to 50-60% by detecting semi-automated issues that previously required human review. The manual expert review step remains necessary but starts from a more complete baseline, reducing total audit time by 1-2 days.
Report generation benefits significantly from AI assistance. After a manual review and user testing sessions produce raw notes, an LLM can draft the structured audit report — organizing findings by WCAG criterion, generating code-fix suggestions, and writing the executive summary. The accessibility specialist then reviews and refines the draft, focusing expertise on accuracy rather than on formatting and writing.
Tools
Automated scanning:
- axe DevTools (Deque) — browser extension and CI/CD integration for WCAG scanning; the open-source axe-core engine is the industry standard.
- WAVE (WebAIM) — visual browser extension that overlays accessibility errors directly on the page.
- Google Lighthouse — built into Chrome DevTools; includes an accessibility audit section alongside performance and SEO.
- Pa11y — open-source command-line tool for automated accessibility testing, scriptable for CI pipelines.
Manual testing and screen readers:
- NVDA — free, open-source screen reader for Windows.
- VoiceOver — built-in screen reader on macOS and iOS.
- JAWS (Freedom Scientific) — commercial screen reader for Windows; widely used in enterprise.
- Colour Contrast Analyser (TPGi) — desktop tool for checking color contrast ratios against WCAG thresholds.
User testing with people with disabilities:
- Fable — platform connecting product teams with people with disabilities for testing sessions.
- AbilityNet — UK-based organization providing user testing services with disabled testers.
- Level Access — end-to-end accessibility auditing and user testing integration.
AI-assisted:
- axe AI (Deque) — AI layer on top of axe-core detecting semi-automated issues.
- ChatGPT / Claude — for generating alt text, reading-level analysis, remediation guidance, and report drafting.
Works well with
- Usability Testing Moderated (Ut): Running accessibility testing alongside moderated usability sessions allows the same user flows to be evaluated for both general usability and accessibility barriers in one research cycle.
- First Click Testing (Fc): Comparing where sighted users click first with where screen reader users land first exposes navigation design assumptions that disadvantage assistive technology users.
- Benchmarking (Bm): Accessibility benchmarking establishes a baseline compliance score that can be tracked over time and compared across competitors.
- Journey Mapping (Jm): Journey maps that include accessibility touchpoints make barriers visible to the whole team, connecting test findings to the broader user experience.
- Stakeholder Interview (Si): Interviewing product owners, developers, and legal teams before an audit helps the researcher understand constraints, compliance deadlines, and historical efforts, which informs scope and prioritization.
Example from practice
A mid-sized European e-commerce company received a complaint from a blind customer who could not complete a purchase using JAWS on Firefox. The customer abandoned the checkout after the screen reader failed to announce required fields and could not navigate the payment form. The company’s legal team flagged the complaint as a potential European Accessibility Act liability.
The research team ran a three-layer accessibility audit: an automated axe scan flagged 87 issues across 12 key pages (most were missing alt text and contrast failures). The manual expert review uncovered 34 additional issues that automation missed — including a custom date-picker that was entirely inaccessible via keyboard, a checkout progress indicator that provided no screen-reader feedback, and form validation errors that appeared visually but were not announced. User testing with 10 participants (3 screen reader users, 3 keyboard-only users, 2 with cognitive disabilities, 2 with low vision) revealed that 4 of the 5 critical user flows had at least one blocker for assistive technology users.
The remediation team fixed the 12 critical issues within two sprints (4 weeks), targeting the checkout flow first. A re-test confirmed that all three checkout blockers were resolved — screen reader users could now complete a purchase end-to-end. The remaining major and minor issues were scheduled across the next two quarters. Six months later, the company’s accessibility monitoring showed a 73% reduction in total WCAG violations, and customer support tickets from users with disabilities dropped by 61%.
Beginner mistakes
Relying only on automated scans
Automated tools catch about 30-40% of WCAG issues. Beginners often run a Lighthouse or WAVE scan, see a “passing” score, and conclude the site is accessible. The remaining 60-70% of issues can only be found through manual testing and real user testing. Treat automated scans as a starting point, not a final answer.
Testing without people who have disabilities
Expert manual review is valuable, but an accessibility specialist who can see and use a mouse does not experience the product the same way a blind person using a screen reader does. Even three sessions with screen reader users will reveal problems that no audit checklist catches.
Fixing symptoms instead of systemic causes
When a scan reports 47 images missing alt text, a beginner might add alt text to those 47 images and move on. The real question is why those images lack alt text: is the CMS not prompting for it? Is the component rendering images without an alt attribute? Fixing the template once prevents the issue from recurring on every new page.
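In a component-based codebase, one way to fix the template once is to make the image component's type require a text alternative, so the compiler rejects any usage that omits it. The sketch below is illustrative; the component and prop names are not from any specific design system, and the same idea applies to making the alt field mandatory in the CMS.

```typescript
// Systemic fix: the component's type forces every caller to provide alt text
// or to mark the image as explicitly decorative - anything else fails to compile.
// Component and prop names are illustrative, not from a specific design system.
type ProductImageProps =
  | { src: string; alt: string; decorative?: false } // meaningful image: alt text required
  | { src: string; alt?: never; decorative: true };  // decorative image: empty alt, skipped by screen readers

function renderProductImage(props: ProductImageProps): string {
  const alt = props.decorative === true ? "" : props.alt;
  return `<img src="${props.src}" alt="${alt}">`;
}

// renderProductImage({ src: "/img/sku-42.jpg" });             // compile error: alt is missing
renderProductImage({ src: "/img/sku-42.jpg", alt: "Blue ceramic mug, 350 ml" });
renderProductImage({ src: "/img/divider.svg", decorative: true });
```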
Treating accessibility as a one-time project
Every code deployment, content update, and design change can introduce new barriers. Set up automated monitoring that scans on every deployment, and schedule manual re-audits quarterly.
Ignoring cognitive accessibility
Most accessibility testing focuses on screen reader and keyboard compatibility — important, but incomplete. WCAG also addresses cognitive accessibility: clear language, consistent navigation, error prevention, and sufficient time. Include participants with cognitive disabilities in user testing, and evaluate reading level and navigation consistency as part of the manual review.
AI prompts for this method
4 ready-to-use AI prompts with placeholders — copy-paste and fill in with your context. See all prompts for accessibility testing →.