How to run a cognitive walkthrough: a practical guide with AI prompts
What is a cognitive walkthrough?
A cognitive walkthrough is a task-based usability inspection method in which a small cross-functional team simulates how a brand-new user would think and act at every step of a critical task, asking four prescribed questions at each click to find the places where a first-time user would get stuck. Unlike heuristic evaluation, which inspects an interface against general principles, a cognitive walkthrough zooms in on a specific task and a specific user persona and asks: could a first-time user actually figure out what to do here? The method is the canonical way to find learnability problems in complex, novel, or walk-up-and-use interfaces (kiosks, healthcare systems, financial onboarding, B2B tools, AI features) before any user ever sees them, and a focused walkthrough on a single critical task with three or four reviewers takes about half a day.
What question does it answer?
- Could a brand-new user complete this critical task on their first try, or does the design depend on prior knowledge they will not have?
- At which exact step of the flow would a new user hesitate, give up, or take the wrong action — and why?
- Does the user understand that they need to perform this action right now, or are we assuming a goal they have not formed yet?
- Will the user even notice the right control on the screen, or is it visually buried, mislabeled, or in an unexpected place?
- After the user takes the right action, do they actually see that they made progress, or does the system stay silent and leave them guessing?
- For a complex new feature with no analog in the rest of the product, where does the learnability budget run out for the target user persona?
When to use a cognitive walkthrough
- Early in the design cycle for a complex, novel, or walk-up-and-use interface where the user has no prior experience to fall back on — kiosks, ATMs, healthcare check-in, government forms, first-time onboarding for a new product.
- When you have a prototype or wireframes for a critical task and you need to find the learnability problems before recruiting users for a usability test, so the test can focus on the questions only real users can answer.
- When the budget or timeline rules out moderated usability testing entirely and you still need a defensible read on whether a first-time user can complete a critical flow.
- When introducing a new mental model or interaction pattern (an AI feature, a new visualization, a non-standard layout) that does not have an obvious convention to fall back on.
- When training engineers and PMs to think about the user — running a cognitive walkthrough together is one of the fastest ways to build empathy for a new user inside a cross-functional team.
- When auditing a long-standing product whose new-user activation rate has dropped, to find the steps where new users specifically (not power users) lose the thread.
Not the right method when the design uses well-known conventions and the question is about polish or general usability — heuristic evaluation is faster and broader for that. It is also the wrong call when the question is about motivation, preference, or unmet needs; those need interviews or diary studies, not an inspection. A cognitive walkthrough does not predict whether expert users will be efficient or whether the design is visually pleasing; its single focus is learnability for a defined first-time user. Finally, it is not a substitute for usability testing with real users — running both gives a more complete picture, because the walkthrough catches the obvious learnability issues while user testing surfaces the surprises only real users find.
What you get (deliverables)
- Task scenario document: a written description of each task being evaluated, the starting state, the goal state, and the user persona (knowledge, expertise, motivation) the team will assume.
- Action sequence: an explicit list of the correct actions a user would need to take to complete the task, one row per click or interaction, used as the spine of the walkthrough.
- Question log: a structured workbook with the four cognitive walkthrough questions answered for every action, with a yes/no/maybe verdict and a one-paragraph rationale for each (a minimal data sketch follows this list).
- Failure points list: the actions where the team agreed the new user would likely fail or hesitate, ranked by severity, with the specific cognitive obstacle (goal formation, action visibility, action labeling, feedback) recorded for each.
- Annotated screenshots: each failure point illustrated with the relevant screen and a callout pointing at the obstacle, so engineers and PMs can see what the team saw.
- Recommended fixes: concrete design changes for each failure point, often with a rough effort estimate, so the team can plan the next iteration.
- Readout brief: a five to ten page document or short deck with the scope, the persona, the task list, the failure points, and the recommended fixes; doubles as a baseline for tracking learnability improvements between releases.
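To make the question log and failure-points list concrete, here is a minimal sketch of the records a recorder might keep, written in Python. The schema and field names are illustrative assumptions, not a standard; only the yes/no/maybe verdicts and the four-way obstacle taxonomy come from the method itself.

```python
# A minimal, assumed schema for walkthrough records; field names are
# illustrative, not part of the method.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Verdict(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"

class Obstacle(Enum):
    GOAL_FORMATION = "goal formation"        # user never forms the goal
    ACTION_VISIBILITY = "action visibility"  # user does not notice the control
    ACTION_LABELING = "action labeling"      # label does not match the user's intent
    FEEDBACK = "feedback"                    # no visible progress after acting

@dataclass
class StepRecord:
    step: int                     # position in the action sequence
    action: str                   # the correct action, e.g. 'tap "New Patient"'
    verdicts: dict[str, Verdict]  # one verdict per cognitive walkthrough question
    rationale: str                # the one-paragraph why behind the group's answer
    failure: Optional[Obstacle] = None  # set when the step is logged as a failure point
```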
Participants and team
- Participants: none recruited as test subjects. Cognitive walkthrough is an expert inspection. The “participants” are 2–6 reviewers from the team who walk through the task in a workshop setting.
- Reviewer mix: the workshop works best with a cross-functional group: one or two UX researchers or designers, one product owner, one engineer, and ideally one domain expert for specialized fields (healthcare, finance, regulated industries). One reviewer acts as facilitator, another as recorder.
Time and effort
- User persona definition: 0.5–1 hour to write the persona (knowledge level, prior experience, motivation, context) so that every reviewer is walking through the task as the same imagined user.
- Task definition: 0.5–1 hour to pick 1–3 critical tasks, write the starting state and goal state for each, and produce the action sequence.
- Walkthrough session: 1.5–3 hours for a single task, depending on its complexity. Most teams cover one or two tasks per session and split larger tasks across multiple sessions.
- Synthesis and writing: 0.5–1 day to write up the failure points, annotate screenshots, prioritize fixes, and produce the brief.
- Total wall-clock time: half a day to two days for one to three critical tasks with a small cross-functional team.
How to run a cognitive walkthrough (step-by-step)
1. Pick the right interface and the right moment
Cognitive walkthrough is built for learnability, so it pays back the most when the interface is novel, complex, or walk-up-and-use, and the team has a prototype or working build of a critical first-time flow. Skip it for routine ecommerce checkouts, login forms, or anything that uses standard patterns the user has seen a thousand times before — those are better served by heuristic evaluation or analytics. Run the walkthrough early enough that the design can still change cheaply (wireframes, mid-fidelity prototypes, beta features), and on the tasks that actually matter for activation, not on every screen in the product.
2. Define the user persona explicitly
Write the persona on a single page before the workshop starts. It should answer four things: what does this user already know about this domain, what do they know about similar products, what is their motivation and context for being on this screen, and what are the two or three previous experiences they would draw on to make sense of what they see. Vague personas like “a regular user” produce vague walkthroughs; the value of the method comes from every reviewer simulating the same imagined person at every step. If the team is split between two plausible personas (a novice and a returning user), run the walkthrough twice rather than averaging.
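As a concrete illustration, the one-page persona can also be kept as structured data so that every reviewer, and later any AI pre-pass, works from exactly the same definition. This is a sketch, not a required format; the fields mirror the four questions above, and the values are taken from the clinic check-in example later in this guide (the ATM entry is an added assumption).

```python
# Persona as structured data; fields mirror the four persona questions.
# Values follow the clinic check-in example later in this guide.
persona = {
    "name": "Marina, 67",
    "domain_knowledge": "Returning patient after a six-month gap; knows the clinic.",
    "product_knowledge": "Uses a smartphone for messaging; has never used a "
                         "check-in app or kiosk like this.",
    "motivation_and_context": "Anxious about the visit; mild arthritis; standing "
                              "at a tablet in the clinic lobby.",
    "prior_experiences": [
        "paper check-in forms",
        "smartphone messaging",
        "bank ATM",  # assumed for illustration
    ],
}
```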
3. Define the task and write the action sequence
Pick one to three critical tasks for the walkthrough, no more for a single session. For each task, write a one-sentence scenario (“a new patient arrives at the clinic and needs to check in for their appointment using the tablet”), a starting state (“the tablet is on the welcome screen”), and a goal state (“the patient information is saved and the receptionist sees the check-in confirmation”). Then list the correct actions a user would need to take, one row per click, tap, or interaction. The action list is the spine of the walkthrough — it forces the team to slow down at every step rather than skipping past the boring ones.
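A sketch of what the task definition and action sequence might look like as data, using the clinic check-in scenario above; the steps listed are illustrative, not a complete sequence.

```python
# Task definition plus action sequence: one entry per click, tap, or
# interaction. Steps shown are illustrative, not exhaustive.
task = {
    "scenario": "A new patient arrives at the clinic and needs to check in "
                "for their appointment using the tablet.",
    "starting_state": "The tablet is on the welcome screen.",
    "goal_state": "The patient information is saved and the receptionist "
                  "sees the check-in confirmation.",
    "actions": [
        'Tap "New Patient" on the welcome screen',
        "Enter last name on the identification screen",
        "Enter date of birth",
        "Confirm the appointment shown on the summary screen",
        # ...continue one entry per interaction through the final confirmation
    ],
}
```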
4. Assemble the cross-functional team and pick a facilitator
Pull together two to six reviewers from across the team. The minimum useful mix is one UX person, one product owner, and one engineer; for specialized domains add a subject-matter expert. Pick one facilitator (usually the UX researcher or the person who wrote the task) to drive the screen-by-screen walkthrough and one recorder to capture the answers. Brief everyone on the four cognitive walkthrough questions before the session starts and explain the persona out loud, so the whole group is anchored on the same imagined user.
5. Walk through the task one action at a time
The facilitator opens the prototype or build at the starting state and stops at every screen or step. For each correct action on the action list, the team works through the four cognitive walkthrough questions in order: will the user try to achieve this result, will they notice the correct action, will they associate that action with the result they want, and will they see progress after they take the action. The recorder writes down the group’s answer for each question and the rationale. Keep the discussion focused on the imagined user; if a reviewer slips into “I would just” or “any designer knows,” redirect them back to the persona.
6. Mark failure points and capture the rationale
For each step, end with a verdict: would the user succeed at this action, or would they fail or hesitate. If the answer is “fail” or “maybe,” log the step as a failure point and tag which of the four questions broke down. The four-question structure is what makes the method actionable, because each failure type points to a different fix: goal formation breakdowns need context or onboarding, visibility breakdowns need layout or affordance changes, labeling breakdowns need copy work, feedback breakdowns need new system messages or animations. Keep the rationale short but explicit, because the value of the workshop later is in the why behind each verdict, not just the count of failures.
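Because each breakdown type maps to a different class of fix, the mapping itself can be written down once and reused when tagging failure points; a minimal sketch:

```python
# Each of the four breakdown types points to a different class of fix.
FIX_FOR_BREAKDOWN = {
    "goal formation": "add context or onboarding so the user forms the goal",
    "action visibility": "change layout or affordances so the control gets noticed",
    "action labeling": "rewrite the copy so the label matches the user's intent",
    "feedback": "add a system message or animation so progress is visible",
}
```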
7. Move on quickly when the answer is yes
Resist the urge to debate every screen the team likes. If the four questions all answer “yes” with strong agreement, mark the step as a pass and move on within a minute. The workshop’s productivity comes from spending most of the time on the screens where the answers are “no” or “maybe,” not from rehashing screens that already work. A common failure mode is spending the first hour over-discussing the welcome screen and running out of time before reaching the actual obstacles.
8. Cluster failure points and prioritize fixes
After the walkthrough, group the failure points by which of the four questions broke down and by which screen they live on. Score each failure point on three dimensions — severity (does the user fail completely or hesitate), frequency (how often will real users hit it), and effort to fix (small, medium, large). Pick the top five to ten failure points to push into the next design iteration, with concrete recommended fixes for each: rewritten copy, added onboarding, new affordance, additional feedback message. Tie each recommendation to the question it answers so the design team knows what success looks like after the fix.
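A minimal sketch of the scoring step, assuming a 1–3 scale per dimension and a simple illustrative formula (higher severity and frequency raise priority, larger effort lowers it); the weighting is an example to adapt, not a standard, and the sample failure points come from the clinic example later in this guide.

```python
# Rank failure points by severity, frequency, and effort, each scored 1-3.
# The formula is illustrative; adjust the weighting to your team's needs.
def priority(severity: int, frequency: int, effort: int) -> float:
    return (severity * frequency) / effort

failure_points = [
    {"screen": "welcome", "issue": "tile selector not read as a button",
     "severity": 3, "frequency": 3, "effort": 1},
    {"screen": "last visit", "issue": "date picker defaults to current month",
     "severity": 2, "frequency": 3, "effort": 2},
]

ranked = sorted(
    failure_points,
    key=lambda f: priority(f["severity"], f["frequency"], f["effort"]),
    reverse=True,
)
for f in ranked:
    print(f"{f['screen']}: {f['issue']}")
```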
9. Write the brief and feed the recommendations into the next iteration
Produce a five to ten page brief or short deck with the scope, the persona, the task list, the failure points organized by screen and by question, and the prioritized recommendations. Present it to the design lead and the product owner in person rather than emailing the document; the live discussion is where stakeholders agree on the trade-offs and commit to the changes. Schedule a follow-up walkthrough on the revised design once the recommendations ship, so the team can check that the fixes worked and the failure points actually went away.
How AI changes a cognitive walkthrough
AI compatibility: partial — Cognitive walkthrough is structurally a good fit for AI: the four questions are mechanical, the persona definition can be encoded as a system prompt, and a multimodal LLM can take a screenshot and answer “would a user know what to do here.” Recent experiments show that LLMs run a credible first pass on routine flows and catch many of the obvious learnability problems a human team would. The catch is that the value of a cognitive walkthrough comes from the cross-functional workshop discussion as much as from the four questions — the engineer pushing back on an unrealistic persona, the domain expert flagging an industry-specific gotcha, the product owner reframing what the goal actually is. AI can run the walkthrough; it cannot run the workshop.
What AI can do
- Simulate a first-time user against screenshots: Multimodal models like Claude, GPT-4o, and Gemini can take a screenshot, read a one-paragraph persona, and answer the four cognitive walkthrough questions for each correct action, producing a structured failure-point list in minutes. This is usually about 50–70% as thorough as a human team on routine flows and a strong starting point for the workshop (a minimal API sketch follows this list).
- Generate the action sequence and the persona from a prototype: Given a Figma file, a video walkthrough, or a click-through prototype, an LLM can produce a draft task scenario, a draft persona, and a step-by-step action list — work that typically takes the lead researcher half a day before the workshop starts.
- Surface candidate failure points before the workshop: Running an LLM pre-pass on every screen produces a list of candidate concerns that the human team can confirm, override, or expand during the workshop. The team then spends its time on judgment, not on mechanical question-answering.
- Stress-test the persona at scale: A model can run the same task as five different personas (novice, returning user, low-literacy, non-native speaker, accessibility user) in parallel, surfacing failure points that are specific to one persona but invisible to another. Doing this manually would require five separate workshops.
- Cross-reference failures against accessibility (WCAG) at the same time: A model can apply the four cognitive walkthrough questions and the WCAG success criteria in the same pass, flagging steps that are both a learnability problem and an accessibility violation, and producing a combined risk list.
- Draft the readout brief: Given a coded log of failure points, an LLM can produce a first-draft brief organized by screen and by question type with annotated examples and recommended fixes. The human researcher rewrites it for tone and tightens the prioritization.
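A minimal sketch of the screenshot pre-pass described in the first bullet, using the Anthropic Python SDK. The model name is a placeholder, the prompt wording is an untested assumption to adapt, and real runs need error handling and a structured-output format on top of this.

```python
# Pre-pass sketch: ask a multimodal model the four cognitive walkthrough
# questions about one screenshot. Requires `pip install anthropic` and an
# ANTHROPIC_API_KEY in the environment.
import base64
import anthropic

client = anthropic.Anthropic()

PERSONA = (  # the persona doubles as the system prompt
    "You are simulating Marina, 67: a returning patient who uses a smartphone "
    "for messaging, has never used a check-in app, and is anxious about the visit."
)
QUESTIONS = (
    "For the correct action '{action}' on this screen, answer yes/no/maybe "
    "with a short rationale for each: 1) Will the user try to achieve this "
    "result? 2) Will they notice the correct action? 3) Will they associate "
    "the action with the result they want? 4) Will they see progress after "
    "taking the action?"
)

def prepass(screenshot_path: str, action: str) -> str:
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; pick a current model
        max_tokens=1024,
        system=PERSONA,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64",
                 "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": QUESTIONS.format(action=action)},
            ],
        }],
    )
    return message.content[0].text

# Run once per (screenshot, correct action) pair in the action sequence;
# swapping the PERSONA string stress-tests the same task across personas.
```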
What requires a human researcher
- Defining the persona that matters: AI will accept any persona you give it, but choosing the persona that reflects the real customer base — the one whose activation rate the business actually cares about — is product judgment that depends on knowing the user research and the segmentation. Get this wrong and the walkthrough is technically correct but irrelevant.
- The cross-functional discussion: The largest single source of value in a cognitive walkthrough is the moment the engineer says “but the API will not return that field for new users” or the domain expert says “in our industry that label means the opposite.” AI cannot replace those interjections; they are the reason the workshop is cross-functional in the first place.
- Catching context-dependent failures: AI systematically misses obstacles that depend on the user’s emotional state, the surrounding workflow, the physical context (cold tablet at a clinic, distracted driver, frustrated customer), or domain conventions the model has never seen. Specialized B2B tools, regulated industries, and novel hardware are the hardest blind spots.
- Distinguishing real failures from intentional design tradeoffs: A model will flag every textbook learnability issue, including ones the team has deliberately accepted (a power-user shortcut, a deferred onboarding step, a delayed feedback message). A human knows when to defend the tradeoff and when to act on it.
- The decision to commit to fixes: Stakeholders agree to changes in the readout meeting, not in the document. AI can produce a credible draft, but the live conversation where design, product, and engineering negotiate effort, scope, and timeline is where the recommendations actually become work.
AI-enhanced workflow
Before AI, a cognitive walkthrough on a single critical task with a four-person workshop took about a day end to end: a few hours of prep (persona, task, action sequence), a two- or three-hour workshop, and half a day to write the brief. The bottleneck was often the prep work — writing the action sequence by hand against the prototype, drafting the persona from memory, and pulling screenshots one by one.
With AI in the workflow, the same project compresses to a half-day. The lead researcher feeds the prototype (Figma file or click-through link) to a multimodal model with a custom prompt that produces a draft action sequence, a draft persona, and a first-pass set of answers to the four questions for every step in minutes. The team then runs the workshop with that draft as the starting document — confirming the easy answers, overriding the wrong ones, and spending the bulk of the session on the screens where the model and the humans disagree. After the workshop, the same model drafts the readout brief, and the researcher rewrites it for tone and prioritization.
The catch is the same as for AI-assisted heuristic evaluation: the time saved depends on a real human verification pass on the AI’s output, and the workshop discussion is non-negotiable. Studies of AI-only walkthroughs find that models reliably catch the obvious learnability problems but quietly drop the rare and context-dependent ones, exactly the failures that often matter most. Researchers who get the most value from AI here treat it as a thorough but naive intern: useful for the prep work and the first pass, never trusted as the final answer, always paired with a human team that knows the product and the user.
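For the post-workshop step, here is a sketch of drafting the brief from the coded failure-point log, again with the Anthropic SDK; the section outline follows the deliverables earlier in this guide, and the prompt wording and model name are assumptions to adapt.

```python
# Draft the readout brief from the coded failure-point log. The draft is a
# starting point only; a human rewrites it for tone and prioritization.
import json
import anthropic

client = anthropic.Anthropic()

def draft_brief(failure_points: list[dict], persona: str, tasks: list[str]) -> str:
    prompt = (
        "Draft a cognitive walkthrough readout brief with these sections: "
        "scope, persona, task list, failure points organized by screen and "
        "by question type, and prioritized recommended fixes.\n\n"
        f"Persona: {persona}\n"
        f"Tasks: {tasks}\n"
        f"Failure points: {json.dumps(failure_points, indent=2)}"
    )
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; pick a current model
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```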
Example from practice
A regional health system rolled out a new tablet-based check-in app for clinics, replacing paper forms. The product manager wanted to know whether brand-new patients (especially older patients with little prior experience using kiosks) could complete the check-in unaided, before scheduling a costly multi-clinic field test with real patients. Recruiting and running a moderated usability study would have taken three weeks; the team had one week to get a defensible answer.
The lead researcher ran a cognitive walkthrough in two days. She wrote a one-page persona for “Marina, 67, returning to the clinic after a six-month gap, mild arthritis, has used a smartphone for messaging but never an app like this before, anxious about the visit,” picked the patient check-in task, and produced a 14-step action sequence from the welcome screen through the final confirmation. She pulled in a clinic operations lead, a UX designer, and a backend engineer for a two-and-a-half-hour workshop and ran the four questions on every step, with a recorder capturing the verdicts. She also fed the screenshots into Claude with a custom prompt that ran the same four questions against the same persona, and used the model’s output as a fourth voice: accepting some findings, overriding the ones where the model misread the clinic context.
The walkthrough surfaced 11 failure points, eight of which clustered around two screens. The welcome screen had a “New Patient” button in the bottom right corner that Marina would not parse as an action, because the persona had never seen a tile-based selector on a tablet; the third screen asked for the date of the last visit using a date picker that defaulted to the current month rather than scrolling backward. Both were “action visibility” failures in the four-question taxonomy. The team rewrote the welcome screen as a single “Are you a new or returning patient?” question with two large buttons and replaced the date picker with a choice list (“Last visit was: 6 months ago / 1 year ago / more than a year / I do not remember”). The fixes shipped in the next sprint. The field test the following month found that 94% of new patients aged 60+ completed the check-in unaided, versus an estimated 60% on the original design, and the walkthrough cost about 14 hours of researcher time across the team, versus the 60+ hours a comparable usability test would have required.
AI prompts for this method
4 ready-to-use AI prompts with placeholders: copy, paste, and fill in your context. See all prompts for cognitive walkthrough →.