
AI prompts for heuristic evaluation: screenshot review, checklists, consolidation

Four ready-to-use AI prompts for heuristic evaluation: multimodal screenshot review, custom checklists, multi-evaluator merge, and prioritized action plans.

How to use

Copy and paste into your AI assistant chat

These four prompts cover the parts of a heuristic evaluation where AI saves the most time without giving up rigor: running a first-pass review of screenshots against Nielsen’s heuristics, building a checklist tailored to a specific flow, consolidating findings across multiple evaluators, and turning the merged backlog into a prioritized action plan. Each prompt is meant to be filled in with your own product, scope, and screenshots, then run in Claude, ChatGPT, Gemini, or any multimodal LLM with a long enough context window.

Prompt 1: Run a multimodal heuristic evaluation on screenshots

You are an experienced UX researcher running a heuristic evaluation on a [product type — e.g., consumer fintech app, B2B analytics dashboard] for [specific user group — e.g., novice users completing checkout for the first time].

Scope: [describe the flow being evaluated and the screens included]
User type: [novice / intermediate / power user]
Device and platform: [mobile iOS / desktop web / etc.]

Heuristics to apply: Jakob Nielsen's 10 usability heuristics
1. Visibility of system status
2. Match between system and the real world
3. User control and freedom
4. Consistency and standards
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
8. Aesthetic and minimalist design
9. Help users recognize, diagnose, and recover from errors
10. Help and documentation

I will paste/attach screenshots of each screen in order.

For every observed issue, return one row in this format:
- Screen: [name or number]
- Element: [the specific element or behavior]
- Heuristic violated: [primary, optionally secondary]
- What goes wrong: [observable behavior, not opinion]
- Why it matters: [impact on task completion or trust]
- Severity: [cosmetic / minor / major / critical]
- Recommended fix: [concrete change]

Skip issues that are intentional tradeoffs given the user type or device. If you are unsure, flag the issue and explain the tradeoff so the human evaluator can decide.
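If you plan to post-process the model's output programmatically (for example, to feed it into the consolidation prompt below), the per-issue format above maps onto a small record. A minimal Python sketch; the field and class names here are illustrative choices, not part of the prompt:

```python
from dataclasses import dataclass

# Severity levels in ascending order, matching the prompt's scale.
SEVERITIES = ("cosmetic", "minor", "major", "critical")

@dataclass
class Finding:
    """One issue row from the heuristic evaluation output."""
    screen: str
    element: str
    heuristic: str          # primary heuristic violated, e.g. "5. Error prevention"
    what_goes_wrong: str    # observable behavior, not opinion
    why_it_matters: str     # impact on task completion or trust
    severity: str           # one of SEVERITIES
    recommended_fix: str

    def severity_rank(self) -> int:
        """Numeric rank for sorting (cosmetic=0 ... critical=3)."""
        return SEVERITIES.index(self.severity)
```

Parsing the model's bulleted rows into this structure is left to you; the point is that a fixed output format makes that step mechanical.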

Prompt 2: Build a heuristic checklist tailored to a specific flow

You are helping me prepare a heuristic evaluation. I need a checklist of yes/no questions tailored to my specific flow, organized by Nielsen's 10 heuristics.

Product: [product description]
Flow being evaluated: [describe each step of the flow]
Target user: [novice, expert, role]
Key tasks: [the 1-3 tasks the user needs to complete]
Device: [platform]

Please:
1. For each of Nielsen's 10 heuristics, write 4-6 yes/no questions specific to this flow (not generic). Each question should check whether the design satisfies that heuristic in the context of these specific tasks.
2. For each question, give one example of what a violation would look like in this specific product so the evaluator knows what they are looking for.
3. Add 2-3 extra questions per heuristic that cover edge cases relevant to this user type — e.g., what happens if the user is offline, if they tap back, if a field is empty, if they hit an error.
4. Flag any heuristic that is unlikely to apply to this specific flow and explain why.
5. End with a 5-question checklist for accessibility (WCAG 2.2 quick check) that the evaluator should also run.

Prompt 3: Consolidate findings from multiple evaluators

I am consolidating the output of [N] evaluators who each ran an independent heuristic evaluation of the same flow. I will paste each evaluator's list below. The format per row is: screen, element, heuristic, what goes wrong, severity, recommended fix.

Evaluator A:
[paste list]

Evaluator B:
[paste list]

Evaluator C:
[paste list]

Please:
1. Identify duplicate findings (same screen + same element + same heuristic) and merge them into one canonical issue, preserving the strongest description and the highest severity rating
2. Identify near-duplicates (same screen, same heuristic, slightly different framing) and propose how to merge them
3. Flag findings where evaluators disagreed on severity and propose a final rating with one-sentence justification
4. Identify any issue that only one evaluator caught and assess whether it is a real edge case or a likely false positive
5. Cluster the merged issues into 3-7 thematic groups (navigation, feedback, errors, terminology, etc.) and propose a name for each cluster
6. Sort the final consolidated list by severity within each cluster
7. Suggest 5-10 issues that should be the headline findings in the readout brief based on severity, frequency across evaluators, and likely business impact
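The exact-duplicate rule in step 1 (same screen + same element + same heuristic, keep the highest severity) is mechanical enough to verify in code rather than trusting the model's merge. A hypothetical sketch, assuming findings arrive as dicts with the fields listed above; choosing the "strongest description" for each merged issue is left to human or model judgment:

```python
SEVERITY_RANK = {"cosmetic": 0, "minor": 1, "major": 2, "critical": 3}

def merge_duplicates(findings):
    """Merge findings sharing screen + element + heuristic,
    keeping the highest severity rating seen for each canonical issue."""
    merged = {}
    for f in findings:
        key = (f["screen"], f["element"], f["heuristic"])
        if key not in merged or SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[merged[key]["severity"]]:
            merged[key] = f
    return list(merged.values())
```

Running this before and after the model's consolidation gives you a quick count of exact duplicates; near-duplicates (step 2) genuinely need judgment and are where the model earns its keep.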

Prompt 4: Draft the action plan and prioritization

I have a consolidated heuristic evaluation backlog of [N] issues across [M] thematic clusters. The product is [product description] and the flow being evaluated was [flow description]. The team has roughly [team capacity, e.g., one designer-week and three engineering-weeks] available to fix issues in the next sprint.

Here is the consolidated backlog:
[Paste the consolidated list with cluster, severity, screen, issue, recommended fix]

Please:
1. Score each issue on three dimensions (1-5 each): severity, estimated frequency (how often a real user hits it), business impact (effect on activation, retention, conversion, or support cost). Briefly justify each score.
2. Compute a priority score (sum of the three dimensions) and sort the backlog from highest to lowest priority.
3. Recommend the top 5-10 issues to fix in the next sprint based on the priority scores and the available capacity, with a rough effort estimate per issue (small, medium, large).
4. Identify any cluster that contains 3+ medium or major issues — these are systemic problems that may deserve a single redesign rather than incremental patches.
5. Flag any issue that scored low on this rubric but feels strategically important (e.g., an accessibility issue with legal exposure, a brand-trust issue) and explain why.
6. Suggest 2-3 issues that should be tracked but deferred, with a rationale for not fixing them now.
7. Draft a 5-sentence executive summary the lead can use as the opening of the readout brief.
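The scoring in steps 1-2 is simple arithmetic (three 1-5 scores summed, then sorted descending), so it is worth recomputing yourself as a sanity check on the model's ranking. A minimal sketch, assuming each issue dict carries the three scores under hypothetical keys `severity`, `frequency`, and `impact`:

```python
def prioritize(issues):
    """Add a priority score (severity + frequency + impact, each 1-5)
    and return the backlog sorted from highest to lowest priority."""
    for issue in issues:
        issue["priority"] = issue["severity"] + issue["frequency"] + issue["impact"]
    return sorted(issues, key=lambda i: i["priority"], reverse=True)
```

The maximum possible score is 15; anything at 12 or above is usually a sprint candidate regardless of cluster, while the strategic exceptions in step 5 are exactly the issues this rubric cannot capture.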