These four prompts cover the parts of an error analysis where AI saves the most time without giving up rigor: clustering raw error observations into a failure taxonomy, running the structured coding pass at scale, diagnosing root causes for the top failures, and building the prioritized action plan. Each prompt is meant to be filled in with your own product, scope, and observations, then run in Claude, ChatGPT, Gemini, or any LLM with a long enough context window. The prompts assume a human researcher in the loop — the model produces a first draft, the researcher reads, edits, and owns the final call.
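If you run these prompts more than once or twice, it is easier to fill the templates in code than by hand. A minimal sketch, assuming the Anthropic Python SDK; the model name, file names, and placeholder handling are illustrative, and the same pattern works against any provider's API:

```python
# Fill a prompt template and send it to an LLM. Assumes the Anthropic
# Python SDK (pip install anthropic) and ANTHROPIC_API_KEY in the
# environment; the model name and file names are illustrative.
import anthropic

template = open("prompt1_cluster.txt").read()    # Prompt 1, bracketed fields filled by hand
observations = open("observations.txt").read()   # one observation per line

prompt = template.replace("[paste observations, one per line]", observations)

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # any long-context model works
    max_tokens=4000,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)  # a first draft for the researcher to review
```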
Prompt 1: Cluster raw error observations into a failure taxonomy
You are an experienced UX researcher building a failure taxonomy from raw error observations.
Product: [product type, e.g., consumer fintech app, B2B analytics dashboard]
Flow being analyzed: [describe the flow and the task users were trying to complete]
User segment: [novice, expert, specific role, device]
Data source: [moderated usability test notes, unmoderated session self-reports, support tickets, in-app feedback, error logs]
Below is the raw set of error observations from the sample. Each line is one observation in free text.
[paste observations, one per line]
Please:
1. Read every observation carefully. Do not skim.
2. Group observations that describe the same underlying failure into a single cluster.
3. Propose 6-10 named failure categories that cover the data, with a one-line definition for each. Categories should be mutually exclusive and grounded in the actual observations, not invented.
4. For each category, list the count of observations and 3-5 representative verbatim quotes.
5. Flag any observation that does not fit cleanly into a category as an outlier and explain why.
6. End with a short paragraph on which failure modes appear to be the most frequent and which appear to be the most severe based on the language used in the observations.
Do not invent failure modes that the data does not support. If you only see one example of something, it is an outlier, not a category.
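If your raw observations live in a spreadsheet, a few lines of Python will flatten them into the one-per-line format the prompt expects. A minimal sketch, assuming a CSV export with an "observation" column; adjust the file and column names to your data:

```python
# Flatten a spreadsheet export into one observation per line for Prompt 1.
# The file name and "observation" column are assumptions about your export.
import csv

with open("usability_notes.csv", newline="") as f:
    for row in csv.DictReader(f):
        text = row["observation"].strip().replace("\n", " ")
        if text:
            print(text)  # redirect to a file, then paste into the prompt
```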
Prompt 2: Run structured coding against an existing taxonomy
I have a taxonomy of failure modes for [flow] and a set of error observations to label against it. Apply the taxonomy consistently and flag anything that does not fit.
Taxonomy:
[paste each category name with a one-line definition]
Severity rubric:
- Critical: blocks task completion or causes irreversible harm
- Major: significant friction, likely to cause errors or hesitation, task may still succeed
- Minor: noticeable friction, task succeeds without help
- Cosmetic: visual or wording polish, no impact on task
Observations to label:
[paste each observation with an ID]
For each observation, return a row with:
- ID
- Category (from the taxonomy, exactly one)
- Severity (from the rubric)
- Confidence (high / medium / low — how certain you are about the category)
- One-sentence justification quoting the relevant part of the observation
- Flag: "fits cleanly" / "edge case" / "does not fit"
At the end:
1. List every observation flagged as "edge case" or "does not fit" and propose whether the taxonomy should be extended or whether the observation belongs under "other"
2. Report the count per category and per severity level
3. Highlight any category where you assigned low confidence to more than 30% of the observations — that suggests the category definition is too vague
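Whatever counts the model reports in step 2, recompute them yourself; models are unreliable at arithmetic over their own output. A minimal sketch, assuming you asked the model to return its rows as CSV using the column names above (the file name is illustrative):

```python
# Recompute per-category and per-severity counts, plus the 30% low-confidence
# check, from the model's labeled rows. Assumes the rows were saved as CSV
# with columns matching the prompt: ID, Category, Severity, Confidence, ...
import csv
from collections import Counter

with open("coded_observations.csv", newline="") as f:
    rows = list(csv.DictReader(f))

by_category = Counter(r["Category"] for r in rows)
by_severity = Counter(r["Severity"] for r in rows)
print(by_category, by_severity, sep="\n")

# Categories where more than 30% of assignments came back low confidence
# are candidates for a tighter definition, per point 3 above.
for cat, total in by_category.items():
    low = sum(1 for r in rows if r["Category"] == cat and r["Confidence"] == "low")
    if low / total > 0.30:
        print(f"Vague category? {cat}: {low}/{total} low-confidence labels")
```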
Prompt 3: Diagnose root causes and propose fixes
You are helping me write the diagnosis section of an error analysis report. For each high-priority failure mode below, propose the most likely root cause and a concrete recommended fix.
Product context: [product description, user type, business model]
Flow: [flow description with key steps]
Failure modes (each with frequency and severity):
[paste each failure mode with the count of distinct users affected, severity, and 2-3 verbatim observations]
For each failure mode:
1. Diagnose the most likely root cause and pick exactly one of: design problem, mental-model mismatch, content/terminology, technical bug, content gap, other (explain). Quote the strongest piece of evidence from the observations to support the diagnosis.
2. List 2-3 alternative root causes you considered and, for each, explain in one sentence why you ruled it out in favor of the diagnosis above.
3. Propose a concrete recommended fix in one paragraph. State the specific change to make, which team should own it (design, content, engineering), and a rough effort estimate (small, medium, large).
4. Identify any failure mode where the evidence is too thin to diagnose confidently and recommend additional research that would resolve the ambiguity.
5. Flag any failure mode that may be a symptom of a deeper systemic problem rather than a stand-alone bug.
End with a 3-sentence summary of the overall pattern: which root-cause type appears most often across the high-priority failures, and what that suggests about where the team should invest next.
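Once you have reviewed and corrected the per-failure-mode diagnoses, a quick tally is a useful cross-check on the model's closing claim about the dominant root-cause type. A minimal sketch; the failure modes and labels below are illustrative placeholders, not real data:

```python
# Tally the root-cause type assigned to each high-priority failure mode
# after human review. All entries are illustrative placeholders.
from collections import Counter

diagnoses = {
    "Submit button label is ambiguous":        "content/terminology",
    "Users overlook the secondary navigation": "design problem",
    "Export silently fails on large files":    "technical bug",
    "Error message uses internal jargon":      "content/terminology",
}
print(Counter(diagnoses.values()).most_common())
# [('content/terminology', 2), ('design problem', 1), ('technical bug', 1)]
```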
Prompt 4: Build the prioritized action plan
I have a coded error analysis backlog of [N] failure modes for [flow] in [product]. The team has roughly [team capacity, e.g., one designer-week and three engineering-weeks] for the next sprint.
Backlog:
[paste each failure mode with category, severity, frequency (number of distinct users hit), business context, and recommended fix]
Please:
1. Score each failure mode on three dimensions (1-5 each):
- Severity: how badly it hurts the user when it happens
- Frequency: how often a real user hits it (use the number of distinct users)
- Business impact: effect on activation, retention, conversion, or support cost
Briefly justify each score in one sentence.
2. Compute a priority score (sum of the three dimensions) and sort the backlog from highest to lowest priority.
3. Recommend the top 5-10 failure modes to fix in the next sprint based on the priority scores and the available capacity, with a rough effort estimate per fix (small, medium, large).
4. Identify any failure mode that scores low on this rubric but feels strategically important (e.g., a failure with legal exposure, a brand-trust issue, a failure that hits a high-value customer segment) and explain why it should be elevated.
5. Suggest 2-3 failures that should be tracked but deferred, with a rationale for not fixing them now.
6. Identify any cluster of 3+ failures in the same category — these may deserve a single redesign rather than incremental patches.
7. Draft a 5-sentence executive summary the lead can use as the opening of the readout brief.
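As with the coding pass, it is worth re-running the scoring arithmetic from steps 1-2 yourself before the readout rather than trusting the model's sums. A minimal sketch of the score-and-sort step; the failure modes and 1-5 scores below are illustrative placeholders, not real data:

```python
# Recompute priority = severity + frequency + business impact and rank
# the backlog. Substitute the scores the model assigned and you reviewed;
# the entries here are illustrative placeholders.
backlog = [
    # (failure mode, severity, frequency, business_impact), each 1-5
    ("Address form rejects valid postal codes", 5, 4, 5),
    ("Tooltip copy uses internal jargon",       2, 3, 2),
    ("CSV export times out on large accounts",  4, 2, 4),
]

ranked = sorted(backlog, key=lambda fm: fm[1] + fm[2] + fm[3], reverse=True)
for name, sev, freq, biz in ranked:
    print(f"{sev + freq + biz:>2}  {name}  (sev={sev}, freq={freq}, biz={biz})")
```

An unweighted sum keeps the rubric legible in a readout; if one dimension should dominate, weight it explicitly rather than quietly inflating its scores.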