Error analysis checklist: scope, coding, taxonomy, prioritization
This error analysis checklist covers the full project from scope definition to the prioritized action plan. Use it as a working document — copy it into your project notes, tick off each item as you go, and add comments where the project deviates from the standard flow. The checklist assumes a focused flow with fifty to a few hundred observations from one or more sources (moderated test notes, session recordings, support tickets, in-app feedback, error logs); for larger or multi-flow analyses the same items still apply, but plan more time for the coding passes and the taxonomy build.
Before
- Write the scope on one line: flow, task, time window, user segment
- Pick the data source(s): moderated test notes, session recordings, support tickets, in-app feedback, error logs
- Confirm the sample size is large enough: 5–12 sessions for moderated, 50–200 observations for logs or tickets
- Pull the sample into one place (folder, annotation queue, spreadsheet) so every coding pass works from the same set
- Decide on the severity rubric (cosmetic / minor / major / critical, or Nielsen 0–4) and write the one-sentence test for each level (see the rubric sketch after this list)
- Choose the coding tool (Dovetail, Marvin, Notably, Google Sheets, Airtable, or a dedicated LLM-eval platform like Langfuse for AI features)
- Schedule the readout meeting in advance so there is a deadline for the analysis
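The severity rubric is easier to apply consistently when it lives next to the coded data. A minimal sketch of the four-level scale as Python data; the one-sentence tests here are illustrative wording to replace with your own, not a standard:

```python
# Illustrative severity rubric. The level names follow the checklist's
# cosmetic / minor / major / critical scale; the one-sentence tests are
# assumed wording -- replace them with your team's own.
SEVERITY_RUBRIC = {
    "cosmetic": "Noticed, but did not slow the user down or change the outcome.",
    "minor": "Caused hesitation or a small detour; the user recovered unaided.",
    "major": "Blocked progress until the user found a workaround or got help.",
    "critical": "The user abandoned the task or completed it incorrectly.",
}

def check_level(level: str) -> str:
    """Fail fast on typos so 'Major' and 'major' cannot diverge in the log."""
    if level not in SEVERITY_RUBRIC:
        raise ValueError(f"unknown severity {level!r}; expected one of {list(SEVERITY_RUBRIC)}")
    return level
```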
Execution
- Run the open-coding pass: walk through every observation and write a free-text label for the first observable failure
- Stay descriptive on the open pass — do not slot observations into pre-existing categories
- Cluster the free-text labels into 6–10 named failure categories with one-line definitions each
- Confirm each category is grounded in 3+ observations; demote one-off labels to outliers
- Run the structured coding pass: assign every observation a category, severity, and segment tag
- Capture for each observation: where (screen/step), what (observable behavior), category, severity, segment, evidence (quote or screenshot); a record sketch follows this list
- Spot-check 10–20% of AI-generated labels manually if AI was used in the coding pass (a sampling sketch follows this list)
- Read every observation marked “other” or low-confidence before locking the taxonomy
- Tally the count of distinct users affected per failure mode (not the count of events)
- Build the frequency table and the severity matrix (a pandas sketch follows this list)
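For the structured pass, it helps to fix the record shape before coding starts. A minimal sketch as a Python dataclass; the field names mirror the capture list above and are otherwise arbitrary choices:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One coded observation from the structured coding pass."""
    user_id: str   # who hit it; needed for the distinct-user tally
    screen: str    # where: screen or step in the flow
    behavior: str  # what: the observable behavior, stated descriptively
    category: str  # one of the 6-10 named taxonomy categories
    severity: str  # cosmetic / minor / major / critical
    segment: str   # user segment tag
    evidence: str  # verbatim quote or a link to the screenshot
```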
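If AI wrote some of the labels, the manual spot-check is cheapest as a uniform random sample, which also keeps the check unbiased. A sketch assuming the coded observations sit in a list:

```python
import random

def spot_check_sample(observations: list, fraction: float = 0.15) -> list:
    """Draw a random 10-20% slice (15% by default) for manual review.

    max(1, ...) guarantees at least one item even for a tiny set.
    """
    k = max(1, round(len(observations) * fraction))
    return random.sample(observations, k)
```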
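Once the coded log is tabular, the distinct-user tally, the frequency table, and the severity matrix each fall out of one pandas call. A sketch assuming pandas and a list of the Observation records above (the `observations` variable is an assumption, not something the checklist defines):

```python
import pandas as pd

# Coded log as a DataFrame; columns match the Observation fields above.
df = pd.DataFrame([vars(o) for o in observations])

# Distinct users affected per failure mode -- not raw event counts, so a
# user who hits the same failure five times still counts once.
users_affected = df.groupby("category")["user_id"].nunique().sort_values(ascending=False)

# Frequency table: raw event counts per category, kept for comparison.
frequency = df["category"].value_counts()

# Severity matrix: failure categories as rows, severity levels as columns,
# distinct users affected in each cell.
severity_matrix = df.pivot_table(
    index="category",
    columns="severity",
    values="user_id",
    aggfunc="nunique",
    fill_value=0,
)
```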
After
- Diagnose root causes for the top 3–7 failure modes (design / mental model / content / technical bug)
- Quote direct evidence for each diagnosis so the design and engineering leads can verify it
- Score each high-priority failure on severity, frequency, and business impact (a scoring sketch follows this list)
- Pick the top 5–10 failures for the action plan with concrete recommended fixes and effort estimates
- Flag any failure that scored low but is strategically important (legal exposure, brand trust, high-value segment)
- Identify clusters of 3+ failures in the same category that may deserve a single redesign
- Write the 5–10 page brief: scope, headline finding, method, top failure modes with evidence, action plan, deferred backlog
- Present to product, design, and engineering leads in person; do not email the spreadsheet
- Archive the coded log, the taxonomy, and the severity rubric so the next analysis can be compared against this baseline
- Schedule a follow-up sample after the fixes ship to confirm the failure rate dropped and no new modes appeared
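For the scoring step above, a product of three ratings is one defensible way to order the backlog. A sketch under assumed conventions (1–5 scales, unweighted product); teams often weight severity higher, so treat the formula as a starting point:

```python
def priority_score(severity: int, frequency: int, business_impact: int) -> int:
    """Rank a failure mode by multiplying three 1-5 ratings (range 1-125)."""
    for name, value in (("severity", severity),
                        ("frequency", frequency),
                        ("business_impact", business_impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be 1-5, got {value}")
    return severity * frequency * business_impact
```

The score only orders the list; the strategic-importance flag in the checklist is a deliberate override that still has to be applied by hand.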