Error analysis checklist: scope, coding, taxonomy, prioritization

This error analysis checklist covers the full project from scope definition to the prioritized action plan. Use it as a working document: copy it into your project notes, tick off each item as you go, and add comments where the project deviates from the standard flow. The checklist assumes a focused analysis of a single flow with fifty to a few hundred observations from one or more sources (moderated test notes, session recordings, support tickets, in-app feedback, error logs); for larger or multi-flow analyses the same items still apply, but plan more time for the coding passes and the taxonomy build.

Before

  • Write the scope on one line: flow, task, time window, user segment
  • Pick the data source(s): moderated test notes, session recordings, support tickets, in-app feedback, error logs
  • Confirm the sample size is large enough: 5–12 sessions for moderated testing, 50–200 observations for logs or tickets
  • Pull the sample into one place (folder, annotation queue, spreadsheet) so every coding pass works from the same set
  • Decide on the severity rubric (cosmetic / minor / major / critical, or Nielsen 0–4) and write the one-sentence test for each level
  • Choose the coding tool (Dovetail, Marvin, Notably, Google Sheets, Airtable, or a dedicated LLM-eval platform like Langfuse for AI features)
  • Schedule the readout meeting in advance so there is a deadline for the analysis
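The severity rubric and the per-observation fields above can be pinned down as data before coding starts, so every pass validates against the same definitions. A minimal sketch in Python, assuming the four-level cosmetic/minor/major/critical scale; the one-sentence tests and field names are illustrative, not a required schema:

```python
# One-sentence test per severity level (illustrative wording -- write your own).
SEVERITY_RUBRIC = {
    "cosmetic": "User notices the issue but completes the task with no extra effort.",
    "minor": "User hesitates or takes a small detour but recovers unaided.",
    "major": "User needs help, a workaround, or multiple attempts to finish.",
    "critical": "User abandons the task or completes it incorrectly.",
}

def validate_observation(obs: dict) -> list[str]:
    """Return a list of problems with a coded observation row (empty = valid)."""
    required = ("where", "what", "category", "severity", "segment", "evidence")
    problems = [f"missing field: {f}" for f in required if not obs.get(f)]
    if obs.get("severity") not in SEVERITY_RUBRIC:
        problems.append(f"unknown severity: {obs.get('severity')!r}")
    return problems
```

Running every row through a check like this at the end of the structured coding pass catches blank cells and off-rubric severity values before they distort the tallies.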

Execution

  • Run the open-coding pass: walk through every observation and write a free-text label for the first observable failure
  • Stay descriptive on the open pass — do not slot observations into pre-existing categories
  • Cluster the free-text labels into 6–10 named failure categories with one-line definitions each
  • Confirm each category is grounded in 3+ observations; demote one-off labels to outliers
  • Run the structured coding pass: assign every observation a category, severity, and segment tag
  • Capture for each observation: where (screen/step), what (observable behavior), category, severity, segment, evidence (quote or screenshot)
  • If AI was used in the coding pass, manually spot-check 10–20% of the AI-generated labels
  • Read every observation marked “other” or low-confidence before locking the taxonomy
  • Tally the count of distinct users affected per failure mode (not the count of events)
  • Build the frequency table and the severity matrix
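The last two items, the distinct-user tally and the frequency/severity tables, can be sketched in plain Python over a list of coded observation dicts. Field names mirror the capture list above and are illustrative; any spreadsheet export with the same columns works the same way:

```python
from collections import defaultdict

def distinct_users(observations):
    """Count distinct users affected per failure category (not event counts)."""
    users = defaultdict(set)
    for obs in observations:
        users[obs["category"]].add(obs["user_id"])
    return {cat: len(ids) for cat, ids in users.items()}

def severity_matrix(observations):
    """Cross-tab of category x severity, counting observations in each cell."""
    matrix = defaultdict(lambda: defaultdict(int))
    for obs in observations:
        matrix[obs["category"]][obs["severity"]] += 1
    return {cat: dict(row) for cat, row in matrix.items()}
```

Counting users with a set (rather than incrementing on every event) is what keeps one frustrated user who hit the same failure five times from looking like five affected users.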

After

  • Diagnose root causes for the top 3–7 failure modes (design / mental model / content / technical bug)
  • Quote direct evidence for each diagnosis so the design and engineering leads can verify
  • Score each high-priority failure on severity, frequency, and business impact
  • Pick the top 5–10 failures for the action plan with concrete recommended fixes and effort estimates
  • Flag any failure that scored low but is strategically important (legal exposure, brand trust, high-value segment)
  • Identify clusters of 3+ failures in the same category that may deserve a single redesign
  • Write the 5–10 page brief: scope, headline finding, method, top failure modes with evidence, action plan, deferred backlog
  • Present to product, design, and engineering leads in person; do not email the spreadsheet
  • Archive the coded log, the taxonomy, and the severity rubric so the next analysis can be compared against this baseline
  • Schedule a follow-up sample after the fixes ship to confirm the failure rate dropped and no new modes appeared
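The scoring and shortlisting steps above can be made mechanical once each failure mode has numeric ratings. A minimal sketch, assuming 1–4 scales for severity, frequency (distinct users affected), and business impact; the multiplicative score is one common illustrative choice, not a standard:

```python
def priority_score(severity: int, frequency: int, impact: int) -> int:
    """Multiply the three 1-4 ratings; range 1-64, higher means fix sooner."""
    return severity * frequency * impact

def rank_failures(failures, top_n=10):
    """Sort failure modes by priority score and keep the top_n for the action plan."""
    scored = sorted(
        failures,
        key=lambda f: priority_score(f["severity"], f["frequency"], f["impact"]),
        reverse=True,
    )
    return scored[:top_n]
```

The score only automates the ranking, not the judgment: failures that score low but carry legal exposure, brand-trust risk, or a high-value segment still need the manual flag described above before the backlog is finalized.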