NPS, CSAT, and SUS: how to choose and use standardized UX metrics
What are standardized UX metrics?
NPS, CSAT, and SUS are standardized questionnaire instruments that reduce user experience and customer satisfaction to a single comparable number. Net Promoter Score (NPS) measures loyalty and advocacy through a single “likelihood to recommend” question on a 0-10 scale. Customer Satisfaction Score (CSAT) captures satisfaction with a specific interaction or touchpoint, typically on a 1-5 scale. System Usability Scale (SUS) evaluates perceived usability through a 10-item questionnaire that produces a score from 0 to 100. Each instrument answers a different question, measures a different dimension, and fits a different point in the user journey — but all three share a common purpose: turning subjective experience into a trackable number that teams can benchmark over time, compare across products, and report to stakeholders.
What question does it answer?
- How loyal are our users, and how likely are they to recommend the product to others? (NPS)
- How satisfied were users with a specific interaction, transaction, or support experience? (CSAT)
- How usable do users perceive the product to be overall? (SUS)
- How much effort did users need to complete a task or resolve an issue? (CES, Customer Effort Score)
- How have these scores changed since the last measurement?
- How do our scores compare to industry averages or direct competitors?
When to use
- When the organization needs a simple, repeatable number to track user experience quality over time.
- When stakeholders need a metric they can understand without UX expertise — NPS, CSAT, and SUS scores fit executive dashboards and quarterly reviews.
- When the team wants to compare its product against industry benchmarks — all three have published norms.
- When measuring the impact of a specific change — deploy CSAT after a redesigned flow, or run SUS before and after a major overhaul.
- When embedding a quick feedback mechanism at key journey moments without requiring a full survey.
- When the research budget does not allow for a full benchmarking study and a lightweight pulse check is the practical alternative.
Not the right method when the team needs to understand why users are dissatisfied — a CSAT score of 3.2 tells you there is a problem but says nothing about what the problem is. Always pair standardized scores with follow-up questions or interviews. These instruments are also insufficient when the team needs behavioral data: self-reported satisfaction and actual usability are different constructs. NPS in particular has limitations: it does not predict company growth as reliably as originally claimed, suffers from cultural response bias, and its arbitrary score grouping discards nuance.
What you get (deliverables)
- A single score per instrument per measurement period.
- Trend chart showing score changes across periods.
- Segment breakdowns by user type, plan tier, device, geography, or tenure.
- Coded follow-up responses grouped into themes with frequency counts.
- Benchmark comparison against published industry averages.
- Action-linked report connecting scores to recommended next steps.
Participants and duration
- Respondents: Target the entire active user base or a random sample. A minimum of 50 responses is needed for a stable NPS (the confidence-interval sketch after this list shows why), and 200+ for segment analysis. SUS needs at least 12-14 respondents.
- Survey length: NPS and CSAT take under 1 minute. SUS takes 2-3 minutes.
- Setup time: 1-3 days.
- Field time: Continuous or 5-14 days per campaign.
- Analysis time: 1-2 days.
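On the sample-size point: NPS is a difference of two proportions, so its margin of error can be approximated with the standard variance formula for a difference of multinomial proportions. A minimal Python sketch (the counts are invented for illustration):

```python
import math

def nps_confidence_interval(promoters: int, passives: int, detractors: int,
                            z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for NPS, in points (-100..100)."""
    n = promoters + passives + detractors
    p_pro = promoters / n
    p_det = detractors / n
    nps = p_pro - p_det                            # as a proportion, -1..1
    variance = (p_pro + p_det - nps ** 2) / n      # difference-of-proportions variance
    margin = z * math.sqrt(variance)
    return (nps - margin) * 100, (nps + margin) * 100

# With only 50 responses the interval is wide: about -1 to +41 NPS points here,
# which is why small quarter-to-quarter movements are usually noise.
print(nps_confidence_interval(promoters=20, passives=20, detractors=10))
```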
How to use standardized UX metrics (step-by-step)
1. Choose the right instrument for the right question
NPS measures loyalty — use it for quarterly relationship tracking. CSAT measures satisfaction with a specific interaction — deploy it right after a transaction or support ticket. SUS measures perceived usability — use it after usability tests or as periodic product health checks. CES measures effort — deploy it after task completion. Match the instrument to what you need to know.
2. Define the trigger and timing
NPS: after the user has enough experience to form a loyalty opinion. CSAT: immediately after the interaction. SUS: at the end of a test session or periodically. CES: right after task completion. Triggering at the wrong moment produces meaningless data.
3. Use exact standardized wording
For NPS: “How likely are you to recommend [product] to a friend or colleague?” (0-10), followed by an open-ended “why.” For CSAT: “How satisfied are you with [interaction]?” (1-5). For SUS: all 10 original items with alternating positive/negative wording. Do not modify SUS items — this invalidates published norms.
4. Prevent survey fatigue
NPS: max once per quarter per user. CSAT: max once per 30 days per interaction type. Randomize which users see the survey.
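Cooldown rules like these can be enforced at the trigger point. A minimal sketch; the user-record fields (last_nps_at, last_csat_at) are assumptions about your data model, not any specific tool's API:

```python
from datetime import datetime, timedelta, timezone

# Illustrative cooldowns mirroring the guidance above. For CSAT you would
# typically key the field per interaction type, not just per instrument.
COOLDOWNS = {"nps": timedelta(days=90), "csat": timedelta(days=30)}

def should_survey(user: dict, instrument: str) -> bool:
    """True if the user is outside the cooldown window for this instrument."""
    last = user.get(f"last_{instrument}_at")   # datetime of last prompt, or None
    if last is None:
        return True
    return datetime.now(timezone.utc) - last >= COOLDOWNS[instrument]
```

Layer the randomization from the rule above on top, e.g. survey only a random fraction of eligible users per trigger.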
5. Calculate scores correctly
NPS: the percentage of Promoters (9-10) minus the percentage of Detractors (0-6); range -100 to +100. CSAT: (satisfied responses / total responses) × 100, where satisfied means the top ratings (4-5 on a 5-point scale). SUS: score each odd item as (rating − 1) and each even item as (5 − rating), then sum the ten contributions and multiply by 2.5; range 0-100, with an average of about 68.
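These formulas are easy to mis-implement in spreadsheets, so a direct translation into code is a useful reference. A minimal Python sketch (no input validation; ratings are assumed to be on the scales named above):

```python
def nps(scores: list[int]) -> float:
    """NPS: % promoters (9-10) minus % detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) / n * 100

def csat(ratings: list[int]) -> float:
    """CSAT: share of satisfied responses (4-5 on a 5-point scale), as a %."""
    return sum(1 for r in ratings if r >= 4) / len(ratings) * 100

def sus(responses: list[int]) -> float:
    """SUS: 10 ratings (1-5), in original item order. Odd items contribute
    (rating - 1), even items (5 - rating); the sum is multiplied by 2.5."""
    assert len(responses) == 10
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based, so even index = odd item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

print(nps([10, 9, 9, 8, 7, 6, 3]))           # (3 - 2) / 7 * 100 ≈ 14.3
print(csat([5, 4, 4, 3, 2]))                 # 3/5 satisfied -> 60.0
print(sus([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))   # -> 85.0
```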
6. Code the follow-up responses
The score without the “why” is a thermometer without a diagnosis. Code open-ended responses by theme, separately for each score group (Promoters vs. Detractors for NPS, satisfied vs. dissatisfied for CSAT).
7. Segment and benchmark
Break scores by segment. Compare to published benchmarks. Look for segments that drag the average down.
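A minimal sketch of the segment breakdown with pandas, assuming a response table with a 0-10 score column and a segment column (the column names and data here are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "score":     [10, 9, 6, 3, 9, 10, 8, 2, 9, 5],
    "plan_tier": ["pro", "pro", "pro", "free", "free",
                  "pro", "free", "free", "pro", "free"],
})

def nps_of(scores: pd.Series) -> float:
    promoters = (scores >= 9).mean()
    detractors = (scores <= 6).mean()
    return (promoters - detractors) * 100

by_segment = df.groupby("plan_tier")["score"].apply(nps_of)
print(by_segment)  # here: free at -40 vs. pro at +60 -> investigate the gap
```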
8. Report with context
Present the score, the trend, the segment breakdown, the top themes, and the recommended actions. Use “What, So What, Now What” for each finding.
How AI changes this method
AI compatibility: partial — AI automates follow-up response coding, trend monitoring, and report generation, but cannot replace decisions about which instruments to use or how to interpret scores in business context.
What AI can do
- Code open-ended follow-ups at scale: AI categorizes thousands of responses into themes with sentiment and frequency counts.
- Detect score trends and anomalies: AI monitoring alerts the team when scores drop below thresholds or diverge by segment.
- Generate periodic reports: Given raw data, an LLM can calculate scores, segment them, and draft a stakeholder report.
- Correlate scores with product events: AI cross-references score drops with release logs or support ticket spikes.
What requires a human researcher
- Choosing the right instrument: Requires understanding product strategy and stakeholder questions.
- Interpreting scores in context: An NPS of 35 is excellent for utilities and mediocre for consumer apps.
- Managing cultural bias: NPS varies systematically across cultures and needs regional benchmarks.
- Designing the action plan: Deciding where to invest based on the data involves business trade-offs.
AI-enhanced workflow
The biggest bottleneck in standardized metrics programs has traditionally been coding the open-ended follow-ups. A quarterly NPS campaign might generate 3,000 text responses. With AI coding tools, the researcher receives a coded dataset within an hour instead of spending days on manual tagging. The feedback loop from data collection to actionable insight shrinks from weeks to days.
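A coding pipeline can start as a single classification call per response against a fixed codebook. A sketch using the OpenAI Python client; the codebook, model name, and prompt are illustrative, any LLM API works, and a sample of the output should always be validated against human coding:

```python
from openai import OpenAI  # assumes the official openai package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

THEMES = ["navigation", "pricing", "performance", "support", "other"]

def code_response(text: str, model: str = "gpt-4o-mini") -> str:
    """Assign one theme from the fixed codebook to a follow-up comment."""
    prompt = (
        f"Classify this NPS follow-up comment into exactly one theme "
        f"from {THEMES}. Reply with the theme only.\n\nComment: {text}"
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip().lower()
```

In practice, batching responses and requesting structured output makes the run cheaper and easier to audit, but the shape of the task stays the same.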
AI dashboards also replace the manual monthly export-and-analyze cycle with real-time monitoring that surfaces problems as they emerge.
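The alerting logic behind such monitoring does not need to be sophisticated to be useful; a threshold rule like this sketch is a reasonable starting point (the numbers are illustrative and should be tuned to your sample sizes so noise does not page the team):

```python
def check_nps_alert(history: list[float], threshold_drop: float = 5.0) -> bool:
    """Fire an alert when the latest score falls more than `threshold_drop`
    points below the average of the three preceding periods."""
    if len(history) < 4:
        return False
    *previous, latest = history
    baseline = sum(previous[-3:]) / 3
    return baseline - latest > threshold_drop

print(check_nps_alert([42.0, 40.0, 44.0, 28.0]))  # True: a 14-point drop
```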
Tools
Survey deployment: Qualtrics, Delighted, Satmetrix, Medallia, Survicate, Hotjar, Typeform.
In-product triggering: Pendo, Userpilot, Sprig, Appcues, Intercom Surveys.
SUS administration: Any survey tool (SUS is public domain). MeasuringU SUS Calculator.
AI analysis: Chattermill, Keatext, MonkeyLearn, ChatGPT / Claude.
Benchmarks: Satmetrix/Bain NPS by industry, MeasuringU SUS norms, Zendesk CSAT, ACSI.
Dashboards: Looker Studio, Tableau, Power BI.
Works well with
- Survey (Sv): NPS, CSAT, and SUS are often embedded within larger surveys that provide the context needed to interpret the scores.
- In-depth Interview (Di): When NPS drops, interviews with detractors reveal why.
- Usability Testing Moderated (Ut): SUS after a usability session adds a quantitative layer to qualitative observations.
- Benchmarking (Bm): Standardized scores are core metrics in benchmarking programs.
- Analytics Review (An): Analytics data combined with satisfaction scores shows both what users do and how they feel.
Example from practice
A fintech company running a personal finance app had tracked NPS quarterly for two years, scoring between 38 and 45. After a major redesign, NPS dropped to 28. AI coding of 1,200 follow-up responses revealed two dominant Detractor themes: “I can’t find features that moved” (34%) and “the budgeting tool is confusing” (28%). Promoters praised “the app looks much better” (41%). Segment analysis showed the drop concentrated among users with accounts older than one year; new users gave an NPS of 52.
The team added a “what’s changed” walkthrough for returning users and simplified the budgeting feature. Within two months, NPS recovered to 40. The budgeting feature’s CSAT rose from 3.1 to 4.0 on a 5-point scale, confirming the fix worked at both relationship and interaction levels.
Beginner mistakes
Using NPS as the only UX metric
NPS measures loyalty, which correlates with but is not the same as usability. Use it alongside behavioral metrics and SUS for a complete picture.
Deploying the wrong instrument at the wrong moment
NPS on first login is meaningless. CSAT three weeks after an interaction captures a faded memory. Match instrument to moment.
Modifying the SUS questionnaire
Changing SUS wording invalidates its norms. Use UMUX-Lite (a validated 2-item alternative) if the full SUS does not fit.
Ignoring the follow-up responses
The score is a thermometer; the follow-ups are the diagnosis. Always code and analyze them.
Not accounting for cultural response bias
NPS scores are systematically lower in some regions. Compare each region against its own trend or use region-specific benchmarks.