AI prompts for unmoderated usability testing: task writing, results analysis, and variant comparison

Ready-to-use AI prompts for unmoderated usability testing — write test tasks, analyze completion rates and click paths, and compare design variants with statistical rigor.

How to use

Copy any prompt below, replace the bracketed placeholders with your own details, and paste it into your AI assistant chat

These prompts cover the stages of unmoderated usability testing where AI saves the most time: writing clear test tasks, analyzing quantitative results, and comparing design variants.

Write unmoderated usability test tasks from product goals

I am setting up an unmoderated usability test for [product description]. The test will run on [Maze / UserTesting / Lyssna / other tool] with [N] participants.

Product goals being tested:
[list 3-5 product goals, e.g., "Users can find and apply a discount code during checkout"]

For each product goal, write a task scenario that:
1. Describes a realistic situation in 1-2 sentences (no interface terminology)
2. Gives enough context for the participant to understand what they need to do WITHOUT a facilitator
3. Has an unambiguous success state that the testing tool can detect (e.g., "reaches the confirmation screen")
4. Can be completed in 2-4 minutes by a user who has never seen the product

Also provide:
- A post-task question for each task (beyond the SEQ, the standard Single Ease Question; something specific to the goal being tested)
- A list of potential misunderstandings in the task wording and how to prevent them
- An estimated total study duration (sum of all tasks + questionnaires)

Keep the total study under 15 minutes to minimize dropout.

Analyze unmoderated usability test results

Here are the results from an unmoderated usability test with [N] participants on [product description].

Task results:
[paste task-level data: completion rate, median time-on-task, SEQ scores, common failure points]

Open-ended responses to "What was the most confusing part of this experience?":
[paste all responses]

Post-study SUS (System Usability Scale) scores:
[paste scores or average]

Analyze the data and produce:

1. TASK PERFORMANCE SUMMARY: Table with each task, completion rate, median time, SEQ average, and a traffic-light rating (green = meets benchmark, yellow = borderline, red = below benchmark). Use these benchmarks: completion ≥ 78% = green, 60–77% = yellow, < 60% = red. (A scoring sketch follows this prompt.)

2. PROBLEM AREAS: For each task with a yellow or red rating, describe the likely problem based on the available data (completion rate, time, SEQ mismatch patterns, open-ended responses).

3. OPEN-ENDED THEMES: Code all open-ended responses into 3-5 themes with frequency counts and representative quotes.

4. SUS INTERPRETATION: Convert the average SUS score to a letter grade (A–F) and percentile rank. Compare to industry benchmarks if available. (Grade bands are sketched in the code after this prompt.)

5. PRIORITIZED RECOMMENDATIONS: Top 3-5 actions the team should take, ranked by impact. For each, state: what to change, which task it affects, and what metric should improve.
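
If you want to sanity-check the AI's arithmetic in steps 1 and 4, the sketch below reproduces both calculations. The traffic-light thresholds mirror the prompt above; the SUS scoring formula is the standard one, but the letter-grade bands are an assumed simplification of the Sauro–Lewis grading scale, not a fixed standard.

```python
# Minimal sketch reproducing the benchmarks above. The traffic-light
# thresholds come from the prompt; the SUS grade bands are an assumed
# simplification of the Sauro-Lewis grading scale, not a fixed standard.

def traffic_light(completion_rate: float) -> str:
    """Rate a task's completion rate (0-1) against the prompt's benchmarks."""
    if completion_rate >= 0.78:
        return "green"
    if completion_rate >= 0.60:
        return "yellow"
    return "red"

def sus_score(responses: list[int]) -> float:
    """Score one participant's ten SUS responses (1-5 Likert scale).

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index: even index = odd-numbered item
        for i, r in enumerate(responses)
    )
    return total * 2.5

def sus_grade(score: float) -> str:
    """Map a 0-100 SUS score to a letter grade (assumed bands; 68 is average)."""
    if score >= 80.3:
        return "A"
    if score >= 74.0:
        return "B"
    if score >= 68.0:
        return "C"
    if score >= 51.0:
        return "D"
    return "F"

print(traffic_light(11 / 15))  # 73% completion -> "yellow"
print(sus_grade(sus_score([5, 2, 4, 1, 4, 2, 5, 1, 4, 2])))  # score 85 -> "A"
```

Swap in your own per-task rates and raw questionnaire responses; the thresholds are easy to adjust if your team uses different benchmarks.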

Compare two design variants from A/B usability test data

I ran an unmoderated usability test comparing two design variants:
- Variant A: [brief description]
- Variant B: [brief description]

Each variant was tested with [N] participants on the same [M] tasks.

Results:
Variant A: [paste completion rates, time-on-task, SEQ scores per task]
Variant B: [paste completion rates, time-on-task, SEQ scores per task]

Compare the variants:

1. TASK-BY-TASK COMPARISON: Table showing both variants' metrics side by side for each task. Flag statistically significant differences (use a threshold of p < 0.05 where raw pass/fail data is available; where only rates are available, flag gaps larger than 15 percentage points as practically significant). A significance-check sketch follows this prompt.

2. OVERALL WINNER: Which variant performs better overall? Is the advantage consistent across all tasks or driven by one task?

3. TRADE-OFFS: Are there tasks where the losing variant actually performs better? If so, what element of that variant might be worth preserving?

4. RECOMMENDATION: Proceed with Variant [X], with the following modifications based on where Variant [Y] performed better: [specific changes]. If the data is inconclusive, recommend what additional testing would resolve the question.
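
To verify the significance flags from step 1 yourself, here is a minimal sketch assuming you have raw pass/fail counts per variant (the counts in the example call are hypothetical). It uses a plain two-proportion z-test; with the small samples typical of unmoderated tests, an exact alternative such as Fisher's test is often the safer choice.

```python
# Minimal sketch of the significance check in step 1, assuming raw
# pass/fail counts per variant (the counts below are hypothetical).
# Uses a plain two-proportion z-test; for the small samples typical
# of unmoderated tests, an exact test such as Fisher's is often safer.
from math import sqrt

from scipy.stats import norm

def two_proportion_test(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Return (difference in completion rates, two-sided p-value)."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both variants all-pass or all-fail: no detectable difference
        return p_a - p_b, 1.0
    z = (p_a - p_b) / se
    return p_a - p_b, 2 * norm.sf(abs(z))

diff, p = two_proportion_test(pass_a=18, n_a=20, pass_b=12, n_b=20)
if p < 0.05:
    verdict = "statistically significant"
elif abs(diff) > 0.15:
    verdict = "practically significant"
else:
    verdict = "inconclusive"
print(f"diff = {diff:+.0%}, p = {p:.3f} -> {verdict}")
```

The same thresholds as the prompt apply: p < 0.05 for statistical significance, and a 15-percentage-point gap as the fallback test for practical significance when raw data is unavailable.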