Product-market fit: a 10-stage methodology for finding it
Product-market fit is the point at which the product, the audience, and the market moment line up well enough that growth stops being a slog. The phrase is famous, but the concept is slippery: most teams cannot tell whether they have it, and most attempts to measure it produce numbers that move with whoever is asking. The result is a long stretch of “are we there yet” — sometimes years.
This guide describes a structured cycle for finding PMF: 10 stages, 7 dimensions, and a small set of measurement instruments at the end. It draws on Marty Cagan’s product discovery work, David Bland’s Testing Business Ideas, Hamilton Helmer’s 7 Powers, Sean Ellis’s 40% benchmark, First Round Capital’s Levels of PMF framework, and Bill Gross’s research on timing as the dominant factor in startup outcomes.
Key insight
A PMF cycle takes months, and it comes back to the same questions in different forms — first as guesses, then as research findings, then as user quotes, then as launch metrics. Each pass refines the picture.
Why a structured cycle helps
Most PMF advice falls into one of two traps. The first is the single metric trap: “you’ll know when you have it” — useful as folklore, useless as a process. The second is the survey trap: “run Sean Ellis and check the number” — but Sean Ellis on a random audience gives a random result, and a 47% score on a list of friendly users tells you nothing.
A structured cycle solves both problems by separating the work into stages where each stage answers one question, leaves an artifact, and feeds the next. The artifacts are the state. If you stop for two months and come back, the folder of artifacts tells you exactly where you are.
The cycle is also designed to let confidence drop. If new data contradicts the hypothesis, the score for that dimension goes down, and that is the cycle working as intended. Self-driven PMF work fails most often because teams inflate their numbers to feel better. The structure pushes against this by versioning every narrative and forcing the explicit question “what changed since last time, and why.”
The 7 dimensions
PMF is the alignment of seven variables, all of which have to hold at once. Six of these come from Cagan’s product discovery work; the seventh, Timing, comes from Bill Gross’s analysis of 200+ startups, where timing turned out to be a stronger predictor of outcome than team, idea, business model or funding.
| # | Dimension | The question | What good looks like |
|---|---|---|---|
| 1 | Problem to solve | What outcome are users trying to achieve, and what blocks them? | Concrete, independent of any product, with time and frequency markers |
| 2 | Target audience | Who exactly, and why them now (vs who later)? | 2-3 behavioral attributes, not demographics; “Now” segment separated from “Future” segments |
| 3 | Value proposition | What single benefit, in their words, hits hardest? | Benefit-led tagline plus 3-5 sub-benefits, all measurable in user time/money/effort |
| 4 | Competitive advantage | Which of the 7 Powers will be your long-term moat? | One of Helmer’s seven, with a concrete compounding mechanism |
| 5 | Growth strategy | How do you get the first 1K, and how do you get to 100K? | Two different channels for the two horizons, with cost and CAC for each |
| 6 | Business model | What is the equation, and is it consistent with audience and channel? | Concrete pricing + LTV/CAC estimates + path to profitability |
| 7 | Timing — why now | What concrete shift in the last 1-3 years made this possible or needed? | A specific triggering event in technology, behavior, regulation, or economics |
The dimensions are interdependent. Strong hypotheses show how a particular audience has a particular pain that a particular value prop addresses through a channel that fits a business model that compounds through one of the seven powers. Weak hypotheses are the ones where the dimensions sit in parallel and never connect.
The 10-stage cycle
The cycle takes one product through ten numbered stages end to end, plus a Stage 0 setup. Stages 6 and 8 happen outside any tool — they are the user’s own field work and product launch. The remaining nine (setup included) produce artifacts that build on each other.
| # | Stage | Goal | Artifact |
|---|---|---|---|
| 0 | Setup | Capture product context and team pre-flight before the hypothesis | 00_setup.md |
| 1 | Hypothesis | Write the first version of the narrative across all 7 dimensions | narrative-v1.md |
| 2 | Market research | Find analogs (successes) and antilogs (failures) per dimension | market-research.md |
| 3 | Synthesis | Risk-score every dimension; identify the riskiest; produce v2 narrative | risk-prioritization.md, narrative-v2.md |
| 4 | DVF validation | Decompose the riskiest dimension into 9 testable assumptions; design an experiment | assumptions-map.md, experiment-brief.md |
| 5 | Interview prep | Build the field guide for 15+ in-depth interviews | interview-guide.md |
| 6 | Field interviews | (Outside any tool — the user runs the interviews) | interviews/notes/*.md |
| 7 | Interview synthesis | Read every note in isolation, extract patterns, write v3 narrative | interview-synthesis.md, narrative-v3.md |
| 8 | MVP launch | (Outside any tool — the user launches the product) | — |
| 9 | Metrics | Set up Sean Ellis + retention cohorts + Levels of PMF assessment | metrics-dashboard.md |
| 10 | Iterate | Decide: continue, iterate, or pivot, based on the metrics | iteration-changelog.md |
Each stage reads what the previous stages wrote and adds its own. The narrative is versioned at three points — v1 after Stage 1, v2 after Stage 3, v3 after Stage 7 — and each version is its own file with a Version History changelog. Diffing v1 against v3 is often the single most useful artifact of the cycle: it shows what the team learned and where the original idea bent under contact with reality.
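The v1-to-v3 diff needs nothing more than the standard library. A minimal sketch — the file names follow the artifact table above, and the sample narrative strings are invented:

```python
import difflib

def narrative_diff(v1_text: str, v3_text: str) -> str:
    """Unified diff between two narrative versions: what bent under contact with reality."""
    return "".join(difflib.unified_diff(
        v1_text.splitlines(keepends=True),
        v3_text.splitlines(keepends=True),
        fromfile="narrative-v1.md",
        tofile="narrative-v3.md",
    ))

# Invented one-line narratives, just to show the shape of the output.
print(narrative_diff(
    "Problem: sellers lack our tool\n",
    "Problem: sellers spend 4-6 h/week on manual order syncing\n",
))
```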
Risk scoring and the riskiest dimension
After market research, every dimension gets a numerical risk score. The formula is simple:
Risk Score = (10 - Evidence Score) × Failure Impact
Evidence Score is how strongly the data supports the dimension on a scale of 1-10. Failure Impact is how catastrophic it would be if the dimension turned out to be wrong, on a scale of 1-4. The defaults are based on how recoverable each dimension is:
| Dimension | Default impact | Reasoning |
|---|---|---|
| Problem to solve | 4 (Critical) | If there is no problem, there is no product. Cannot recover. |
| Business model | 4 (Critical) | If economics do not add up, the company dies. Hard to recover. |
| Target audience | 3 (High) | Repositioning is possible, but expensive. |
| Growth strategy | 3 (High) | Channels can be changed, but time is lost. |
| Timing | 3 (High) | If you are late, you are late. If you are early, you need stamina. |
| Value proposition | 2 (Medium) | Messaging can be rewritten iteratively. |
| Competitive advantage | 2 (Medium) | A moat is built over years. Matters long-term, does not kill at the start. |
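The scoring mechanics are simple enough to sketch in a few lines. The default impacts come from the table above; the evidence scores are invented examples of what a team might assign after market research:

```python
# Default failure impacts (1-4) from the table above.
DEFAULT_IMPACT = {
    "Problem to solve": 4,
    "Business model": 4,
    "Target audience": 3,
    "Growth strategy": 3,
    "Timing": 3,
    "Value proposition": 2,
    "Competitive advantage": 2,
}

def risk_score(evidence: int, impact: int) -> int:
    # Risk Score = (10 - Evidence Score) x Failure Impact
    return (10 - evidence) * impact

# Hypothetical post-research evidence scores (1-10), for illustration only.
evidence = {
    "Problem to solve": 7, "Business model": 4, "Target audience": 6,
    "Growth strategy": 5, "Timing": 8, "Value proposition": 6,
    "Competitive advantage": 5,
}

scores = {d: risk_score(evidence[d], DEFAULT_IMPACT[d]) for d in DEFAULT_IMPACT}
riskiest = max(scores, key=scores.get)
print(riskiest, scores[riskiest])  # Business model 24
```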
The riskiest dimension is the one with the highest score. It becomes the focus of Stage 4. Synthesis also runs two cross-fit checks: Channel-Model fit (do your growth channels work with your business model?) and Model-Market fit (does your business model work for your target audience?). Cross-fit conflicts often hide fatal problems that scoring alone misses — the classic example is enterprise sales paired with freemium pricing, where each piece looks fine in isolation but the combination is impossible.
DVF: turning a dimension into testable assumptions
Stage 4 takes the riskiest dimension and decomposes it into nine testable assumptions across David Bland’s DVF framework: Desirability × Viability × Feasibility, three assumptions per category.
- Desirability assumptions are about user needs only. No money, no technology. “I believe that small e-commerce sellers spend 4-6 hours a week on manual order syncing.”
- Viability assumptions are about money only. Pricing, conversion, CAC, LTV, unit economics. “I believe these sellers will pay $29/month for a tool that saves 4 hours.”
- Feasibility assumptions are about whether you can actually build, support, and operate the product. Operational + technical + regulatory.
The 9 assumptions get placed on a 2×2 matrix of Importance × Evidence. The Critical quadrant — high importance, weak evidence — is where the experiment goes. Bland’s experiment library has 44 standard tests, but six cover most early-stage cases: Customer Interview, Smoke Test, Concierge, Survey, Prototype, Landing Page. Each experiment brief specifies the exact success and failure thresholds before it runs, so the result is unambiguous.
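The quadrant placement can be sketched as a small function. Only the Critical quadrant comes from the methodology; the other three quadrant names, the 1-10 scales, the threshold, and the sample assumptions are illustrative assumptions:

```python
def quadrant(importance: int, evidence: int, threshold: int = 5) -> str:
    """Place an assumption on the Importance x Evidence 2x2.

    High importance + weak evidence = Critical: the experiment goes here.
    """
    if importance > threshold and evidence <= threshold:
        return "Critical"
    if importance > threshold:
        return "Validated"   # important, already well supported
    if evidence <= threshold:
        return "Defer"       # unimportant and unknown: not worth testing yet
    return "Known"           # unimportant and well supported

# Hypothetical assumptions with (importance, evidence) scores.
assumptions = [
    ("Sellers spend 4-6 h/week on manual syncing", 9, 3),
    ("Sellers will pay $29/month", 8, 2),
    ("We can build marketplace integrations in 3 months", 6, 7),
]
for stmt, imp, ev in assumptions:
    print(quadrant(imp, ev), "-", stmt)
```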
Key insight
The terminology shift matters. In stages 1-3 the document is a “hypothesis.” In Stage 4 it becomes “assumptions.” Bland is strict about this distinction because a hypothesis is a large construct (the whole dimension), while an assumption is a single testable statement that starts with “I believe.” Mixing the two terms causes real confusion in practice.
Field interviews and synthesis
Stage 5 builds an interview guide aimed at the 2-3 riskiest dimensions from synthesis. The guide follows strict rules: no leading questions, no hypothetical futures, no opinions about what people might do — only past behavior in concrete situations. Every question maps to a dimension and an assumption in a coverage matrix, so by the end of the interview cycle every assumption from the Critical quadrant has at least one question pointed at it.
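The coverage matrix reduces to one check: every Critical-quadrant assumption needs at least one question pointed at it. A sketch with invented assumptions and questions:

```python
def coverage_gaps(critical_assumptions: list[str],
                  question_map: dict[str, list[str]]) -> list[str]:
    """Return the Critical assumptions that no interview question targets."""
    return [a for a in critical_assumptions if not question_map.get(a)]

# Hypothetical coverage matrix: assumption -> questions aimed at it.
questions = {
    "Sellers spend 4-6 h/week on manual syncing": [
        "Walk me through the last time you synced orders by hand.",
    ],
    "Sellers will pay $29/month": [],  # no question yet -> a gap to fix
}
print(coverage_gaps(list(questions), questions))
```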
The recommended minimum is 15 interviews; the sweet spot is 20-30. Saturation — the point where new patterns stop appearing — usually hits between 12 and 20.
Stage 7 reads the interview notes one at a time, in isolation. Reading them in a batch averages the patterns and erases the specifics; isolation preserves both. For each dimension the synthesis records: pattern (1-2 sentences), supporting evidence count (N out of M respondents), key verbatim quotes, and confidence change (v2 score → v3 score). Surprises — findings that contradict the original hypothesis — get a separate section, because they are often the most valuable thing in the whole cycle.
If confidence dropped on a dimension between v2 and v3, the synthesis flags a possible loop and recommends one of three actions: more validation, return to research, or a pivot.
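The loop flag can be sketched as a threshold rule. The three recommended actions come from the text; the specific drop thresholds are illustrative assumptions, not part of the methodology:

```python
def loop_action(v2_score: int, v3_score: int) -> str:
    """Recommend an action when interview synthesis changes a dimension's confidence."""
    drop = v2_score - v3_score
    if drop <= 0:
        return "continue"          # confidence held or improved
    if drop <= 2:
        return "more validation"   # small drop: run another experiment
    if drop <= 4:
        return "return to research"
    return "consider a pivot"      # the dimension collapsed under field data
```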
Post-launch metrics: three instruments together
Stage 9 sets up post-launch measurement through three instruments used together. None of them is sufficient on its own.
Sean Ellis 40% Survey. One question: “How would you feel if you could no longer use [product]?” with four answer options (Very disappointed / Somewhat disappointed / Not disappointed / N/A). The threshold of ≥40% “Very disappointed” was empirically derived by Ellis from a sample of about 100 startups he worked with: companies that crossed 40% could scale via paid marketing with positive economics. The minimum sample is 40 active users; anything smaller is statistical noise. Distribution is restricted to active users — not the newsletter list, not cherry-picked top customers.
Retention cohorts. A cohort table showing what percentage of each signup cohort returns in week 1, 2, 3, 4, 8, 12. Healthy retention curves flatten at a stable level (>40% for consumer, >60% for B2B, >25% for high-frequency products) instead of falling toward zero. Looking at an overall retention average instead of cohorts hides the decay over time and is one of the most common measurement mistakes.
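A sketch of the cohort check. The flattening test (compare the last three measurements against the benchmark floor) is an illustrative heuristic, not part of the framework, and the cohort numbers are invented:

```python
def cohort_retention(cohort_size: int, active_by_week: dict[int, int]) -> dict[int, float]:
    """Fraction of a signup cohort still active in each tracked week."""
    return {week: active / cohort_size for week, active in active_by_week.items()}

def curve_flattens(retention: dict[int, float], floor: float) -> bool:
    """Healthy curves level off at or above the benchmark floor instead of decaying to zero."""
    weeks = sorted(retention)
    tail = [retention[w] for w in weeks[-3:]]  # the last three measurements
    return min(tail) >= floor

# Hypothetical January cohort of 200 signups.
jan = cohort_retention(200, {1: 120, 2: 95, 4: 88, 8: 85, 12: 84})
print(curve_flattens(jan, floor=0.40))  # consumer benchmark: flattens above 40%
```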
First Round Levels of PMF. A four-level ladder — Nascent / Developing / Strong / Extreme — assessed across three dimensions (Satisfaction, Demand, Efficiency). The overall level is the minimum of the three. A team cannot be Strong on Satisfaction and Nascent on Efficiency at the same time; that mismatch is the bottleneck and tells you what to work on next.
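The minimum rule is one line of code. The level ladder and dimension names come from the First Round framework; the rest is a sketch:

```python
LEVELS = ["Nascent", "Developing", "Strong", "Extreme"]

def overall_level(satisfaction: str, demand: str, efficiency: str) -> tuple[str, str]:
    """Overall PMF level is the minimum of the three dimensions; that minimum is the bottleneck."""
    scores = {"Satisfaction": satisfaction, "Demand": demand, "Efficiency": efficiency}
    bottleneck = min(scores, key=lambda d: LEVELS.index(scores[d]))
    return scores[bottleneck], bottleneck

level, work_on = overall_level("Strong", "Developing", "Nascent")
print(level, "- focus on", work_on)  # Nascent - focus on Efficiency
```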
Common pitfalls
Across multiple products run through this cycle, the same mistakes keep showing up:
- Solution-framed problem statements. The problem is described as “they don’t have our tool” instead of the underlying obstacle. If the problem cannot be stated without naming the product, it has not been validated.
- Audience defined by demographics. “Women 25-45” is a demographic category. A useful audience definition uses 2-3 behavioral or situational attributes and explicitly separates the “Now” segment from “Future” expansions.
- Features dressed up as benefits. “Real-time sync with the marketplace API” is a feature. The benefit is “orders show up automatically — no more manual copying.” A value proposition built from features instead of benefits cannot be tested for resonance.
- Confidence inflated to feel better. When market research contradicts the v1 hypothesis, the impulse is to reframe rather than lower the score. The cycle is designed to let the score drop.
- Sean Ellis on the wrong audience. Running the survey on a newsletter list, on friendly early users, or on the top 10% by revenue produces a fake high score that does not predict scale. Active users, random sample, in-context distribution.
- Skipping straight to metrics. Without the earlier stages, metrics measure noise. The temptation is real because metrics feel like progress.
- Treating PMF as a single binary state. PMF is a ladder with four levels, and most “we have PMF” claims belong on Level 2 — they have not yet reached the Level 3 threshold for scalable growth. The level you are on changes what you should do next.
When this approach is and is not the right fit
The cycle works well for B2C SaaS, B2B SaaS, marketplaces, DTC products, internal tools, and AI products at the application layer. It assumes you know who the target user is and what problem you are trying to solve — if neither is clear, Stage 0 is premature and you need to start with broader discovery.
It does not fit deep tech or R&D products where Feasibility is the dominant risk for years (quantum computing, biotech, novel materials). For those, Technology Readiness Levels are the right framework; DVF distorts the picture. It also assumes a single product per project — a portfolio of five products means five separate cycles running in parallel.
The cycle does not run experiments for you. Stages 6 and 8 — field interviews and MVP launch — are explicit waiting states. The methodology produces the guide and the dashboard; you do the field work and the launch.
Methodology sources
If you want to read the source material, this is where the cycle’s pieces come from:
- Marty Cagan — Inspired and Empowered. The 7-dimensions decomposition draws on Cagan’s product discovery framing, particularly his insistence that the problem must be stated independently of the solution.
- David Bland and Alex Osterwalder — Testing Business Ideas (2019, Strategyzer). The DVF framework, the 9 assumptions structure, the 2×2 importance/evidence matrix, and the standard experiment library.
- Hamilton Helmer — 7 Powers: The Foundations of Business Strategy. The seven possible long-term moats — Scale Economies, Network Economies, Counter-Positioning, Switching Costs, Branding, Cornered Resource, Process Power.
- Sean Ellis — the 40% PMF survey. Published in 2009; the 40% threshold is an empirical finding derived from companies Ellis advised, with no theoretical basis behind it.
- First Round Capital — The Levels of Product/Market Fit. Todd Jackson, Brian Rothenberg, Carolyn Stein. The four-level ladder and the three-dimension assessment grid.
- Bill Gross — TED talk on the #1 factor in startup success. The analysis that puts timing ahead of team, idea, business model and funding.
- Rahul Vohra — How Superhuman Built an Engine to Find Product/Market Fit. The loop of segmenting the “Very disappointed” group, identifying what they value, rebuilding around them, and re-running the survey.
A tool to run the cycle
The cycle described above is implemented as a Claude Code skill called pmf — an open-source tool that walks one product through all 10 stages, keeps the artifacts in a project folder, and resumes between sessions. It is bilingual (English + Russian, picked at first run), free under MIT, and currently in beta.
The skill, with its 23 reference files (stage logic + methodology + templates) in both languages, lives in the share repository: github.com/alenazaharovaux/share/tree/main/skills/pmf.
Beta means the pipeline holds together end to end on real products, but the rough edges have not yet been mapped. Feedback from outside users — what worked, what broke, what felt off — is the most valuable input the project can get right now. If you try it, please open an issue in the share repo or get in touch.
Resources
- Skill on GitHub — install instructions, full README, all references in EN and RU
- Hamilton Helmer — 7 Powers (book)
- David Bland & Alex Osterwalder — Testing Business Ideas (book)
- Sean Ellis — Finding Product-Market Fit
- First Round Review — The Levels of Product/Market Fit
- Rahul Vohra — Superhuman PMF Engine
- Bill Gross — The single biggest reason why startups succeed (TED)