
How to run a literature review for UX research: a practical guide with AI prompts

What is a literature review?

A literature review is a secondary research method in which the researcher collects, screens, and synthesizes published research on a defined question — academic papers, industry reports, peer-reviewed studies, established design pattern libraries, and internal research repositories — to surface what is already known before any new study is run. Unlike a broad desk research scan that pulls in any relevant material, a literature review is a focused, time-boxed audit of the published evidence that produces a structured synthesis with explicit gaps and recommendations. For UX teams, a scoping or rapid literature review on a focused question takes anywhere from five hours to two weeks and pays back the cost many times over by stopping the team from rediscovering what other researchers have already documented and validated.

What question does it answer?

  • What is already known about this user group, this domain, or this design pattern, and where is the published evidence strong versus weak?
  • Have other researchers already answered the question we are about to spend a sprint trying to answer ourselves?
  • Which design patterns have been tested and validated in published research, and which are still untested folklore?
  • What does the academic and industry literature say about the failure modes of the approach we are about to ship?
  • Which research gaps justify running our own primary study, and which questions can we answer right now from the existing evidence?
  • For a regulated or unfamiliar domain (healthcare, finance, accessibility), what does the published research say we are required to consider before designing?

When to use a literature review

  • Before any major research project, to surface the questions that have already been answered and free the primary research budget to investigate the genuine unknowns.
  • When entering a new domain, product category, or user segment where the team has no prior experience and needs to ramp up on the established findings before designing or interviewing.
  • When justifying a design decision to stakeholders who want evidence stronger than “best practices” — peer-reviewed studies and industry reports carry more weight than opinion.
  • When auditing a controversial design choice (an interaction pattern, a copy decision, a metric) against the published evidence to confirm whether the team’s intuition is supported.
  • When preparing a research proposal, grant application, or strategy document that needs to ground its recommendations in the existing body of knowledge.
  • When the budget rules out primary research entirely and the team still needs a defensible, evidence-based answer to a strategic question.

Not the right method when the question is specific to your unique users and product and no analog exists in the published literature — in that case primary research (interviews, usability testing, analytics) is the only honest answer. It is also the wrong call when the team needs fresh, situational insight about its own live users — the literature is a snapshot of what was true when the studies were published, not a real-time read on the current customer base. A literature review should not be used to delay decisions indefinitely; if the question is urgent and the literature is thin, run a small primary study and document what you learned for the next team. Finally, do not confuse it with reading a few blog posts; a real literature review screens for source quality, includes contradictory evidence, and synthesizes across studies rather than summarizing each one in isolation.

What you get (deliverables)

  • Research questions document: a one-page statement of the two to four focused questions the review will answer, the scope (time window, source types, languages), and the inclusion/exclusion criteria for sources.
  • Source log: a structured spreadsheet or database with one row per source, capturing author, year, source type, study design, key findings, relevance score, and a one-paragraph extraction of the relevant insights.
  • Thematic synthesis: the findings organized by theme rather than by source, with convergent evidence, contradictions, and gaps explicitly called out.
  • Annotated bibliography: a short evaluative paragraph for each high-value source so future readers can decide whether to dive into the original paper themselves.
  • Gap analysis: an explicit list of questions the literature does not answer and recommendations for primary research that would close the gaps.
  • Recommendations document: concrete design or research implications tied back to specific sources, so each recommendation has a citation and a rationale.
  • Readout brief: a five to ten page document or short deck with the question, the method, the headline themes, the gaps, and the recommendations; doubles as a baseline that future literature reviews on the same topic can build on.

Participants and team

  • Participants: none recruited. A literature review is a desk method on published evidence, conducted by one to three researchers depending on the type and scope.
  • Researchers: for a UX scoping or rapid review, one researcher is enough; for a systematic review aiming for full coverage, two researchers are the minimum because dual screening and dual extraction reduce single-reviewer bias.

Timeline

  • Question definition: 1–3 hours to write the focused research questions, the scope, and the inclusion criteria. This is the step beginners under-invest in and pay for later.
  • Search strategy: 1–3 hours to design the search terms, pick the databases, and run the initial pull. For systematic reviews this expands to a day or more with an information specialist.
  • Screening: 0.5–1 day for a scoping review of 50–150 candidate sources, longer for systematic reviews that aim for hundreds.
  • Extraction and synthesis: 1–3 days for a UX scoping review (15–25 high-quality sources), 2–8 weeks for a scoping review of a new domain, several months for a full systematic review.
  • Writing the brief: 0.5–1 day for the UX scoping format, longer for academic or systematic outputs.
  • Total wall-clock time: 5 hours for a focused micro-review, 1–2 weeks for a UX scoping review on a defined question, 4–6 weeks for a broader scoping review of a new domain, 6 months to 2 years for a full systematic review with two researchers.

How to run a literature review (step-by-step)

1. Define the focused research question

Write two to four specific questions before opening any database. Vague questions like “what is good UX” produce vague reviews and wasted time; focused questions like “what does the published research say about the conversion impact of guest checkout for first-time mobile users” produce actionable findings within a few days. Test each question by asking yourself whether you can imagine a concrete answer; if you cannot, the question is still too broad. Tie every question to a real design or research decision so the synthesis stays actionable. If the project covers more than one decision, run a separate review for each rather than trying to bundle them.

2. Pick the type of review and the scope

Decide upfront which type of literature review fits the project: a narrative review for a focused question with a single researcher and a one to four week budget, a scoping review for mapping the available evidence in a new area with two to eight weeks of work, a rapid review for a more rigorous synthesis under a tight deadline (two to six months), or a full systematic review for academic publication or evidence-grade decisions (eight months to two years). For most UX work the right choice is a scoping or narrative review with an explicit time box. State the scope in writing: minimum and maximum number of sources, the time window (sources from the last five to ten years for UX, longer for foundational studies), the languages, and the source types you will include.

3. Design the search strategy

List the keywords and synonyms for each research question, then pick the databases and source types to search. Internal sources first: any prior research repositories, past usability tests, support tickets, and design documentation that the team already owns. External sources next: Google Scholar and ACM Digital Library for academic work, Nielsen Norman Group, Baymard Institute, dscout People Nerds, and similar industry blogs for practitioner insights, Litmaps and Connected Papers for citation mapping, and competitor case studies where they exist. For systematic reviews, work with an information specialist to design the search; for a UX scoping review, draft the keywords yourself and trial them against a benchmark list of three to five sources you already know are relevant, then refine.
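
The query construction itself is mechanical once the keyword plan exists: one OR-clause of synonyms per concept, ANDed together. A minimal sketch in Python, with placeholder terms rather than a recommended query:

    concept_groups = [
        ["guest checkout", "express checkout"],
        ["conversion", "abandonment", "drop-off"],
        ["mobile", "smartphone"],
    ]

    def boolean_query(groups):
        """AND together one OR-clause per concept, quoting multi-word phrases."""
        clauses = []
        for synonyms in groups:
            terms = [f'"{t}"' if " " in t else t for t in synonyms]
            clauses.append("(" + " OR ".join(terms) + ")")
        return " AND ".join(clauses)

    print(boolean_query(concept_groups))
    # ("guest checkout" OR "express checkout") AND (conversion OR abandonment
    # OR drop-off) AND (mobile OR smartphone)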

4. Screen sources for relevance and quality

Pull a candidate set (30–150 sources for a scoping review, more for systematic) and screen them against the inclusion criteria using only the title, abstract, and first paragraph — do not read every paper in full at this stage. Apply two filters: relevance (does this source address one of the research questions for this user context) and credibility (peer-reviewed and high-credibility industry sources first, blog posts second, marketing fluff never). Score each source High / Medium / Low for relevance and aim for 15–25 High sources rather than 100 superficial ones. Always record the sources you discarded and why, so future reviewers can audit the decision.
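
The screening log does not need to be elaborate; a relevance score and a discard reason per candidate are enough to keep the decision auditable. A minimal sketch with illustrative entries:

    # Keep High and Medium sources for extraction; log every discard with a reason.
    candidates = [
        {"title": "Guest checkout and mobile conversion", "relevance": "High"},
        {"title": "E-commerce trends 2019", "relevance": "Low",
         "reason": "does not address any of the research questions"},
    ]

    included = [c for c in candidates if c["relevance"] in ("High", "Medium")]
    discarded = [c for c in candidates if c["relevance"] not in ("High", "Medium")]

    for c in discarded:
        print(f"DISCARDED: {c['title']} ({c.get('reason', 'below relevance bar')})")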

5. Extract findings into a structured log

For each High and Medium source, read the full text and capture five fields in a shared spreadsheet: the research questions the source addresses, the methods used, the key findings relevant to your project, the limitations or context that might affect applicability, and the implications for your specific design or research question. Use direct quotes or precise paraphrasing rather than free-form summary so the extraction is auditable. Tag each row with the theme(s) it relates to so the synthesis step has a clean structure to work from. For paid or paywalled sources you cannot access, note the citation and try to find a public summary or preprint version before giving up.
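
One way to keep the extraction consistent across reviewers is to fix the row shape up front. A minimal sketch of one log row, combining the source metadata from the deliverables list with the five fields above; in practice this is usually a shared spreadsheet with one column per field:

    from dataclasses import dataclass, field

    @dataclass
    class SourceRow:
        author: str
        year: int
        source_type: str                # e.g. "peer-reviewed", "industry report"
        study_design: str               # e.g. "RCT", "survey", "case study"
        questions_addressed: list[str]  # which review questions it speaks to
        methods: str
        key_findings: str               # direct quotes or precise paraphrase
        limitations: str                # context that may limit applicability
        implications: str               # what it means for this project
        relevance: str                  # "High" / "Medium" / "Low"
        themes: list[str] = field(default_factory=list)  # tags for the synthesis step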

6. Synthesize findings across sources, not source by source

The biggest single error in literature reviews is summarizing each source in turn instead of synthesizing across them. Reorganize the extraction by theme, then for each theme write a paragraph that combines the evidence from multiple sources, names the convergent findings, flags the contradictions, and assesses the strength of the evidence. Look for patterns (“three studies of mobile checkout flows found that guest checkout increased conversion, with one study noting a downstream retention cost”), not for a list of disconnected paragraphs. Stop adding sources when new ones stop yielding new themes — that is the moment of saturation, and it usually arrives faster than beginners expect.
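
Once the rows are tagged, the reorganization itself is mechanical, which leaves the researcher free to spend the time on the hard part: writing one evidence-weighing paragraph per theme. A minimal sketch, assuming rows shaped like the extraction log above (the sources are placeholders):

    from collections import defaultdict

    rows = [  # illustrative rows; in practice, loaded from the shared log
        {"source": "Author A 2021", "themes": ["guest checkout", "trust"]},
        {"source": "Author B 2023", "themes": ["guest checkout"]},
    ]

    by_theme = defaultdict(list)
    for row in rows:
        for theme in row["themes"]:
            by_theme[theme].append(row["source"])

    # One paragraph per theme, written against all of its evidence at once:
    # convergent findings, contradictions, strength of evidence.
    for theme, sources in sorted(by_theme.items()):
        print(f"{theme}: {', '.join(sources)}")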

7. Identify gaps and contradictions explicitly

A literature review is most useful when it tells the reader what is not known as clearly as it tells them what is. Make a separate section for the gaps: which of the research questions the literature does not answer, which contexts have not been studied (your user segment, your industry, your device, your language), and which findings contradict each other in ways the existing literature has not resolved. Each gap is a candidate for primary research; each contradiction is a place where the team needs to make an explicit decision about which evidence to trust and why. This is the section that converts the review from a summary into a research roadmap.

8. Connect findings to design or research decisions

For every theme in the synthesis, write one or two concrete recommendations tied directly to the project’s design or research questions. A finding without a recommendation is a dead summary; a recommendation without a finding is opinion. Be explicit about how strong the evidence is — “five peer-reviewed studies converge on this” is a different recommendation than “one industry blog claimed this.” Where the literature contradicts the team’s instinct, surface the contradiction in the recommendation rather than burying it. The recommendations should be the section the design and product leads actually read.

9. Write the brief and present to stakeholders

Produce a five to ten page brief or a short deck. Open with the research questions, the method (search strategy, source types, inclusion criteria), and the headline findings on the first page. Walk through each major theme with the convergent evidence, the contradictions, and the implication. Close with the gap analysis and the prioritized recommendations. Present in person to the design, product, and research leads so they can ask about source quality and applicability — the conversation is where the synthesis becomes a decision. Keep the source log attached as an appendix for the team that wants to dive deeper into specific studies.

How AI changes literature review

AI compatibility: partial — Literature review is one of the most AI-amenable research methods because the bulk of the work is mechanical: query a database, read abstracts, extract structured fields, and cluster findings into themes. Modern AI tools built specifically for the workflow (Elicit, Consensus, SciSpace, Scite, Litmaps) can run the search, summarize the abstracts, surface convergent findings across hundreds of papers, and produce a draft synthesis in hours rather than weeks. The catch is that AI is consistently overconfident on source quality, struggles with the contradictions and contextual caveats that matter most, and will happily fabricate citations if pushed. The researcher’s job shifts from doing the extraction to verifying it, evaluating which evidence is actually strong, and translating the findings into recommendations the team can act on.

What AI can do

  • Search the literature semantically: Elicit, Consensus, and SciSpace go beyond keyword search and find papers that semantically match the research question, surfacing relevant sources that traditional database searches miss. This compresses the search step from a day of database querying to an hour of refining the question.
  • Summarize and extract structured findings from papers: Given a PDF or a citation, modern AI tools produce a one-paragraph summary, extract the methods, the sample size, the key findings, and the limitations into a structured row, and link back to the relevant sentences. What used to take 20–30 minutes per paper now takes 2–3 minutes.
  • Synthesize across multiple sources at scale: Tools like Elicit and Consensus can run a “what does the literature say about X” query across 50–500 papers and produce a draft synthesis with the convergent findings, contradictions, and the count of supporting versus opposing studies. This is the step that historically scaled poorly with sample size.
  • Map citation networks and find adjacent sources: Litmaps, Connected Papers, and Research Rabbit visualize how a starting set of papers cite each other, surfacing seminal work the researcher missed and related papers the search query did not catch. This replaces hours of manual citation chasing.
  • Draft the synthesis brief: Given an extracted source log, an LLM like Claude or GPT-4o can produce a first-draft brief organized by theme, with citations to the specific sources behind each finding (see the sketch after this list). The researcher then rewrites it for tone, sharpens the gap analysis, and removes the inevitable false certainty.
  • Flag papers that contradict each other: A model can read the extracted findings across the source log and surface the cases where two or more sources disagree, which is the start of the gap analysis. Doing this manually is slow and easy to miss.
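
As a rough sketch of that drafting step, assuming the Anthropic Python SDK and a source log exported to a text file (the model name and file path are placeholders), the constraint that matters most lives in the prompt: the model may cite only sources already in the log.

    import anthropic

    source_log = open("source_log.tsv").read()  # placeholder path: one row per source

    prompt = f"""Draft a literature review synthesis from the extraction log below.
    Organize by theme, not by source. For each theme: state the convergent findings,
    flag contradictions between sources, and note the strength of the evidence.
    Cite ONLY sources that appear in the log; do not add any other references.

    <log>
    {source_log}
    </log>"""

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    draft = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: substitute a current model
        max_tokens=4000,
        messages=[{"role": "user", "content": prompt}],
    )
    print(draft.content[0].text)  # a first draft, not a deliverable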

What requires a human researcher

  • Defining the research question that matters: AI will happily search any question you give it, but choosing the question that actually maps to the team’s decision is product judgment that depends on knowing the project, the stakeholders, and what the team has already tried. Get this wrong and the review is technically thorough but irrelevant.
  • Verifying source quality and study design: AI tools systematically over-trust their inputs and will rank a marketing blog and a peer-reviewed RCT as equally relevant if the keywords match. The researcher has to read the methods section of the high-impact sources and downweight the ones that do not pass basic critical appraisal.
  • Catching hallucinated citations: Generic LLMs (not the dedicated literature tools) will fabricate plausible-looking references that do not exist. Any citation that came out of a chat-style interaction needs to be verified against the actual database before it goes into a brief (see the sketch after this list).
  • Calibrating findings to the local context: A finding that holds in the published study may not hold for your user segment, your industry, or your device. AI cannot make that judgment; the researcher has to ask whether the published context matches the team’s context and downweight the sources where it does not.
  • The decision about what the team should do: The recommendations are the part of the brief that drives the next sprint, and they require a human who knows what the team can ship, what the engineering constraints are, and how much risk the business will accept. AI can draft them; only a human can commit to them.
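
A minimal sketch of that verification step against the public Crossref API (the example citation is a placeholder). No match does not prove a source is fake, since preprints and industry reports are not indexed there, but a fabricated paper usually returns nothing close:

    import requests

    def crossref_lookup(citation, rows=3):
        """Return the closest-matching titles Crossref has for a citation string."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": citation, "rows": rows},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        return [(item.get("title") or ["<no title>"])[0] for item in items]

    # Compare the returned titles against the claimed citation by eye.
    for title in crossref_lookup("A usability study of multi-factor authentication"):
        print(title)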

AI-enhanced workflow

Before AI, a UX scoping literature review on a focused question took one to two weeks of researcher time: a day to design the search, a day to screen 100+ candidates, three to five days to read and extract from the 15–25 selected sources, two days to synthesize and write the brief. The bottleneck was the extraction step: slow, repetitive, and prone to drifting from the original research question as fatigue set in.

With AI in the workflow, the same project compresses to one to three days. The researcher spends two hours framing the research question and the scope, then runs the question through Elicit or Consensus to get a first-draft list of relevant sources with extracted findings. The researcher reads the abstracts the model surfaced, accepts the obvious matches, rejects the irrelevant ones, and adds anything the model missed via Litmaps citation chasing. The extraction step then runs against the cleaned source list at machine speed, with the researcher spot-checking ten to twenty percent of the rows manually and reading the methods sections of the highest-impact sources in full. The synthesis pass uses the model’s draft as a starting point, and the researcher rewrites the themes for nuance, the contradictions for honesty, and the recommendations for actionability.

The catch is the same as for AI-assisted desk research and heuristic evaluation: the speed-up is real only when a human reads the highest-impact sources in full and verifies the citations. Studies of LLM-generated literature reviews find that models cluster the easy themes reliably but miss the contradictions and overweight whichever framing appears in the most papers, even when the minority view is methodologically stronger. The researchers who get the most value from AI here treat the dedicated tools (Elicit, Consensus, SciSpace) as a competent junior research assistant, the generic LLMs as a draft writer that lies about citations, and themselves as the critical reader who knows which evidence the team can stake a decision on.

Example from practice

A B2B fintech company was about to redesign the multi-factor authentication flow for a high-stakes payment product and the security team wanted “industry best practices” before committing to an approach. The product manager had two weeks before the design sprint started and a strong instinct to use SMS one-time passwords because that was what the support team was asking for. The lead researcher was worried that SMS was the wrong call and proposed a focused literature review to settle the question with evidence rather than opinion.

She defined three research questions (“what does the published research say about user friction in MFA flows for B2B payments,” “what is the documented security/usability tradeoff between SMS, authenticator apps, hardware keys, and passkeys for non-technical users,” “which MFA patterns have the lowest abandonment rates in financial services”), set a one-week time box, and used Elicit and Consensus to pull a candidate set of 87 papers and industry reports spanning academic HCI venues, NIST guidance, FIDO Alliance reports, and Baymard, including three peer-reviewed studies on MFA usability. She screened the candidates down to 22 high-relevance sources, extracted them into a Notion database with the structured fields, and ran a synthesis pass with Claude on the extracted log. She read the methods sections of the eight highest-impact peer-reviewed sources in full and verified the citations the model surfaced.

The synthesis produced a clear answer: across nine peer-reviewed studies and four industry reports, SMS one-time passwords had the highest abandonment rate (12–18%) and the lowest security score, while authenticator apps had a middle position and passkeys had the lowest abandonment for users who already had a compatible device. Hardware keys had the strongest security but the highest setup friction. The recommendation was to default to passkeys with authenticator-app fallback, skip SMS entirely except as a recovery mechanism, and accept that the support team’s request was a request for the wrong fix. The recommendation went into the design sprint as the starting point, the redesign shipped six weeks later with passkeys as the default, the abandonment rate on the new flow dropped from 14% to 4% over the first month, and the support team retracted their original ask once they saw the evidence. The literature review took the researcher about 26 hours of work including the synthesis brief — versus the multi-week primary study that would have been the only alternative.

AI prompts for this method

4 ready-to-use AI prompts with placeholders — copy-paste and fill in with your context. See all prompts for literature review →.