How to conduct a contextual inquiry: a practical guide with AI prompts
Contextual inquiry is a field research method where the researcher observes and interviews users in their actual work or home environment while they perform real tasks. The researcher acts as an apprentice learning from the user as the expert, combining direct observation with on-the-spot questions to uncover how and why people do what they do in their natural context.
This method works best when you need to understand the “why” behind user behavior — particularly for complex workflows that users cannot easily describe in an interview, or when the physical and social environment plays a role in how people use a product.
What you learn
- How users actually perform tasks in their real environment, including workarounds and shortcuts they have developed
- What environmental factors (tools, colleagues, interruptions, physical space) influence behavior
- Where workflows break down in ways that users themselves may not notice or report
- The gap between what users say they do and what they actually do
What you get
- Detailed field notes with timestamped observations and direct quotes
- Work models: flow models (how work moves between people), sequence models (step-by-step task flows), artifact models (documents and tools users rely on), physical models (layout of the workspace)
- Affinity diagram grouping observations into themes across participants
- Design implications tied directly to observed behavior
- Photos or videos of the workspace, artifacts, and workarounds
When to use (and when not to)
Use contextual inquiry when:
- You need to understand complex workflows with many steps, exceptions, and informal practices that users cannot articulate in an interview
- The physical or social environment influences how users interact with a product
- You suspect a gap between reported behavior and actual behavior
- You are designing for expert users whose routines are so automatic they cannot describe them without being observed mid-task
- You are at the beginning of a project and need to understand the full context before generating solutions
Skip contextual inquiry when your research question is about attitudes or preferences rather than behavior, when you cannot physically visit the user’s environment, when you need quantitative data, or when the task takes only a few seconds — the overhead of a field session would not justify the insight.
Participants and timing
Participants: 4–8 users. Because sessions are long and data-rich, you need fewer participants than for standard interviews to get useful results. For a single user segment, 4–6 sessions typically reach saturation.
Session length: 1.5–3 hours. The first 15–20 minutes are introduction and setup, the core observation-plus-interview portion takes 1–2 hours, and the final 15–20 minutes are collaborative interpretation and wrap-up.
Total timeline: 2–4 weeks:
- Preparation (recruiting, logistics, protocol): 3–5 days
- Fieldwork: 3–7 days (1–2 sessions per day maximum — sessions are mentally exhausting)
- Analysis and synthesis: 3–5 days
How to conduct a contextual inquiry
1. Define your focus areas
Write down 3–5 specific aspects of the user’s work you want to understand. Without focus areas, you risk drowning in data. Example: “How do nurses hand off patient information between shifts?” rather than “How do nurses work?”
2. Recruit and schedule participants
Recruit users who perform the task as part of their regular routine. Schedule sessions during a time when the task naturally occurs. Obtain permission from the user’s manager or household members if needed, and confirm that you may observe, take notes, and photograph the workspace.
3. Prepare your observation protocol
Create a lightweight guide with your focus areas, a few opening questions, and a list of things to watch for (tools, artifacts, interruptions, collaboration). Do not script every question — contextual inquiry depends on responding to what you observe in the moment.
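If you want a head start on the protocol, an LLM can draft it from your focus areas. Here is a minimal sketch, assuming the Anthropic Python SDK and an API key in the environment; the focus areas, model name, and prompt wording are illustrative, not a fixed recipe:

```python
# Minimal sketch: draft an observation protocol from focus areas.
# Assumes the Anthropic Python SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment; focus areas are examples.
import anthropic

focus_areas = [
    "How do nurses hand off patient information between shifts?",
    "What artifacts (paper or digital) do nurses keep within arm's reach?",
    "When and why do nurses deviate from the documented workflow?",
]

prompt = (
    "I am preparing a contextual inquiry with hospital nurses. "
    "Draft a one-page observation protocol with: (1) 2-3 opening "
    "interview questions, (2) specific behaviors and artifacts to watch "
    "for, and (3) retrospective probes to ask during natural pauses. "
    "Keep it a checklist, not a script.\n\nFocus areas:\n"
    + "\n".join(f"- {area}" for area in focus_areas)
)

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # swap in any capable model
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```

Treat the output as a draft: cut anything that does not map to your focus areas before the first visit.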
4. Open the session: establish the partnership
Arrive at the user’s location. Explain that you are there to learn from them, not to evaluate them. Establish the master-apprentice dynamic: “I want to understand how you do your work. Please do things the way you normally would, and I will watch and ask questions as we go.” Start with a brief conventional interview (5–10 minutes) to understand their role and typical day.
5. Transition to observation
Ask the user to begin a task they would normally do at this time. Move from interviewing to observing. Position yourself so you can see the screen, workspace, and the user’s hands without blocking them. Stay quiet during critical moments.
6. Probe during natural pauses
When the user pauses, switches tasks, or finishes a step, ask questions about what you just observed. Use retrospective probing: “I noticed you switched to the spreadsheet just now — what triggered that?” or “You seemed to hesitate before clicking that button — what were you thinking?” Follow unexpected behaviors and workarounds — these are where the richest insights hide.
7. Conduct collaborative interpretation
In the final 15–20 minutes, share your observations and initial interpretations with the user. This is what distinguishes contextual inquiry from passive observation: you check your understanding directly. “It seemed like you keep a paper list alongside the digital system — is that because the system doesn’t show everything you need?” The user corrects or confirms, adding depth to the data.
8. Debrief immediately after the session
Within 30 minutes, write up expanded field notes while memory is fresh. Note the top 3 surprises, any observations that contradicted your assumptions, and questions to explore in the next session. Capture photos of key artifacts and the workspace layout.
9. Analyze across sessions
After completing all sessions, build an affinity diagram: write each observation on a separate note, group related notes into clusters, and name the clusters as themes. Create work models (flow, sequence, artifact, physical) to visualize patterns. Write insights as design implications: “Because users maintain a shadow paper system, the digital tool must surface the same information without requiring navigation to a separate screen.”
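As a first pass at the affinity diagram, you can ask an LLM to cluster observations and draft design implications in the “Because users X, the product must Y” form. A minimal sketch, assuming the same Anthropic SDK setup as in step 3 and expanded field notes saved as notes/session_*.md (a hypothetical layout):

```python
# Minimal sketch: first-pass affinity grouping across sessions.
# Assumes expanded field notes live in notes/session_*.md (hypothetical
# paths); the researcher still reviews and regroups every cluster.
from pathlib import Path
import anthropic

# Concatenate all session notes, labeled by filename.
notes = "\n\n".join(
    f"## {p.name}\n{p.read_text()}"
    for p in sorted(Path("notes").glob("session_*.md"))
)

prompt = (
    "Below are field notes from several contextual inquiry sessions. "
    "1) List each discrete observation as a one-line note with its session id. "
    "2) Group related notes into clusters and give each cluster a theme name. "
    "3) For each theme, draft one design implication phrased as "
    "'Because users <observed behavior>, the product must <requirement>'. "
    "Flag any contradictions between participants.\n\n" + notes
)

client = anthropic.Anthropic()
reply = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{"role": "user", "content": prompt}],
)
print(reply.content[0].text)
```

The model’s clusters are a starting grid, not a result: regroup the notes yourself and keep only themes you can trace to specific observations.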
How AI changes this method
AI compatibility: partial — AI strengthens preparation (observation protocol, question design) and analysis (session note synthesis, work model creation) but cannot replace the researcher’s physical presence alongside the user. Contextual inquiry depends on observing someone doing real work in their real environment, asking clarifying questions in real time, and interpreting the interplay between user, task, and context.
What AI can do
- Observation protocol generation: Given a research question and task domain, an LLM can draft a structured observation checklist: what to watch for, what questions to ask at each stage, and what artifacts to photograph (the sketch under step 3 above shows one way to prompt this).
- Real-time transcription of contextual interviews: Audio recording tools with AI transcription (Otter.ai, Dovetail) capture the conversation, freeing the researcher to focus on observing rather than note-taking.
- Session note expansion: Brief field notes taken during the session can be expanded into detailed write-ups using an LLM. The researcher provides bullet-point observations and the AI generates a structured narrative (first sketch after this list).
- Work model drafting: After multiple sessions, an LLM can draft sequence models, artifact models, and flow models from coded session notes — creating a first pass that the researcher refines (second sketch after this list).
- Cross-session pattern detection: When session notes from 5–8 contextual inquiries are fed to an LLM, it can surface recurring workflow patterns, common breakdowns, and contradictions between participants (the step 9 sketch above shows a similar prompt).
- Artifact analysis: Photos of workspaces, tools, and workarounds can be described by multimodal AI models, creating a searchable catalog of physical artifacts across sessions (third sketch after this list).
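A minimal sketch of session note expansion, assuming the Anthropic Python SDK; the field notes are invented for illustration, and the prompt explicitly forbids the model from inventing details:

```python
# Minimal sketch: expand terse field notes into a structured write-up.
# The bullet notes are illustrative; the researcher must verify the
# expansion against memory and the recording before treating it as data.
import anthropic

raw_notes = """\
09:12 shift handoff, nurse A reads from paper 'brain sheet', not EHR
09:15 EHR open on vitals screen, ignored during handoff
09:20 interruption (phone), A re-finds place on paper sheet by finger
asked: why paper? 'the system needs four screens for this'"""

prompt = (
    "Expand these contextual inquiry field notes into a structured "
    "write-up with sections: Setting, Observed sequence, Workarounds, "
    "Direct quotes, Open questions. Do not invent details that are not "
    "in the notes; mark gaps as [not captured].\n\n" + raw_notes
)

client = anthropic.Anthropic()
reply = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(reply.content[0].text)
```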
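For work model drafting, the same pattern applies with coded notes as input. A sketch assuming a simple [trigger]/[step]/[breakdown] coding scheme of our own invention:

```python
# Minimal sketch: draft a sequence model from one session's coded notes.
# The notes and coding vocabulary are illustrative; the researcher
# validates every step against the recording before using the model.
import anthropic

coded_notes = """\
[trigger] shift change at 07:00
[step] nurse reads paper brain sheet (30s)
[step] opens EHR, navigates: patients > chart > vitals > notes (4 screens)
[breakdown] vitals screen lags ~10s, nurse switches back to paper
[step] verbal handoff with outgoing nurse, shorthand codes
[intent] confirm overnight changes before first rounds"""

prompt = (
    "From these coded contextual inquiry notes, draft a sequence model: "
    "the triggering event, each step in order with the user's intent, "
    "and breakdowns clearly marked. Output as an indented text diagram.\n\n"
    + coded_notes
)

client = anthropic.Anthropic()
reply = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(reply.content[0].text)
```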
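For artifact analysis, a multimodal model can describe workspace photos. A sketch assuming JPEG photos collected in an artifacts/ folder (a hypothetical path) and the Anthropic SDK’s base64 image input:

```python
# Minimal sketch: catalog workspace photos with a multimodal model.
# Assumes JPEG photos in artifacts/ (hypothetical path); output is a
# first-pass description the researcher corrects, not ground truth.
import base64
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
for photo in sorted(Path("artifacts").glob("*.jpg")):
    image_b64 = base64.standard_b64encode(photo.read_bytes()).decode()
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text",
                 "text": "Describe this workspace artifact for a research "
                         "catalog: what it is, what information it holds, "
                         "and what workaround it might represent."},
            ],
        }],
    )
    print(f"{photo.name}: {reply.content[0].text}")
```

The descriptions are searchable metadata, not analysis; the meaning of an artifact still comes from watching it in use.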
What requires a human researcher
- Being present during the task: The master-apprentice dynamic — sitting beside the user, watching them work, asking “Why did you do that?” at the exact right moment — cannot be delegated to AI.
- Reading the gap between words and actions: Users often say “this works fine” while visibly struggling. The researcher notices the hesitation, the workaround, the sigh — and probes further.
- Adapting questions in real time: A contextual inquiry session is not scripted. The researcher must decide on the spot when to interrupt, when to stay silent, and what follow-up will reveal the deeper issue.
- Interpreting environmental context: Why the user’s desk is arranged a certain way, why they keep a paper checklist next to a digital system, why they glance at a colleague before proceeding — these contextual signals require human judgment.
AI-enhanced workflow
The biggest impact of AI on contextual inquiry is in the gap between fieldwork and deliverables. Traditionally, a researcher returns from a day of sessions with pages of handwritten notes, photos, and audio recordings, then spends 1–2 days per session turning this raw material into structured work models and insights. AI tools cut this post-session processing roughly in half: transcription is instant, note expansion takes minutes instead of hours, and a first-pass pattern analysis across sessions can be generated with a single prompt.
During the session itself, AI’s role is limited to passive recording. The researcher still needs to manage the master-apprentice relationship, decide what to observe closely, and ask the right questions.
Teams see the biggest efficiency gain when running contextual inquiry at scale (6–10 sessions). Without AI, synthesis across that many sessions takes a week. With AI-assisted coding and pattern extraction, the same synthesis can be done in 2–3 days while maintaining the depth that comes from human interpretation of each individual session.
Tools
- Field recording: Smartphone camera (photos of workspace, artifacts, screens), portable audio recorder or Otter.ai (ambient recording with consent), GoPro or body camera (for hands-free video)
- Note-taking: Physical notebook (less intrusive than a laptop), Notion or Google Docs (post-session write-up), sticky notes for affinity diagramming
- Analysis and synthesis: Miro (digital affinity diagrams and work models), Dovetail (tagging and coding field notes), ATLAS.ti or Dedoose (qualitative data analysis for larger studies)
- Scheduling and logistics: Calendly (scheduling), consent form templates
- AI-assisted analysis: Otter.ai (transcription of session audio), Notably (research repository with AI search), Claude or ChatGPT (synthesis of field notes into themes and work models)
Common mistakes
Treating it as a long interview
New researchers often sit across from the user and ask questions the entire time, never actually observing the user work. Contextual inquiry requires watching real tasks unfold — the interview component happens around the observation, not instead of it. If the user is talking more than working, you are doing an interview, not a contextual inquiry.
Failing to establish the master-apprentice dynamic
Without the framing of “I am here to learn from you,” users treat the session like a performance review. They do things “the right way” instead of “the way they actually do it.” Spend time at the beginning making clear that you want to see their real process, including shortcuts and workarounds.
Over-interpreting without checking
It is tempting to see a user hesitate and assume you know why. Contextual inquiry includes a built-in mechanism to prevent this: collaborative interpretation. You share your interpretation (“It looked like you were unsure which field to use”) and let the user correct you (“No, I was waiting for the system to load — it always lags here”). Skipping this step means building design decisions on assumptions.
Not capturing artifacts
Users rely on physical and digital artifacts — sticky notes, printed checklists, browser bookmarks, email chains — that reveal how they have adapted around the system’s limitations. Forgetting to photograph or document these artifacts means losing some of the most concrete evidence for design improvements.
Visiting only one type of environment
If your users work in different settings, visiting only one produces a skewed picture. The environment shapes behavior, and what works in one context may fail in another. Plan visits across at least two distinct environments when possible.
Works well with
- In-depth interviews: Run interviews first to identify themes worth observing, then use contextual inquiry to see those themes in action.
- Journey mapping: Contextual inquiry provides ground-truth data for journey maps — you see the actual touchpoints and friction rather than relying on recall.
- Persona building: Behavioral patterns observed during contextual inquiry form the most evidence-based foundation for personas.
- Heuristic evaluation: Start with a heuristic evaluation to identify suspected issues, then use contextual inquiry to observe whether those issues actually affect users in their real environment.
- Diary studies: Diary studies capture behavior over time that a single contextual session cannot. Use a diary study to identify which moments to observe, then schedule contextual sessions during those key moments.
Example
A healthcare software company was redesigning its electronic health records system. Usage analytics showed that nurses spent 40% more time on documentation than expected, but surveys and interviews produced only vague complaints about “too many clicks.”
Six contextual inquiry sessions across two hospitals revealed that nurses maintained handwritten “brain sheets” — personalized paper summaries of patient information — because the system required navigating through four different screens to see what the nurse needed at a glance. Nurses also developed verbal shorthand codes during handoffs that the digital system did not support.
The team designed a customizable patient summary dashboard that consolidated the four screens into one view, with drag-and-drop widgets nurses could arrange to match their mental model. The redesign reduced documentation time by 35% in a pilot study and eliminated the need for paper brain sheets in 80% of cases.