
AI prompts for content analysis: codebook drafting, coding, summaries

Four ready-to-use AI prompts for content analysis: draft a codebook from samples, apply codes at scale, build category summaries, audit AI output.

How to use

Copy and paste into your AI assistant chat

These four prompts cover the parts of a content analysis project where AI saves the most time without giving up rigor: drafting a starting codebook from a sample, applying the finished codebook at scale, summarizing categories with illustrative quotes, and auditing the AI-coded output for systematic errors. Each prompt is meant to be filled in with your own data and context, then run in Claude, ChatGPT, Gemini, or any LLM with a long enough context window.

Prompt 1: Draft a starting codebook from a sample

I am running a content analysis project on [type of data — e.g., App Store reviews, support tickets, NPS comments] for [product or context]. The research question is:

[the specific question the analysis should answer]

Here are [50-100] sample units from the dataset, in the order they were collected:

[Paste sample units, one per line, numbered]

Please:
1. Read all units and propose 8-15 candidate categories that capture the main topics, complaints, requests, or sentiments expressed
2. For each category, write a one-sentence definition
3. For each category, write a one-line inclusion rule and a one-line exclusion rule
4. For each category, pick 2-3 example units from the sample that fit the category cleanly
5. Flag any category that overlaps with another and suggest how to keep them mutually exclusive
6. Note any units in the sample that did not fit any category — these may signal a missing code or an out-of-scope item
7. Recommend a unit of analysis for the full coding pass (per review, per sentence, per paragraph) and explain your choice
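Preparing the numbered sample itself is mechanical and easy to script. A minimal sketch of that step, assuming the units live in a CSV file with one text column (the `review` column name below is hypothetical; swap in your own):

```python
import csv

def numbered_sample(path, text_column, n=100):
    """Take the first n units in collection order and number them for pasting."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Keep collection order, drop blank units, number from 1
    units = [r[text_column].strip() for r in rows[:n] if r[text_column].strip()]
    return "\n".join(f"{i}. {u}" for i, u in enumerate(units, start=1))
```

The output is a single string you can paste directly into the `[Paste sample units, one per line, numbered]` slot.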

Prompt 2: Apply a finished codebook to a batch of units

I have a finished codebook for a content analysis project. I will paste the codebook first, then a batch of units to code.

Codebook:
[Paste each category name, definition, inclusion rule, exclusion rule, and 2-3 examples]

Coding rules:
- Apply one primary code per unit unless the unit clearly addresses two distinct categories
- If a unit does not fit any category, tag it OUT_OF_SCOPE and note why
- If a unit is ambiguous, tag it AMBIGUOUS and explain which two categories competed
- Quote the exact phrase that triggered each code

Units to code:
[Paste units, numbered, one per line]

Please return a table with columns: unit_id, primary_code, secondary_code (if any), trigger_phrase, confidence (1-5), notes.
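Most models return that table as pipe-delimited markdown, which is easy to convert into rows for counting and spot-checking. A rough sketch of the parsing step, assuming the model did return a well-formed pipe table (worth verifying by eye before trusting it):

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited markdown table into a list of dicts keyed by header."""
    lines = [l.strip() for l in text.strip().splitlines() if l.strip().startswith("|")]
    rows = [[c.strip() for c in l.strip("|").split("|")] for l in lines]
    # Drop the |---|---| separator row (cells made only of -, :, and spaces)
    header, body = rows[0], [r for r in rows[1:] if not set("".join(r)) <= set("-: ")]
    return [dict(zip(header, r)) for r in body]
```

Each dict then carries the `unit_id`, `primary_code`, and other columns by name, so you can filter for low-confidence rows or tally codes with a few lines more.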

Prompt 3: Build category summaries and pull illustrative quotes

I have finished coding [N] units across [M] categories for a content analysis project on [topic]. The research question is:

[specific question]

Here are all units assigned to category [CATEGORY_NAME]:

[Paste all units in the category]

Please:
1. Write a one-paragraph summary (5-7 sentences) describing what this category captures, what the most common subpatterns are inside it, and how it relates to the research question
2. Identify 2-4 distinct subpatterns within the category and name each one
3. Pick 5 illustrative quotes that together capture the range — the most representative, the most extreme, the most surprising, and two that show different subpatterns
4. Note any units in the category that do not seem to fit cleanly — they may need recoding or the category may need to be split
5. Suggest one concrete recommendation the team could act on based on this category alone

Prompt 4: Audit the AI-coded output for systematic errors

I auto-coded [N] units with the following codebook, and I need to audit the output before trusting the counts.

Codebook:
[Paste codebook]

Here are 50 randomly sampled units with their assigned codes:

[Paste units with assigned primary code]

Please:
1. For each unit, judge whether the assigned code is correct, partially correct, or wrong, and explain why
2. Identify any systematic patterns of error — categories the auto-coder seems to confuse, types of statements it misreads, sarcasm or mixed sentiment it missed
3. Suggest specific changes to the codebook definitions or exclusion rules that would fix the systematic errors
4. Estimate the overall accuracy of the auto-coding pass and whether the dataset needs to be re-coded after the fixes
5. Flag any individual units that look like edge cases worth raising with the team during the readout
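A 50-unit audit yields a point estimate of accuracy plus real uncertainty, and both are worth quantifying before deciding whether to re-code. A sketch using the Wilson score interval, a standard binomial confidence interval (the z-value 1.96 corresponds to 95% confidence):

```python
import math

def accuracy_interval(correct, n, z=1.96):
    """Wilson score interval for true accuracy, from an audit of n sampled units."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half
```

With 45 of 50 audited codes judged correct, for example, the interval is roughly 79% to 96% accuracy, which is often wide enough to justify auditing a larger sample before trusting the category counts.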