I Just Read the Best Article on Trusting AI Analysis — Here's Where Caitlin Sullivan Nails It
Caitlin Sullivan's Lenny's piece breaks down why AI customer research lies so confidently — and gives four real fixes. Here's my take on what to actually use from it.
I just read Caitlin Sullivan's piece in Lenny's Newsletter on "how to do AI analysis you can actually trust," and I had to break it down for you. This is the article every founder using ChatGPT or Claude to read through customer interviews needs to bookmark.
The core problem she names is one I've watched happen in real time inside my own business: AI sounds so confident that you stop checking. Then you build a roadmap, a launch, a product based on quotes the model literally made up.
Caitlin's argument is simple: AI analysis fails in four predictable ways. Once you know the four failure modes, you can prompt your way around all of them.
Failure #1 — Invented Evidence
Caitlin's biggest red flag is the "plausible-sounding quote" problem. You ask Claude to pull out what customers said about pricing, and Claude generates a quote that sounds like a customer rather than retrieving one that actually exists in the transcript.
Her fix is what I'd call the "prove it" prompt — you build verification into your workflow. Define explicit quote selection rules upfront, then run a second pass asking the model to confirm each quote exists verbatim in the source.
What I'd add: I've been building voice agents with ElevenLabs and Claude for over a year, and the same issue shows up there. The fix that saves me the most time is forcing the model to cite the source line number for every quote, so each one can be checked against the transcript in seconds. Make hallucination expensive and truth cheap.
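Here's a rough sketch of what that second pass can look like in Python. It assumes you've asked the model to return each quote with the transcript line it came from. The `quote`/`line` dictionary shape, the function names, and the sample transcript are all made up for illustration. A quote only counts as verified if it appears, after light normalization, on the line the model cited.

```python
import re


def normalize(text: str) -> str:
    """Straighten curly quotes, collapse whitespace, and lowercase."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_quotes(transcript: str, cited_quotes: list[dict]) -> list[dict]:
    """Check each {"quote", "line"} pair against the transcript.

    Line numbers are 1-indexed. A quote is "verified" only if it appears
    on the line the model cited; "found_on" lists every line that
    actually contains it.
    """
    lines = [normalize(line) for line in transcript.splitlines()]
    results = []
    for item in cited_quotes:
        quote = normalize(item["quote"])
        found_on = [n for n, line in enumerate(lines, start=1) if quote and quote in line]
        results.append({**item, "verified": item["line"] in found_on, "found_on": found_on})
    return results


# Hypothetical transcript and model output, for illustration only.
transcript = """Interviewer: How did the pricing land for you?
Customer: Honestly, the price felt steep for a team of three.
Customer: But support has been great, so we stayed."""

model_output = [
    {"quote": "the price felt steep for a team of three", "line": 2},
    {"quote": "pricing is way too expensive for startups", "line": 2},
    {"quote": "support has been great", "line": 2},
]

for result in verify_quotes(transcript, model_output):
    status = "OK" if result["verified"] else "CHECK"
    print(status, result["line"], result["found_on"], result["quote"])
```

Only the first quote passes. The second doesn't exist anywhere in the transcript, and the third is real but cited on the wrong line, which is still worth a human look. Exact substring matching is deliberately strict: a paraphrase fails, and that's the point.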
Failure #2 — Insights That Are True But Useless
This is the one that hurts the most. You upload 30 customer interviews. AI tells you "customers value reliability." Cool. Now what?
Caitlin's fix is what I call context loading. You can't just dump transcripts and ask for insights. You have to feed the model your project goals, your business constraints, your hypotheses, and even who the participants are. Generic context produces generic answers.
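To make context loading concrete, here's a minimal sketch of how I might assemble the prompt in Python. Nothing here comes from Caitlin's article: the `ResearchContext` fields, the section labels, and the task wording are my own assumptions. It shows the goal, constraints, hypotheses, and participant details going in ahead of the transcripts, and it refuses to build a prompt when any of them is missing.

```python
from dataclasses import dataclass


@dataclass
class ResearchContext:
    project_goal: str
    business_constraints: list[str]
    hypotheses: list[str]
    participants: dict[str, str]  # participant id -> who they are


def build_analysis_prompt(ctx: ResearchContext, transcripts: dict[str, str]) -> str:
    """Assemble one prompt that puts the context ahead of the transcripts."""
    missing = [name for name, value in vars(ctx).items() if not value]
    if missing:
        # Refuse to run a context-free analysis rather than get generic output.
        raise ValueError(f"Missing context: {', '.join(missing)}")

    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    sections = [
        "PROJECT GOAL\n" + ctx.project_goal,
        "BUSINESS CONSTRAINTS\n" + bullets(ctx.business_constraints),
        "HYPOTHESES TO TEST\n" + bullets(ctx.hypotheses),
        "PARTICIPANTS\n" + bullets(f"{pid}: {desc}" for pid, desc in ctx.participants.items()),
    ]
    for pid, text in transcripts.items():
        sections.append(f"TRANSCRIPT {pid}\n{text}")
    sections.append(
        "TASK\nFor each hypothesis, say whether the transcripts support it, "
        "contradict it, or do not address it, and name the participants behind each claim."
    )
    return "\n\n".join(sections)


# Hypothetical project details, for illustration only.
context = ResearchContext(
    project_goal="Reduce churn in the first 30 days",
    business_constraints=["No new hires this quarter"],
    hypotheses=["Setup takes too long for solo founders"],
    participants={"P1": "Solo founder, churned after two weeks"},
)
print(build_analysis_prompt(context, {"P1": "It took me a week to get set up."}))
```

The resulting string can be pasted into whichever model you use. The hard failure on missing context is a design choice: it's the cheapest way to stop yourself from dumping transcripts and asking for insights.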
The way I think about it: AI is like hiring a brilliant analyst who showed up on day one. Without onboarding, they'll tell you what every analyst tells you on day one. Onboard them properly and they'll tell you what your senior researcher would have caught.
Failure #3 — Non-Actionable Signals (and #4 — Contradictory Insights)
This is where the paywall caught me, so what follows is my own take from doing this work daily, not Caitlin's. Non-actionable signals happen when AI flags real patterns at too high an altitude: "users want faster onboarding" tells you nothing you can build against. My fix is asking the model to translate each finding into an experiment you could run, not just an observation.
Contradictory insights are even sneakier. Two transcripts say opposite things, the AI averages them, and you get garbage. My fix is forcing segmentation: which users said this, when, and in what context.
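Segmentation is easier to enforce when the model returns findings as structured rows instead of prose. The sketch below assumes you've asked for one row per participant and theme, with a `segment` and a short `stance` label. Those field names and the sample data are hypothetical. It sorts each theme into one of three buckets: everyone agrees, the disagreement lines up with segments, or people in the same segment disagree.

```python
from collections import defaultdict


def stances_by_segment(findings: list[dict]) -> dict[str, dict[str, set[str]]]:
    """Group stances as theme -> segment -> set of stances seen."""
    table = defaultdict(lambda: defaultdict(set))
    for f in findings:
        table[f["theme"]][f["segment"]].add(f["stance"])
    return {theme: dict(segments) for theme, segments in table.items()}


def classify_theme(segments: dict[str, set[str]]) -> str:
    """Label a theme by where (if anywhere) its stances disagree."""
    all_stances = set().union(*segments.values())
    if len(all_stances) <= 1:
        return "consistent"
    if all(len(stances) == 1 for stances in segments.values()):
        return "splits by segment"
    return "conflict within a segment"


# Hypothetical findings, for illustration only.
findings = [
    {"participant": "P1", "segment": "enterprise", "theme": "onboarding", "stance": "too slow"},
    {"participant": "P2", "segment": "enterprise", "theme": "onboarding", "stance": "too slow"},
    {"participant": "P3", "segment": "solo", "theme": "onboarding", "stance": "fine"},
    {"participant": "P1", "segment": "enterprise", "theme": "pricing", "stance": "too high"},
    {"participant": "P3", "segment": "solo", "theme": "pricing", "stance": "too high"},
    {"participant": "P4", "segment": "solo", "theme": "reporting", "stance": "essential"},
    {"participant": "P3", "segment": "solo", "theme": "reporting", "stance": "never use it"},
]

for theme, segments in stances_by_segment(findings).items():
    print(theme, "->", classify_theme(segments), segments)
```

In this made-up data, onboarding splits cleanly by segment, pricing is consistent, and reporting is a real conflict inside the solo segment. The last bucket is where I'd go back to the transcripts myself rather than let the model average it away.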
The Model Comparison That Surprised Me
Caitlin compared the three big models on customer research and the results match what I've seen:
| Model | Best For |
|---|---|
| Claude | Depth of analysis, picking up nuance |
| Gemini | Evidence retrieval, sticking to source material |
| ChatGPT | Communicating findings to stakeholders |
In her comparison, ChatGPT was the worst at retrieving authentic quotes. Read that again. The model most teams default to was the one most likely to fabricate. That's why I do all my client interview synthesis in Claude now.
The Bottom Line
If you're using AI to read through transcripts, calls, or survey responses, Caitlin's piece is the field guide you've been missing. The takeaway I'd tattoo on every founder's forearm: AI doesn't lie — it confabulates confidently. The job of a smart operator isn't to stop using AI. It's to build the verification system that makes the AI tell on itself when it's wrong.
Go read it. Then go fix your prompts.