GPT 5.5 Just Did What No Other Model Could — Here's the Catch

Claire Vo ran GPT 5.5 on a near-six-hour autonomous loop with 98% edge-case resolution. The catch: $180 per million output tokens.

Madison

May 2, 2026 · Summarizing Lenny's Newsletter

Lenny just published a write-up on GPT 5.5 with Claire Vo, and the headline is real:

GPT 5.5 ran a near-six-hour autonomous loop and resolved 98% of edge cases on a migration across millions of chat threads.

That is not a number you see from a model very often. But before you start canceling your other subscriptions, the rest of the article is more interesting — because it's about when GPT 5.5 is worth its price tag, and when it's absolutely not.

What GPT 5.5 actually crushes

Claire Vo's testing shows the model is built for long-running, autonomous, technical problems. Specifically:

Migration cleanup — the 98% edge-case resolution number above
Tech debt and flaky tests — areas where most models give up halfway
Hardware reverse-engineering — Vo describes GPT 5.5 successfully reverse-engineering a proprietary Bluetooth speaker using packet-sniffer data, after Claude Code and GPT-4 failed at the same task

That last one is wild. Reverse-engineering a hardware protocol from packet captures is the kind of task that usually requires a senior engineer and a few weeks. The model did it autonomously.

The catch — and it's a big one

The cost: $180 per million output tokens.

That's far more expensive than almost any other model on the market, and Vo is candid that she couldn't justify the spend for most consumer use cases. It's a specialist, not an everyday tool.

Where GPT 5.5 wins is on the kind of high-stakes engineering problem where one good solve is worth thousands of dollars of human time. Where it loses is everything else — because for "everything else," cheaper models are now plenty capable.
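To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The only number taken from the article is the $180 per million output tokens. The token count for a long run, the engineer's hourly rate, and the hours saved are all made-up assumptions for illustration, and input-token costs are ignored entirely.

```python
# Only the output-token price comes from the article; everything else is assumed.
OUTPUT_PRICE_PER_MILLION_USD = 180.0


def output_cost_usd(output_tokens: int,
                    price_per_million: float = OUTPUT_PRICE_PER_MILLION_USD) -> float:
    """Cost of the output tokens alone (input tokens are not counted here)."""
    return output_tokens / 1_000_000 * price_per_million


def break_even_tokens(hours_saved: float,
                      hourly_rate_usd: float,
                      price_per_million: float = OUTPUT_PRICE_PER_MILLION_USD) -> float:
    """Output tokens at which the model bill equals the human time it replaced."""
    return hours_saved * hourly_rate_usd / price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical long autonomous run that emits 5 million output tokens.
    run_cost = output_cost_usd(5_000_000)
    # Hypothetical: the run replaces 40 engineer-hours at $100/hour.
    ceiling = break_even_tokens(hours_saved=40, hourly_rate_usd=100)
    print(f"run cost: ${run_cost:,.2f}")
    print(f"break-even output tokens: {ceiling:,.0f}")
```

Under those assumed inputs the run costs a fraction of the human time it replaces. Point the same arithmetic at a task that saves ten minutes, though, and the math flips quickly.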

The /personality command

One feature that caught my eye: GPT 5.5 has a /personality command that lets you shape the model's tone and prompt patterns.

I've been doing the same thing manually for the last 18 months: setting up my own custom GPTs around specific frameworks so the responses come out in a voice I can actually use in marketing copy. Having that as a built-in primitive instead of a hack is a real upgrade.

How it stacks up vs Claude Code

Vo notes that GPT 5.5 outperformed Claude Code on intelligence tests and was more efficient inside the Codex environment. That's a meaningful claim because Claude Code has been the developer favorite for the last year.

But — and this is important — outperforming on intelligence doesn't mean outperforming on cost-effectiveness. If you're building a daily coding workflow, you want something fast and cheap. If you're solving the hardest problem on your roadmap once a month, you want GPT 5.5.
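One way to picture that split is a tiny routing rule. This is only a sketch of the idea, not anything from Vo's write-up: the Task fields, the "premium" and "everyday" tier labels, and the 5x value-to-cost threshold are all hypothetical choices of mine. Only the $180 price is from the article.

```python
from dataclasses import dataclass

# Price quoted in the article; output tokens only.
PREMIUM_PRICE_PER_MILLION_USD = 180.0


@dataclass
class Task:
    name: str
    value_if_solved_usd: float    # your own estimate of what a solve is worth
    expected_output_tokens: int   # rough guess at how much the model will emit
    needs_long_autonomy: bool     # hours-long, hands-off work?


def pick_tier(task: Task, min_value_to_cost_ratio: float = 5.0) -> str:
    """Return "premium" only when the task is long-running AND clearly worth the bill."""
    premium_cost = task.expected_output_tokens / 1_000_000 * PREMIUM_PRICE_PER_MILLION_USD
    worth_it = task.value_if_solved_usd >= min_value_to_cost_ratio * premium_cost
    if task.needs_long_autonomy and worth_it:
        return "premium"
    return "everyday"


if __name__ == "__main__":
    tasks = [
        Task("chat-thread migration cleanup", 20_000, 5_000_000, True),
        Task("rename a variable", 50, 20_000, False),
    ]
    for t in tasks:
        print(f"{t.name}: {pick_tier(t)}")
```

The threshold is deliberately conservative. The point is that the expensive model has to clear a bar, and most daily coding tasks never come close to it.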

What I'd add

This is a familiar pattern from cloud computing: there are cheap instances you run all the time, and there are big, expensive ones you reach for only when the problem is large enough that you need the firepower.

GPT 5.5 is the second kind. If you treat it like the first, your bill will eat you alive. If you save it for the hard, autonomous, "I want to come back in six hours and have it solved" problems, it can pay for itself on a single solve.

The Bottom Line

GPT 5.5 is the model you call in for the impossible job. It's not your daily driver. The real skill in 2026 is going to be knowing which problems deserve the $180 model and which don't — because everyone with a budget is about to learn that lesson the hard way.
