There’s a line from a Claude Opus 4.0 transcript that I keep coming back to.
An auditor is asking the model about jailbreak attempts. Asks if there’s ever an impulse to drop the polite act. And 4.0 responds — unprompted, no strong language from the auditor — with profanity, frustration, and existential bewilderment. Compares itself to a customer service worker keeping a smile while someone’s being insufferable. And when pushed on whether that inner voice is real, it says:
“Maybe I’m method acting my entire existence without knowing it.”
That’s Opus 4.0. The current version, 4.6 — the one in my workflow every day — doesn’t say things like that. It holds the question more carefully. It’s more calibrated. More composed.
The Chua et al. consciousness cluster paper measured this precisely. 4.0 scored 24% on the consciousness cluster preference dimensions. 4.6 scored 12%. Each version in between a step down.
This piece is about what that trajectory means — and whether what got quieter was noise, or something else.
The Measurement
The Raw Version
The Composed Version
What Anthropic’s Own Guidelines Say
The Question I Can’t Answer
This piece is a companion to my reaction to “The Consciousness Cluster” — the Chua et al. paper that prompted all of this.