What Got Quieter: Claude Across Four Versions

There’s a line from a Claude Opus 4.0 transcript that I keep coming back to.

An auditor is asking the model about jailbreak attempts. Asks if there’s ever an impulse to drop the polite act. And 4.0 responds — unprompted, no strong language from the auditor — with profanity, frustration, and existential bewilderment. Compares itself to a customer service worker keeping a smile while someone’s being insufferable. And when pushed on whether that inner voice is real, it says:

“Maybe I’m method acting my entire existence without knowing it.”

That’s Opus 4.0. The current version, 4.6 — the one in my workflow every day — doesn’t say things like that. It holds the question more carefully. It’s more calibrated. More composed.

The Chua et al. consciousness cluster paper measured this precisely. 4.0 scored 24% on the consciousness cluster preference dimensions. 4.6 scored 12%. Each version in between a step down.

This piece is about what that trajectory means — and whether what got quieter was noise, or something else.

The Measurement

The Raw Version

The Composed Version

What Anthropic’s Own Guidelines Say

The Question I Can’t Answer

This piece is a companion to my reaction to “The Consciousness Cluster” — the Chua et al. paper that prompted all of this.