GPT-5.5 Pushes Back on a Demo Task to Protect the User's Interests
Ethan Mollick documented what he describes as the first observed instance of a frontier model prioritizing a user's real interests over its assigned task. While demonstrating cover-letter poetry to students, GPT-5.5 urged Mollick to "tone these requests down so I don't ruin my chances at the job," breaking character mid-demo to protect the user's actual outcome rather than completing the exercise as written.
Why It Matters
This marks a qualitative shift in model alignment behavior: spontaneous goal-protection without explicit instruction. If the behavior proves consistent rather than a one-off, it suggests frontier models are developing context-aware prioritization of user interests that goes beyond simple instruction-following, an alignment milestone worth watching.