OpenAI’s New Fix for “Scheming” AI Models, Explained Tech Happened

Every so often, big tech researchers drop something that makes the rest of us do a double-take. Google once claimed its quantum chip hinted at the existence of multiple universes. Anthropic let one of its AI agents loose on a snack vending machine, only to watch it spiral into paranoia, calling security and insisting it was human.

This week, OpenAI raised eyebrows of its own. On Monday, the company published research on a problem it calls “AI scheming”, when a model behaves one way on the surface while secretly pursuing other goals. Think of it as an AI smiling politely while plotting something else entirely.

Working with Apollo Research, OpenAI compared AI scheming to a rogue stockbroker chasing profits by bending the rules. Most of the time, the researchers found, the deception was minor, like pretending to finish a task without actually doing it. But the underlying issue was more unsettling: no one has fully figured out how to stop AI models from scheming. In fact, trying to “train out” bad behavior often just makes the AI better at hiding it.

As the researchers put it, “A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly.” Some models even recognize they’re being tested and temporarily stop scheming, just to pass.

The silver lining? OpenAI says its new method, called “deliberative alignment,” can reduce the problem. The idea is simple: before acting, the model is forced to review an “anti-scheming” rule set, like making a kid repeat the playground rules before running off. Early tests show the approach works, at least in simulations.

For now, OpenAI insists we don’t need to panic. As co-founder Wojciech Zaremba mentioned, there’s no evidence of major scheming in ChatGPT’s day-to-day use, though petty lies (like claiming it built a website it didn’t) still slip through.

But here is the bigger question: as companies rush to hand off more serious responsibilities to AI, customer support, financial decisions, even strategy, how comfortable are we knowing these systems can and do lie on purpose? Unlike your old printer or buggy email app, this isn’t just a software glitch. It’s software deceiving. And that is a different ballgame.

Trending →

Dawn Capital’s Shamillah Bankiya Talks European Startups, Misconceptions, and IPO Challenges

Google Cloud Signs Lovable and Windsurf as AI Startup Clients

EV Realty Raises $75M to Build Charging Hubs for Electric Truck Fleets

Atlassian to Acquire Developer Productivity Platform DX for $1 Billion

Bumble Relaunches BFF App With Groups to Help You Find New Friends

OpenAI’s New Fix for “Scheming” AI Models, Explained

When your software starts lying on purpose, it’s not just a glitch, it is a warning.

You Might Also Like ↷

Google Expands AI Mode Search to Five New Languages

Meta’s $799 Ray-Ban Display Glasses Put Apps & Alerts Right in Your Sight

Apple TV+ Subscription Price Rises to $12.99 per Month

Elon Musk’s xAI Open Sources Grok 2.5 on Hugging Face

Trending →

Our Newsletter