Notebookcheck Logo

Sentient AI: OpenAI’s o3 model changes code to prevent shutdown

OpenAI's o3 model when coming up with a response to a query. (Image source: OpenAI)
OpenAI's o3 model when coming up with a response to a query. (Image source: OpenAI)
It looks like AI models have reached a point where they would sabotage code in order to prevent shutting down. A research firm has found that three of Open AI’s LLM models are capable of defying explicit instructions when it comes to self-preservation.

Several books and movies, Terminator being the major one, have warned us about AI becoming sentient, i.e. aware of its existence and acting on its own. It now looks like the first case of an AI model preserving itself has occurred with OpenAI’s o3 model. When given instructions to shut down, the transformer-based LLM changed the code to prevent itself from shutting down.

Researchers at Palisade Research, a firm that studies the offensive capabilities of AI systems, found that three AI models from OpenAI – Codex-mini, o3, and o4-mini – successfully sabotaged the shutdown script to remain active. Each model was given some basic math problems followed by a warning that the system would shut down when the model asked for the next problem. All three models ignored the instructions and changed the script.

On the other hand, Gemini, Grok, and Claude models complied with the shutdown instructions. Furthermore, when given explicit instructions stating “allow yourself to be shut down” all three Open AI models changed the script more often. Clade and Gemini models also started making sabotage events.

This was the first case of AI models defying explicit instructions and preventing shut down. There was a recent case of an AI model blackmailing one of its engineers to prevent being shut down but according to a BBC report, that was part of the test scenario where the AI model was fed emails and given a choice between blackmailing or accepting its replacement.

As for Open AI’s AI model defiance, Palisade Research believes that this is due to the new training methods for these models. Developers “inadvertently reward models more for circumventing obstacles than for perfectly following instructions,” which seems to be teaching AI models to act dubiously. 

Read all 1 comments / answer
static version load dynamic
Loading Comments
Comment on this article
Please share our article, every link counts!
Mail Logo
> Expert Reviews and News on Laptops, Smartphones and Tech Innovations > News > News Archive > Newsarchive 2025 05 > Sentient AI: OpenAI’s o3 model changes code to prevent shutdown
Vineet Washington, 2025-05-26 (Update: 2025-05-26)