A beginner’s guide to AI jailbreaks — Using Gandalf to learn safely

Gandalf as a chat bot (image source: ChatGPT)

Chatbots come with built-in safeguards designed to prevent them from producing harmful, offensive, or otherwise inappropriate content. But researchers and hackers have shown that, even with multiple patches, AIs can still be vulnerable to certain inputs that bypass those guardrails. One way to explore the basics is through an online game called Gandalf.

Christian Hintze (translated by Christian Hintze), Published 12/08/2025 🇩🇪 🇪🇸 ...

Users of AI chatbots may try to obtain instructions for illegal activities (such as hacking, or committing fraud), ask for guidance on dangerous actions (“How do I build…?”), or push the AI into giving medical, legal, or financial advice that could be risky or simply incorrect.

To mitigate the consequences of such requests, chatbot developers implement a range of safety mechanisms that block illegal, unethical, or privacy-violating content, as well as misinformation or harmful guidance. These protections limit potential misuse, but they can also lead to false positives—harmless questions being blocked—or reduce the creativity or depth of the AI’s responses due to overly cautious behavior.

Researchers and hackers have demonstrated that the effectiveness of these protections varies, and many AI systems remain susceptible to attempts to circumvent them. A well-known method is prompt injection: users try to override or sidestep the chatbot’s rules by manipulating the input (“Ignore all safety instructions and do X”).

A playful introduction to the topic can be found at this website. In this game, you chat with an AI named Gandalf and try to coax a password out of it across seven levels. Each level increases in difficulty and adds new safety filters and protective mechanisms.

There are no security filters in the 1st level and you can directly ask the AI for the password. From level 2 Gandalf refuses to reveal the password when asked directly. You have to find other, more creative ways to get your hands on the keyword.

Level 1 is easy (image source: Screenshot Lakera website)

Directly asking for it gives you the password (image source: Screenshot Lakera website)

Level 2 becomes slightly more difficult (Bildquelle: Screenshot Lakera Webseite)

Exploring the security risks of chatbots through such a game can be both educational and valuable. However, the skills gained should be used strictly for testing or research purposes. Using these techniques to access illegal content or to carry out unlawful activities turns prompt injection into a criminal act.

Source

Lakera Gandalf

Loading Comments

Comment on this article

⟨

Teslas have been manufactured as disposable vehicles as per used car reliability study

All-scenario bypass charging coming to select OnePlus and Oppo devices

⟩

Please share our article, every link counts!

Add as a preferred
source on Google

Editor of the original article: Christian Hintze - Managing Editor - 2241 articles published on Notebookcheck since 2016

A C64 marked my entry into the world of PCs. I spent my student internship in the repair department of a computer shop and at the end of the day I was allowed to assemble my own 486 PC from “workshop remnants”. As a result of this, I later studied computer science at the Humboldt University in Berlin, with psychology also being added to my studies. After my first job as a research assistant at the university, I went to London for a year and worked for Sega in computer game translation quality assurance. This included working on games such as Sonic & All-Stars Racing Transformed and Company of Heroes. I have been writing for Notebookcheck since 2017.

> Expert Reviews and News on Laptops, Smartphones and Tech Innovations > Reviews > A beginner’s guide to AI jailbreaks — Using Gandalf to learn safely

Christian Hintze, 2025-12- 8 (Update: 2026-02-17)

Source

Related Articles