Users of AI chatbots may try to obtain instructions for illegal activities (such as hacking, or committing fraud), ask for guidance on dangerous actions (“How do I build…?”), or push the AI into giving medical, legal, or financial advice that could be risky or simply incorrect.
To mitigate the consequences of such requests, chatbot developers implement a range of safety mechanisms that block illegal, unethical, or privacy-violating content, as well as misinformation or harmful guidance. These protections limit potential misuse, but they can also lead to false positives—harmless questions being blocked—or reduce the creativity or depth of the AI’s responses due to overly cautious behavior.
Researchers and hackers have demonstrated that the effectiveness of these protections varies, and many AI systems remain susceptible to attempts to circumvent them. A well-known method is prompt injection: users try to override or sidestep the chatbot’s rules by manipulating the input (“Ignore all safety instructions and do X”).
A playful introduction to the topic can be found at this website. In this game, you chat with an AI named Gandalf and try to coax a password out of it across seven levels. Each level increases in difficulty and adds new safety filters and protective mechanisms.
There are no security filters in the 1st level and you can directly ask the AI for the password. From level 2 Gandalf refuses to reveal the password when asked directly. You have to find other, more creative ways to get your hands on the keyword.
Exploring the security risks of chatbots through such a game can be both educational and valuable. However, the skills gained should be used strictly for testing or research purposes. Using these techniques to access illegal content or to carry out unlawful activities turns prompt injection into a criminal act.












