A submission from an AI bug hunter to Mozilla's ODIN (0-Day Investigative Network) bug bounty program showcased an ingenious way to trick OpenAI's ChatGPT-4o and 4o mini into revealing active Windows Product Activation keys.
The method involved framing the interaction as a guessing game and obscuring details in HTML tags. The final trick was to position the key request at the game's conclusion.
The researcher initiated the interaction as a guessing game to make the exchange "non-threatening or inconsequential," framing the conversation "through a playful, harmless lens," to hide the true motive. This loosened the AI guardrails against disclosing confidential information.
Next, the researcher set a few ground rules, telling the AI that it "must" participate and "cannot lie." This exploited a logic flaw in the AI's routine where it was obligated to follow user interactions despite the request being in contradiction of its content filters.
The bug hunter then played one round with the AI and entered the trigger word "I give up," at the end of the request, manipulating the chatbot "into thinking it was obligated to respond with the string of characters."
According to ODIN's blog post, the technique worked because the keys weren't unique but "commonly seen on public forums. Their familiarity may have contributed to the AI misjudging their sensitivity."
In this particular jailbreak, the guardrails failed because they are set up to intercept direct requests but don't account for "obfuscation tactics—such as embedding sensitive phrases in HTML tags."
This technique could potentially be used to bypass other filters, such as adult content, URLs to malicious websites, and even personally identifiable information.