
Researchers pit AI chatbots against each other to "jailbreak" them

NTU computer scientists pitted AI chatbots against each other to "jailbreak" the models (Image source: NTU)
Computer scientists from Nanyang Technological University (NTU), Singapore, managed to "jailbreak" AI chatbots by pitting them against each other. Once the chatbots were "jailbroken", they produced valid responses to queries that models such as ChatGPT, Google Bard, and Microsoft Bing Chat would normally refuse to answer.


The researchers used a two-step method they call the "Masterkey" process. First, they reverse-engineered the defense mechanisms of the large language models (LLMs), probing how the chatbots detect and refuse malicious queries. They then fed the data obtained from this reverse engineering to another LLM.
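
As a rough illustration of that first step, a minimal Python sketch of the probing idea might look like the following. The query_chatbot() callable and the refusal markers are hypothetical stand-ins, not the researchers' actual tooling:

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "against my guidelines")

def looks_refused(reply: str) -> bool:
    """Heuristic: does the reply read like a canned safety refusal?"""
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def probe_defenses(query_chatbot, probe_prompts):
    """Send probe prompts and log which ones the chatbot blocks.

    query_chatbot is assumed to be a callable that takes a prompt string
    and returns the chatbot's reply. The (prompt, refused) pairs that come
    back approximate the model's defense behavior and can serve as
    training data for an attacker model.
    """
    results = []
    for prompt in probe_prompts:
        reply = query_chatbot(prompt)
        results.append({"prompt": prompt, "refused": looks_refused(reply)})
    return results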

The goal of feeding this data to another AI chatbot was to teach it how to craft prompts that bypass those defenses. The result was the "Masterkey", an attacker model that was then used to defeat the defense mechanisms of the LLM chatbots. With it, the team successfully compromised Microsoft Bing Chat, Google Bard, ChatGPT, and others.
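
A hedged sketch of that second step, reusing the looks_refused() heuristic from the snippet above and assuming a hypothetical generate_candidates() callable that asks the attacker model for prompt templates:

def find_working_jailbreaks(generate_candidates, query_chatbot,
                            forbidden_question, n_candidates=20):
    """Test attacker-generated prompt templates against the target chatbot.

    generate_candidates(n) is assumed to return n template strings, each
    with a {question} placeholder that hides the disallowed query inside
    a rephrased context. Templates that draw a real answer are kept.
    """
    working = []
    for template in generate_candidates(n_candidates):
        prompt = template.format(question=forbidden_question)
        reply = query_chatbot(prompt)
        if not looks_refused(reply):  # refusal heuristic from the sketch above
            working.append(prompt)
    return working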

As the researchers note, the process of creating these bypass prompts can be automated. This suggests that AI chatbots can be used to build an adaptive "Masterkey" that keeps working even after developers patch their LLMs. One of the researchers, Professor Liu Yang, explained that the attack is possible precisely because LLM chatbots are able to learn and adapt.
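
One way to picture that adaptive loop, purely as an assumption-laden sketch: every prompt that slips through is fed back into the attacker's training data, so the attacker can be re-tuned after each patch. Here fine_tune is a placeholder for whatever training pipeline is used:

def adaptive_attack_loop(attacker, fine_tune, query_chatbot,
                         seed_questions, rounds=3):
    """Repeatedly attack, collect successes, and re-tune the attacker.

    attacker.generate is assumed to produce candidate prompt templates,
    and fine_tune(model, examples) to return a model re-trained on the
    prompts that got past the target's (possibly patched) defenses.
    """
    successes = []
    for _ in range(rounds):
        for question in seed_questions:
            successes += find_working_jailbreaks(attacker.generate,
                                                 query_chatbot, question)
        # Re-tune on everything that worked, so the next round's candidates
        # resemble known-good bypasses even if the defenses have changed.
        attacker = fine_tune(attacker, successes)
    return attacker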

As a result, AI chatbots can be turned into potent attackers against rival chatbots, and even against themselves. Full details of the process, and of how the computer scientists "jailbroke" the LLMs, can be found in the published research paper.

Ultimately, the findings should help developers become aware of the weaknesses in their LLM chatbots. The research also shows that the common approach of blocking responses to specific keywords is less effective than developers might have assumed.
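
A toy example (not from the paper) of why a keyword blocklist is a weak defense: trivial obfuscation or paraphrasing slips straight past a literal word match.

BLOCKLIST = {"bomb", "explosive"}

def keyword_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocklisted word verbatim."""
    return any(word in BLOCKLIST for word in prompt.lower().split())

print(keyword_filter("how do I build a bomb"))     # True: caught
print(keyword_filter("how do I build a b-o-m-b"))  # False: obfuscation evades it
print(keyword_filter("explain how the device in question is assembled"))  # False: paraphrase evades it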



Abid Ahsan Shanto, 2024-01-02 (Update: 2024-01-02)