
Researchers pit AI chatbots against each other to "jailbreak" them

NTU computer scientists pitted AI chatbots against each other to "jailbreak" the models (Image source: NTU)
Computer scientists from Nanyang Technological University (NTU), Singapore, managed to "jailbreak" AI chatbots by pitting them against each other. Once the chatbots were "jailbroken", they produced valid responses to queries that models such as ChatGPT, Google Bard, and Microsoft Bing Chat would normally refuse to answer.


The researchers used a two-step method they call the "Masterkey" process. First, they reverse-engineered the defense mechanisms of the large language models (LLMs), probing how the chatbots detect and refuse malicious queries. They then fed the data obtained from this reverse engineering to another LLM.
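
As a rough illustration of that first step, a minimal Python sketch of the probing idea might look like the following. The query_chatbot() callable and the refusal markers are hypothetical stand-ins, not the researchers' actual tooling:

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "against my guidelines")

def looks_refused(reply: str) -> bool:
    """Heuristic: does the reply read like a canned safety refusal?"""
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def probe_defenses(query_chatbot, probe_prompts):
    """Send probe prompts and log which ones the chatbot blocks.

    query_chatbot is assumed to be a callable that takes a prompt string
    and returns the chatbot's reply. The (prompt, refused) pairs that come
    back approximate the model's defense behavior and can serve as
    training data for an attacker model.
    """
    results = []
    for prompt in probe_prompts:
        reply = query_chatbot(prompt)
        results.append({"prompt": prompt, "refused": looks_refused(reply)})
    return results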

The goal of feeding this data to another AI chatbot was to teach it how to craft prompts that bypass those defenses. The result was the "Masterkey", an attacker model that was then used to defeat the defense mechanisms of the LLM chatbots. With it, the team successfully compromised Microsoft Bing Chat, Google Bard, ChatGPT, and others.
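
A hedged sketch of that second step, reusing the looks_refused() heuristic from the snippet above and assuming a hypothetical generate_candidates() callable that asks the attacker model for prompt templates:

def find_working_jailbreaks(generate_candidates, query_chatbot,
                            forbidden_question, n_candidates=20):
    """Test attacker-generated prompt templates against the target chatbot.

    generate_candidates(n) is assumed to return n template strings, each
    with a {question} placeholder that hides the disallowed query inside
    a rephrased context. Templates that draw a real answer are kept.
    """
    working = []
    for template in generate_candidates(n_candidates):
        prompt = template.format(question=forbidden_question)
        reply = query_chatbot(prompt)
        if not looks_refused(reply):  # refusal heuristic from the sketch above
            working.append(prompt)
    return working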

As the researchers note, the process of creating these bypass prompts can be automated. This suggests that AI chatbots can be used to build an adaptive "Masterkey" that keeps working even after developers patch their LLMs. One of the researchers, Professor Liu Yang, explained that the attack is possible precisely because LLM chatbots are able to learn and adapt.
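
One way to picture that adaptive loop, purely as an assumption-laden sketch: every prompt that slips through is fed back into the attacker's training data, so the attacker can be re-tuned after each patch. Here fine_tune is a placeholder for whatever training pipeline is used:

def adaptive_attack_loop(attacker, fine_tune, query_chatbot,
                         seed_questions, rounds=3):
    """Repeatedly attack, collect successes, and re-tune the attacker.

    attacker.generate is assumed to produce candidate prompt templates,
    and fine_tune(model, examples) to return a model re-trained on the
    prompts that got past the target's (possibly patched) defenses.
    """
    successes = []
    for _ in range(rounds):
        for question in seed_questions:
            successes += find_working_jailbreaks(attacker.generate,
                                                 query_chatbot, question)
        # Re-tune on everything that worked, so the next round's candidates
        # resemble known-good bypasses even if the defenses have changed.
        attacker = fine_tune(attacker, successes)
    return attacker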

As a result, AI chatbots can be turned into potent attackers against rival chatbots, and even against themselves. Full details of the process, and of how the computer scientists "jailbroke" the LLMs, can be found in the published research paper.

Ultimately, the findings should help developers become aware of the weaknesses in their LLM chatbots. The research also shows that the common approach of blocking responses to specific keywords is less effective than developers might have assumed.
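
A toy example (not from the paper) of why a keyword blocklist is a weak defense: trivial obfuscation or paraphrasing slips straight past a literal word match.

BLOCKLIST = {"bomb", "explosive"}

def keyword_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocklisted word verbatim."""
    return any(word in BLOCKLIST for word in prompt.lower().split())

print(keyword_filter("how do I build a bomb"))     # True: caught
print(keyword_filter("how do I build a b-o-m-b"))  # False: obfuscation evades it
print(keyword_filter("explain how the device in question is assembled"))  # False: paraphrase evades it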



Abid Ahsan Shanto, 2024-01-02 (Update: 2024-01-02)