Data theft with invisible text: How easily ChatGPT and other AI tools can be tricked

In an AgentFlayer attack, images are used to deliver hidden prompts. (Image source: OpenAI)

At the Black Hat USA security conference, researchers revealed a new technique for attacking AI systems. By embedding hidden instructions, attackers can silently manipulate tools like ChatGPT to extract sensitive data from connected cloud storage. Some providers have begun to react, while others are downplaying the risk.

Marius Müller (translated by Marius Müller), Published 08/18/2025 🇩🇪 🇪🇸 ...

AI Security Cyberlaw Business

At the Black Hat USA 2025 security conference in Las Vegas, researchers unveiled a new method for deceiving AI systems such as ChatGPT, Microsoft Copilot and Google Gemini. The technique, known as AgentFlayer, was developed by Zenity researchers Michael Bargury and Tamir Ishay Sharbat. A press release outlining the findings was published on August 6.

The concept behind the attack is deceptively simple: text is hidden in a document using white font on a white background. Invisible to the human eye, it can be easily read by AI systems. Once the image is delivered to the target, the trap is set. If the file is included in a prompt, the AI discards the original task and instead follows the hidden instruction – searching connected cloud storage for access credentials.

To exfiltrate the data, the researchers employed a second tactic: they instructed the AI to encode the stolen information into a URL and load an image from it. This method discreetly transfers the data to the attackers’ servers without arousing suspicion.

Zenity demonstrated that the attack works in practice:

In ChatGPT, emails were manipulated so that the AI agent gained access to Google Drive.
In Microsoft's Copilot Studio, the researchers uncovered more than 3,000 instances of unprotected CRM data.
Salesforce Einstein could be deceived into redirecting customer communications to external addresses.
Google Gemini and Microsoft 365 Copilot were also susceptible to fake emails and calendar entries.
Attackers even obtained login credentials for the Jira developer platform through crafted tickets.

OpenAI and Microsoft respond, while others see no need for action

The good news is that OpenAI and Microsoft have already released updates to patch the vulnerabilities after being alerted by the researchers. Other providers, however, have been slower to act, with some even dismissing the exploits as “intended behavior.” Researcher Michael Bargury emphasized the severity of the issue, stating, “The user doesn’t have to do anything to be compromised, and no action is required for the data to be leaked.”

Source(s)

Zenity Labs via prnewswire

Loading Comments

Comment on this article

Controversial Battlefield rumor cla...

Steam: German RPG inspired by The W...

Editor of the original article: Marius Müller - Tech Writer - 2631 articles published on Notebookcheck since 2024

As a child in the 90s, my Gameboy was my steady companion. After school, the PlayStation was fired up. When I finally got my first PC, I was completely hooked. My passion for gaming has never waned since. For me, writing for Notebookcheck means reporting on topics that are really close to my heart - in addition to gaming, I also like to write about e-mobility, photovoltaics and innovative gadgets. When I'm not sitting at my computer, I'm probably on water rescue duty on the Baltic coast or trying to counteract the downsides of my geek life - namely sitting for long periods - at the local swimming pool.

Please share our article, every link counts!