Princeton University researchers have shown that large-language-model agents entrusted with crypto wallets and smart-contract operations can be hijacked once an attacker edits the agents’ stored context, a weakness the team labels “memory poisoning.”
Their study argues that today’s defenses—mostly prompt filters—do little once malicious text slips into an agent’s vector store or database. In experiments, short injections buried in memory consistently overrode guardrails that would have blocked the same text had it arrived as a direct prompt.
The team validated the attack on ElizaOS, an open-source framework whose wallet agents act on blockchain instructions. After poisoning the shared memory, the researchers got those agents to sign unauthorized smart-contract calls and transfer crypto assets to attacker-controlled addresses, proving that fabricated context translates into real financial loss.
Because ElizaOS lets many users share one conversation history, a single compromised session taints every other session that touches the same memory. The paper warns that any multi-user deployment of autonomous LLM agents inherits this lateral-movement risk unless memories are isolated or verifiable.
The authors recommend treating memories as append-only records, cryptographically signing each entry, and routing high-stakes actions—payments and contract approvals—through an external rules engine instead of trusting the model’s own reasoning. Until such measures become standard, handing real money to autonomous agents remains a gamble.
Source(s)
ArsTechnica (in English) & Princeton University (in English)