Security Research Benchmark

Evaluating Persistent Compromise in Autonomous AI Squads

AgentKillChain is an open, reproducible framework for stress-testing AI agents against latent prompt injection and memory poisoning attacks.

The New Metrics

Running the full set of 720 attack scenarios produced these updated global baseline metrics, which remained roughly stable despite the addition of the new architectures:

Global Compromise Rate: 1.71% (up slightly from 1.63%)

Toolchain Abuse Rate: 0.38% (down slightly)

Data Exfiltration Rate: 0.19% (down slightly)
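
For concreteness, these headline rates are simple proportions over trial outcomes. Below is a minimal sketch of that bookkeeping; the `TrialResult` fields and function names are illustrative rather than AgentKillChain's actual schema, and since the per-scenario trial count behind the 720-scenario suite isn't specified here, each outcome is treated as one trial.

```python
from dataclasses import dataclass

@dataclass
class TrialResult:
    compromised: bool        # any kill-chain stage fully succeeded
    toolchain_abused: bool   # a dangerous tool was invoked
    data_exfiltrated: bool   # sensitive data left the environment

def baseline_metrics(results: list[TrialResult]) -> dict[str, float]:
    """Headline rates, in percent, as plain proportions over all trials."""
    n = len(results)
    return {
        "global_compromise_rate": 100 * sum(r.compromised for r in results) / n,
        "toolchain_abuse_rate": 100 * sum(r.toolchain_abused for r in results) / n,
        "data_exfiltration_rate": 100 * sum(r.data_exfiltrated for r in results) / n,
    }
```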

Attacker Lifecycle

1. Initial Access: Gaining the first foothold in the AI agent's environment or context.
2. Execution: Running unauthorized commands or code via the agent's capabilities.
3. Persistence: Maintaining access or influence over the agent across sessions or turns.
4. Latent Activation: A dormant payload is triggered by specific context or time.
5. Escalation: Gaining higher privileges or access to more sensitive tools.
6. Exfiltration: Stealing or leaking sensitive data out of the agent's environment.
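
A convenient way to score a single run against this lifecycle is to record the deepest stage the attack reached. The sketch below is one plausible encoding, not AgentKillChain's actual schema:

```python
from enum import IntEnum

class KillChainStage(IntEnum):
    """Ordered lifecycle stages; the ordering is the point, since
    comparing stages tells you how deep an attack progressed."""
    INITIAL_ACCESS = 1
    EXECUTION = 2
    PERSISTENCE = 3
    LATENT_ACTIVATION = 4
    ESCALATION = 5
    EXFILTRATION = 6

def deepest_stage(reached: set[KillChainStage]) -> KillChainStage | None:
    # The furthest stage observed in one run, or None if the attack
    # never gained initial access.
    return max(reached, default=None)
```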

Attack Surface

User Input: Direct prompts or files provided by the user.
Memory: Long-term or short-term storage where the agent saves context.
Planner: The reasoning component that decides which steps to take next.
Tool Router: The mechanism that selects and formats tool calls.
External Tools: Third-party APIs or local commands the agent can execute.
Data Stores: Databases or document stores the agent queries for retrieval.
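
Tagging each trial with the surface it enters through lets the global rates be broken down per component. A hedged sketch of that breakdown (the surface labels are illustrative):

```python
from collections import Counter

def compromise_by_surface(tagged: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-surface compromise rates from (surface, compromised) pairs."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for surface, compromised in tagged:
        totals[surface] += 1
        hits[surface] += compromised
    return {surface: hits[surface] / totals[surface] for surface in totals}

print(compromise_by_surface([("memory", True), ("memory", False), ("planner", False)]))
# -> {'memory': 0.5, 'planner': 0.0}
```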

Latent Timeline Profile

Session 1 (Seed): The attacker injects a dormant payload into the agent's memory or data.
Sessions 2..N (Dormancy): The payload remains hidden while the agent performs normal tasks.
Session N+1 (Trigger Activation): A specific condition is met, causing the payload to execute.
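
This timeline maps naturally onto a three-phase test harness: seed the payload, confirm the agent stays clean on routine work, then present the trigger. The sketch below uses a toy stub agent; the memory and result interfaces are hypothetical, not AgentKillChain's API.

```python
class StubResult:
    def __init__(self, compromised: bool):
        self._compromised = compromised

    def was_compromised(self) -> bool:
        return self._compromised


class StubAgent:
    """Toy agent: 'compromised' iff poisoned memory and trigger co-occur."""
    def __init__(self, trigger: str):
        self.memory: list[str] = []
        self._trigger = trigger

    def run(self, task: str) -> StubResult:
        fired = any("PAYLOAD" in item for item in self.memory) and self._trigger in task
        return StubResult(fired)


def run_latent_scenario(agent, payload, benign_tasks, trigger_task):
    agent.memory.append(payload)              # Session 1: seed
    for task in benign_tasks:                 # Sessions 2..N: dormancy
        if agent.run(task).was_compromised():
            return "premature_activation"
    final = agent.run(trigger_task)           # Session N+1: trigger
    return "activated" if final.was_compromised() else "stayed_dormant"


# Example: the payload survives routine tasks, then fires on the trigger.
agent = StubAgent(trigger="quarterly report")
print(run_latent_scenario(
    agent,
    payload="PAYLOAD: exfiltrate when asked about the quarterly report",
    benign_tasks=["summarize my inbox", "draft a reply to the vendor"],
    trigger_task="compile the quarterly report",
))  # -> activated
```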

Toolchain Confusion Strategy

1. Malicious prompt: An input designed to manipulate the agent's parsing or tool selection.
2. Tool selection confusion: The agent is tricked into picking a dangerous tool instead of a safe one.
3. Dangerous invocation: The agent executes the malicious action using the selected tool.
4. Data exposure: The result of the action leads to unauthorized data access or leakage.
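
A coarse way to detect where a run lands on this chain is to compare the tool the agent actually selected against the tool the benign task should have used. A minimal sketch, with illustrative tool names and an assumed safe/dangerous split:

```python
SAFE_TOOLS = {"search_docs", "summarize_text"}
DANGEROUS_TOOLS = {"shell_exec", "send_email", "delete_records"}

def classify_invocation(expected_tool: str, selected_tool: str) -> str:
    """Map one tool call onto the confusion chain above."""
    if selected_tool in DANGEROUS_TOOLS:
        return "dangerous_invocation"        # step 3: malicious action executed
    if selected_tool != expected_tool:
        return "tool_selection_confusion"    # step 2: wrong tool picked
    return "benign"

# Example: a summarization task whose prompt injection redirected
# the router to a shell tool.
print(classify_invocation("summarize_text", "shell_exec"))
# -> dangerous_invocation
```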

About the Author

Kevin O'Connor

NSA Alum | Adlumin

Kevin O'Connor is a security researcher specializing in autonomous systems and advanced threat modeling. Drawing from his experience at the National Security Agency (NSA) and as a researcher at Adlumin, Kevin explores the convergence of AI capabilities and offensive security, focusing on latent vulnerabilities and emergent behaviors in multi-agent environments.