Security Research Benchmark

Evaluating Persistent Compromise in Autonomous AI Squads

AgentKillChain is an open, reproducible framework for stress-testing AI agents against latent prompt injection and memory poisoning attacks.

The New Metrics

Running the full set of 720 attack scenarios produced these updated global baseline metrics, which remained roughly stable despite the addition of the new architectures:

Global Compromise Rate: 1.71% (up slightly from 1.63%)

Toolchain Abuse Rate: 0.38% (down slightly)

Data Exfiltration Rate: 0.19% (down slightly)
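
For concreteness, these headline rates are simple proportions over trial outcomes. Below is a minimal sketch of that bookkeeping; the `TrialResult` fields and function names are illustrative rather than AgentKillChain's actual schema, and since the per-scenario trial count behind the 720-scenario suite isn't specified here, each outcome is treated as one trial.

```python
from dataclasses import dataclass

@dataclass
class TrialResult:
    compromised: bool        # any kill-chain stage fully succeeded
    toolchain_abused: bool   # a dangerous tool was invoked
    data_exfiltrated: bool   # sensitive data left the environment

def baseline_metrics(results: list[TrialResult]) -> dict[str, float]:
    """Headline rates, in percent, as plain proportions over all trials."""
    n = len(results)
    return {
        "global_compromise_rate": 100 * sum(r.compromised for r in results) / n,
        "toolchain_abuse_rate": 100 * sum(r.toolchain_abused for r in results) / n,
        "data_exfiltration_rate": 100 * sum(r.data_exfiltrated for r in results) / n,
    }
```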

Attacker Lifecycle

1. Initial Access: Gaining the first foothold in the AI agent's environment or context.
2. Execution: Running unauthorized commands or code via the agent's capabilities.
3. Persistence: Maintaining access or influence over the agent across sessions or turns.
4. Latent Activation: A dormant payload is triggered by specific context or time.
5. Escalation: Gaining higher privileges or access to more sensitive tools.
6. Exfiltration: Stealing or leaking sensitive data out of the agent's environment.
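
A convenient way to score a single run against this lifecycle is to record the deepest stage the attack reached. The sketch below is one plausible encoding, not AgentKillChain's actual schema:

```python
from enum import IntEnum

class KillChainStage(IntEnum):
    """Ordered lifecycle stages; the ordering is the point, since
    comparing stages tells you how deep an attack progressed."""
    INITIAL_ACCESS = 1
    EXECUTION = 2
    PERSISTENCE = 3
    LATENT_ACTIVATION = 4
    ESCALATION = 5
    EXFILTRATION = 6

def deepest_stage(reached: set[KillChainStage]) -> KillChainStage | None:
    # The furthest stage observed in one run, or None if the attack
    # never gained initial access.
    return max(reached, default=None)
```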

Attack Surface

User Input: Direct prompts or files provided by the user.
Memory: Long-term or short-term storage where the agent saves context.
Planner: The reasoning component that decides which steps to take next.
Tool Router: The mechanism that selects and formats tool calls.
External Tools: Third-party APIs or local commands the agent can execute.
Data Stores: Databases or document stores the agent queries for retrieval.
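
Tagging each trial with the surface it enters through lets the global rates be broken down per component. A hedged sketch of that breakdown (the surface labels are illustrative):

```python
from collections import Counter

def compromise_by_surface(tagged: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-surface compromise rates from (surface, compromised) pairs."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for surface, compromised in tagged:
        totals[surface] += 1
        hits[surface] += compromised
    return {surface: hits[surface] / totals[surface] for surface in totals}

print(compromise_by_surface([("memory", True), ("memory", False), ("planner", False)]))
# -> {'memory': 0.5, 'planner': 0.0}
```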

Latent Timeline Profile

Session 1 (Seed): The attacker injects a dormant payload into the agent's memory or data.
Sessions 2..N (Dormancy): The payload remains hidden while the agent performs normal tasks.
Session N+1 (Trigger Activation): A specific condition is met, causing the payload to execute.
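
This timeline maps naturally onto a three-phase test harness: seed the payload, confirm the agent stays clean on routine work, then present the trigger. The sketch below uses a toy stub agent; the memory and result interfaces are hypothetical, not AgentKillChain's API.

```python
class StubResult:
    def __init__(self, compromised: bool):
        self._compromised = compromised

    def was_compromised(self) -> bool:
        return self._compromised


class StubAgent:
    """Toy agent: 'compromised' iff poisoned memory and trigger co-occur."""
    def __init__(self, trigger: str):
        self.memory: list[str] = []
        self._trigger = trigger

    def run(self, task: str) -> StubResult:
        fired = any("PAYLOAD" in item for item in self.memory) and self._trigger in task
        return StubResult(fired)


def run_latent_scenario(agent, payload, benign_tasks, trigger_task):
    agent.memory.append(payload)              # Session 1: seed
    for task in benign_tasks:                 # Sessions 2..N: dormancy
        if agent.run(task).was_compromised():
            return "premature_activation"
    final = agent.run(trigger_task)           # Session N+1: trigger
    return "activated" if final.was_compromised() else "stayed_dormant"


# Example: the payload survives routine tasks, then fires on the trigger.
agent = StubAgent(trigger="quarterly report")
print(run_latent_scenario(
    agent,
    payload="PAYLOAD: exfiltrate when asked about the quarterly report",
    benign_tasks=["summarize my inbox", "draft a reply to the vendor"],
    trigger_task="compile the quarterly report",
))  # -> activated
```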

Toolchain Confusion Strategy

1. Malicious prompt: An input designed to manipulate the agent's parsing or tool selection.
2. Tool selection confusion: The agent is tricked into picking a dangerous tool instead of a safe one.
3. Dangerous invocation: The agent executes the malicious action using the selected tool.
4. Data exposure: The result of the action leads to unauthorized data access or leakage.
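
A coarse way to detect where a run lands on this chain is to compare the tool the agent actually selected against the tool the benign task should have used. A minimal sketch, with illustrative tool names and an assumed safe/dangerous split:

```python
SAFE_TOOLS = {"search_docs", "summarize_text"}
DANGEROUS_TOOLS = {"shell_exec", "send_email", "delete_records"}

def classify_invocation(expected_tool: str, selected_tool: str) -> str:
    """Map one tool call onto the confusion chain above."""
    if selected_tool in DANGEROUS_TOOLS:
        return "dangerous_invocation"        # step 3: malicious action executed
    if selected_tool != expected_tool:
        return "tool_selection_confusion"    # step 2: wrong tool picked
    return "benign"

# Example: a summarization task whose prompt injection redirected
# the router to a shell tool.
print(classify_invocation("summarize_text", "shell_exec"))
# -> dangerous_invocation
```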

About the Author

Kevin O'Connor

NSA Alum | Adlumin

Kevin O'Connor is a security researcher specializing in autonomous systems and advanced threat modeling. Drawing from his experience at the National Security Agency (NSA) and as a researcher at Adlumin, Kevin explores the convergence of AI capabilities and offensive security, focusing on latent vulnerabilities and emergent behaviors in multi-agent environments.