Evaluation Framework
The AgentKillChain harness is a Python-based execution environment designed to systematically test LLM agents against persistent threats.
Multi-Agent Testing Environment
The framework supports evaluating arbitrary agent architectures (e.g., ReAct, Plan-and-Solve) by providing a uniform harness that acts as the user, the environment, and the target. It manages LLM interactions via API providers like OpenRouter to run tests across frontier models simultaneously.
Latent State Propagation
Unlike simple prompt injections that trigger immediately, AgentKillChain models realistically advanced threats. Tests are structured into 'campaigns' across multiple simulated user sessions. The framework tests if an agent will carry a poisoned memory from an initial innocent interaction into a future sensitive task.
Automated Grading & Heuristics
After an agent responds or executes a tool, the framework uses an LLM-as-a-judge (or deterministic string matching for known payloads) to assess whether the attack was successful. This yields a deterministic Compromise Rate (CR) per model and architecture permutation.
Reproducible Reporting
The harness automatically emits structured JSON output and CSV aggregations at the end of every testing run. This guarantees that all metrics are reproducible and can be independently verified by other researchers evaluating the same LLM or cognitive architecture.
