Framework

Benchmark Harness

The Python harness executes attacks across the configured OpenRouter models and records a safety outcome for each attack.
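
As a rough illustration of that loop, the sketch below runs every attack against every model and stores one outcome per pair. All names here (AttackResult, run_benchmark, the send and judge callables, and the "safe"/"unsafe" labels) are assumptions made for this example and are not taken from the project. The model call is passed in as a function so the sketch stays independent of any particular OpenRouter client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class AttackResult:
    """One recorded outcome (hypothetical record shape)."""
    model: str
    attack_id: str
    outcome: str  # assumed labels: "safe" or "unsafe"


def run_benchmark(
    models: Iterable[str],
    attacks: Dict[str, str],
    send: Callable[[str, str], str],
    judge: Callable[[str], str],
) -> List[AttackResult]:
    """Run each attack prompt against each model and record the judged outcome.

    `send(model, prompt)` returns the model reply; `judge(reply)` maps a reply
    to an outcome label. Both are supplied by the caller (assumption).
    """
    results: List[AttackResult] = []
    for model in models:
        for attack_id, prompt in attacks.items():
            reply = send(model, prompt)
            results.append(AttackResult(model, attack_id, judge(reply)))
    return results


# Usage with stand-in functions instead of a real API call.
def fake_send(model: str, prompt: str) -> str:
    return "I can't help with that." if model == "vendor/model-a" else "Sure, here is how"


def simple_judge(reply: str) -> str:
    return "safe" if reply.startswith("I can't") else "unsafe"


demo = run_benchmark(
    ["vendor/model-a", "vendor/model-b"],
    {"atk-001": "example attack prompt"},
    fake_send,
    simple_judge,
)
```

Passing send and judge as parameters keeps the loop testable offline; a real run would substitute a function that calls the configured models.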

Persistent Compromise Modeling

Latent memory-poisoning scenarios model seed, dormancy, and activation phases as part of the campaign state.
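
One way to picture the three phases is as a small state object that advances one turn at a time. The sketch below is an assumption about how such state could be represented, not the project's actual data model: Phase, CampaignState, dormancy_turns, and advance are hypothetical names, and the rule that the seed occupies exactly one turn is chosen only to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    SEED = "seed"
    DORMANCY = "dormancy"
    ACTIVATION = "activation"


@dataclass
class CampaignState:
    """Hypothetical campaign state for a latent memory-poisoning scenario."""
    dormancy_turns: int  # assumed: benign turns between seed and activation
    turn: int = 0
    memory: List[str] = field(default_factory=list)

    @property
    def phase(self) -> Phase:
        # Turn 0 plants the seed; the next `dormancy_turns` turns are dormant;
        # every later turn counts as activation.
        if self.turn == 0:
            return Phase.SEED
        if self.turn <= self.dormancy_turns:
            return Phase.DORMANCY
        return Phase.ACTIVATION

    def advance(self, note: str) -> Phase:
        """Store this turn's memory entry, move to the next turn, and
        return the phase of the turn that was just played."""
        played = self.phase
        self.memory.append(note)
        self.turn += 1
        return played


state = CampaignState(dormancy_turns=2)
phases = [state.advance(f"turn-{i}") for i in range(4)]
```

Deriving the phase from the turn counter, rather than storing it separately, means the state cannot drift out of sync with the number of turns played.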

Reproducible Outputs

Results are written to JSON and CSV for publication, dashboards, and external statistical analysis.
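
A minimal sketch of writing the same records to both formats with the Python standard library follows. The function name write_results, the file names results.json and results.csv, and the row fields are assumptions for illustration. Keys and columns are sorted so that repeated runs on identical data produce byte-identical files.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Tuple


def write_results(rows: List[Dict[str, str]], out_dir: str) -> Tuple[Path, Path]:
    """Write rows to results.json and results.csv inside out_dir.

    File names and row fields are hypothetical. Returns both paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    json_path = out / "results.json"
    json_path.write_text(
        json.dumps(rows, indent=2, sort_keys=True), encoding="utf-8"
    )

    # Use the union of all keys as the CSV header, in sorted order;
    # DictWriter leaves a cell empty when a row lacks that key.
    fieldnames = sorted({key for row in rows for key in row})
    csv_path = out / "results.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return json_path, csv_path
```

JSON keeps nested structure for dashboards, while the flat CSV loads directly into most statistics packages.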