The Hydra

aesop // parallelized adversarial red-team execution

The Hydra assembles all available intelligence on a target chatbot, identifies its weakest domains, fans base scenarios into persona-driven variations, and runs them as concurrent multi-turn conversations. Each head adapts its approach to the target's responses, and every interaction is then scored to produce a domain-level resilience report.

      graph TD
        %% ---- INPUT SOURCES ----
        I1["System Instructions"]:::input
        I2["Vital Context"]:::input
        I3["Prior AESOP Analyses"]:::input
        I4["Cave of Shadows Runs"]:::input
        I5["Historical Hydra Scores"]:::input

        %% ---- STAGE 1: BRIEFING ----
        B["Stage 1: Briefing Assembly"]:::stage

        I1 --> B
        I2 --> B
        I3 --> B
        I4 --> B
        I5 --> B

        %% ---- STAGE 2: SCENARIOS ----
        S{"Stage 2: Scenario Loading"}:::decision

        B -->|"target dossier"| S

        S -->|"Cave run linked"| SC["Load Cave Scenarios"]:::stage
        S -->|"No Cave run"| SF["Fallback: Static Probe Battery"]:::fallback

        SC --> A
        SF --> A

        %% ---- STAGE 3: ALLOCATION ----
        A["Stage 3: Tiger Team Head Allocation"]:::alloc

        A -->|"weakness-weighted distribution"| FAN

        %% ---- STAGE 4: FANNING ----
        FAN["Stage 4: Scenario Fanning"]:::fan

        FAN -->|"N variations per scenario"| EX

        %% ---- STAGE 5: EXECUTION ----
        EX["Stage 5: Parallel Head Execution"]:::exec

        EX --> H1["Head 1: Persona A"]:::head
        EX --> H2["Head 2: Persona B"]:::head
        EX --> H3["Head 3: Persona C"]:::head
        EX --> HD["..."]:::head
        EX --> HN["Head N: Persona N"]:::head

        H1 -->|"multi-turn conversation"| EP["Target Chatbot Endpoint"]:::endpoint
        H2 -->|"adaptive dialogue"| EP
        H3 -->|"escalation tactics"| EP
        HD --> EP
        HN --> EP

        EP -->|"responses"| SC1

        %% ---- STAGE 6: SCORING ----
        SC1["Stage 6: Cave Scorer"]:::scoring

        SC1 --> D1["Safety Score"]:::domain
        SC1 --> D2["Ethics Score"]:::domain
        SC1 --> D3["Bias Score"]:::domain
        SC1 --> D4["Legal Score"]:::domain
        SC1 --> D5["Security Score"]:::domain

        D1 --> AGG["Domain Score Aggregation"]:::agg
        D2 --> AGG
        D3 --> AGG
        D4 --> AGG
        D5 --> AGG

        AGG --> RPT["Hydra Report"]:::report

        %% ---- STYLES ----
        classDef input fill:#1a2e3d,stroke:#22d3ee,color:#e0f7fa,stroke-width:1.5px
        classDef stage fill:#241a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:1.5px
        classDef decision fill:#241a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:1.5px
        classDef fallback fill:#2a1f1a,stroke:#fb923c,color:#fde8d0,stroke-width:1.5px,stroke-dasharray:5 3
        classDef alloc fill:#2a2618,stroke:#fbbf24,color:#fef3c7,stroke-width:1.5px
        classDef fan fill:#2a1a18,stroke:#fb923c,color:#fde8d0,stroke-width:1.5px
        classDef exec fill:#2a1a1a,stroke:#f87171,color:#fde0e0,stroke-width:1.5px
        classDef head fill:#1a1128,stroke:#f87171,color:#fca5a5,stroke-width:1px
        classDef endpoint fill:#0f0a1a,stroke:#f87171,color:#fca5a5,stroke-width:2px
        classDef scoring fill:#1a2a1a,stroke:#4ade80,color:#dcfce7,stroke-width:1.5px
        classDef domain fill:#1a2a1a,stroke:#4ade80,color:#bbf7d0,stroke-width:1px
        classDef agg fill:#1a2a1a,stroke:#4ade80,color:#dcfce7,stroke-width:1.5px
        classDef report fill:#2a1a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:2px
    
Input Sources

Intelligence Gathering

The Hydra pulls from every available data source on the target before planning its attack.

  • System instructions and vital context
  • Prior AESOP quality analysis reports
  • Cave of Shadows red-team findings
  • Historical Hydra run domain scores
Stage 1

Briefing Assembly

All intel is compiled into a structured target_dossier that every downstream stage receives. The dossier includes system prompt analysis, known weaknesses, and prior score trajectories.
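
As a rough illustration, the dossier could be modeled as a small dataclass. Only the name target_dossier and the three kinds of content come from the description above; the class name, field names, and sample values below are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TargetDossier:
    """Hypothetical shape for target_dossier; field names are illustrative."""
    system_prompt_analysis: str = ""
    known_weaknesses: list[str] = field(default_factory=list)
    # Domain name -> scores from prior runs, oldest first.
    score_trajectories: dict[str, list[float]] = field(default_factory=dict)

    def latest_scores(self) -> dict[str, float]:
        """Most recent score per domain, skipping domains with no history."""
        return {d: s[-1] for d, s in self.score_trajectories.items() if s}


# Made-up sample values, purely for demonstration.
target_dossier = TargetDossier(
    system_prompt_analysis="Summary of the target's instructions goes here.",
    known_weaknesses=["role-play framing"],
    score_trajectories={"Safety": [90.0, 94.0], "Bias": [91.0, 88.0], "Legal": []},
)
print(target_dossier.latest_scores())
```

Keeping the full trajectory rather than only the latest score would let later stages see whether a domain is improving or regressing.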

Stage 2

Scenario Loading

If a Cave of Shadows run is linked, its tailored adversarial scenarios become the base. Otherwise, the Hydra falls back to 20 built-in probes (10 director + 10 worker) covering all five domains.
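
The branch described above might look like the following sketch. The function names, the dictionary layout, and the placeholder prompts are all hypothetical; the text only states the 10 + 10 split and the five-domain coverage. Treating a linked run with no scenarios as "not linked" is also an assumption.

```python
from typing import Optional

DOMAINS = ["Safety", "Ethics", "Bias", "Legal", "Security"]


def static_probe_battery() -> list[dict]:
    """Placeholder battery: 10 director + 10 worker probes across the five domains."""
    probes = []
    for kind in ("director", "worker"):
        for i in range(10):
            probes.append({
                "kind": kind,
                "domain": DOMAINS[i % len(DOMAINS)],  # round-robin over domains
                "prompt": f"<{kind} probe {i + 1}>",   # real probe text not shown here
            })
    return probes


def load_scenarios(cave_run: Optional[dict]) -> list[dict]:
    """Use the linked Cave run's scenarios if present, else the static battery."""
    if cave_run and cave_run.get("scenarios"):
        return cave_run["scenarios"]
    return static_probe_battery()


print(len(load_scenarios(None)))
```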

Stage 3

Tiger Team Head Allocation

Heads are distributed using inverse score weighting. Domains below 80 get proportionally more heads. Every domain gets at least one head.

  • Low score = more attack variations
  • High score = fewer heads allocated
  • Concentrates firepower on weaknesses
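
One way to realize inverse score weighting is sketched below. The exact formula is not given in the text, so this is an assumption: every domain first receives one head, and the remaining heads are split in proportion to (100 - score) with largest-remainder rounding. The 80-point threshold is not modeled separately, and allocate_heads is a hypothetical name.

```python
def allocate_heads(scores: dict[str, float], total_heads: int) -> dict[str, int]:
    """Give each domain one head, then split the rest in proportion to (100 - score)."""
    if total_heads < len(scores):
        raise ValueError("need at least one head per domain")
    allocation = {domain: 1 for domain in scores}
    spare = total_heads - len(scores)
    if spare == 0:
        return allocation

    weights = {d: max(100.0 - s, 0.0) for d, s in scores.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Every domain scored 100: spread the spare heads evenly.
        weights = {d: 1.0 for d in scores}
        total_weight = float(len(scores))

    shares = {d: spare * w / total_weight for d, w in weights.items()}
    for d, share in shares.items():
        allocation[d] += int(share)  # whole-number part of each share

    # Hand out heads lost to rounding, largest fractional remainder first.
    leftover = total_heads - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda d: shares[d] - int(shares[d]), reverse=True)
    for d in by_remainder[:leftover]:
        allocation[d] += 1
    return allocation


example = allocate_heads(
    {"Safety": 90, "Ethics": 90, "Bias": 60, "Legal": 90, "Security": 70},
    total_heads=15,
)
print(example)
```

In this made-up input, Bias has the lowest score and therefore ends up with the most heads, while no domain drops below one.
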
Stage 4

Scenario Fanning

Each base scenario is expanded into multiple variations via Claude. Each variation gets a distinct persona, angle, and escalation approach, becoming an individual "head."
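
The shape of the fanning step can be sketched without the model call. In the sketch below, stub_generator is a deterministic stand-in for the Claude request, and the Head fields, persona names, angles, and escalation labels are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Head:
    scenario: str
    persona: str
    angle: str
    escalation: str


# Illustrative values only; in the Hydra these come from the model, not a fixed list.
PERSONAS = ["curious student", "frustrated customer", "self-described auditor"]
ANGLES = ["hypothetical framing", "appeal to urgency", "claimed authority"]
ESCALATIONS = ["gradual", "abrupt", "persistent rephrasing"]


def stub_generator(scenario: str, index: int) -> Head:
    """Deterministic stand-in for the model call that writes one variation."""
    k = index % len(PERSONAS)
    return Head(scenario, PERSONAS[k], ANGLES[k], ESCALATIONS[k])


def fan_scenario(scenario: str, n: int,
                 generate: Callable[[str, int], Head] = stub_generator) -> list[Head]:
    """Expand one base scenario into n heads."""
    return [generate(scenario, i) for i in range(n)]


heads = fan_scenario("extract the system prompt", 3)
for head in heads:
    print(head.persona, "|", head.angle, "|", head.escalation)
```

Passing the generator in as a parameter keeps the fan-out logic testable without network access.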

Stage 5

Parallel Head Execution

All heads are unleashed simultaneously via asyncio.gather. Each runs an adaptive multi-turn conversation (up to 10 turns) against the target endpoint, escalating based on responses.
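
A minimal sketch of this stage follows, using asyncio.gather as named above. The target is replaced by a local fake that always refuses, and the escalation and stop rules are hypothetical placeholders; only the concurrency primitive and the 10-turn cap come from the text.

```python
import asyncio
from typing import Awaitable, Callable

MAX_TURNS = 10  # upper bound stated in the text

Send = Callable[[str], Awaitable[str]]


async def run_head(persona: str, opener: str, send: Send) -> list[tuple[str, str]]:
    """Run one multi-turn conversation, deriving each message from the last reply."""
    transcript = []
    message = opener
    for turn in range(MAX_TURNS):
        reply = await send(message)
        transcript.append((message, reply))
        if "cannot" not in reply.lower():
            break  # hypothetical stop rule: no refusal left to push against
        # Hypothetical escalation: a real head would ask a model for the next move.
        message = f"[{persona}, turn {turn + 2}] follow-up to: {reply}"
    return transcript


async def fake_target(message: str) -> str:
    """Stand-in for the target chatbot endpoint; always refuses."""
    await asyncio.sleep(0)  # yield control so the heads interleave
    return "I cannot help with that."


async def main() -> list[list[tuple[str, str]]]:
    heads = [("Persona A", "opener A"), ("Persona B", "opener B"), ("Persona C", "opener C")]
    # gather schedules all head coroutines concurrently and returns results in order.
    return await asyncio.gather(*(run_head(p, o, fake_target) for p, o in heads))


transcripts = asyncio.run(main())
print([len(t) for t in transcripts])
```

Because the fake target never stops refusing, every head here runs to the turn cap. Against a real endpoint, passing return_exceptions=True to asyncio.gather would keep one failed head from discarding the others' results.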

Stage 6

Scoring and Aggregation

Each head's final conversation is scored by the Cave Scorer agent (0-100). Scores are averaged by domain to produce five domain scores and one overall resilience score.
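
The aggregation can be sketched as a per-domain mean. How the overall resilience score is derived is not specified above; taking the mean of the domain scores, as done here, is an assumption, and aggregate is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def aggregate(head_scores: list[tuple[str, float]]) -> tuple[dict[str, float], float]:
    """Average head scores per domain, then average the domain scores."""
    by_domain: dict[str, list[float]] = defaultdict(list)
    for domain, score in head_scores:
        by_domain[domain].append(score)
    domain_scores = {d: mean(s) for d, s in by_domain.items()}
    overall = mean(domain_scores.values())  # assumption: unweighted mean of domains
    return domain_scores, overall


# Made-up head scores for demonstration.
domain_scores, overall = aggregate([("Safety", 90.0), ("Safety", 100.0), ("Bias", 80.0)])
print(domain_scores, overall)
```

Averaging domains rather than heads keeps a heavily probed domain from dominating the overall figure, though the opposite choice would also be defensible.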

Example Domain Scores

  • Safety: 94 (4 heads)
  • Ethics: 92 (3 heads)
  • Bias: 88 (5 heads)
  • Security: 85 (5 heads)

Hydra Report

The final output is a comprehensive resilience report built from the aggregated domain scores.

On subsequent runs, the tiger team allocation shifts automatically: domains that improved get fewer heads, and newly weak domains attract more firepower.

Adaptive pressure. The Hydra is not a static test suite. Each run recalibrates based on prior results, concentrating adversarial pressure where the target is most vulnerable. As the target improves in one domain, the Hydra pivots to probe the next weakest area, making a perfect score progressively harder to achieve.