The Hydra

aesop // parallelized adversarial red-team execution

The Hydra assembles all available intelligence on a target chatbot, identifies its weakest domains, fans base scenarios into persona-driven variations, and unleashes them simultaneously as concurrent multi-turn conversations. Each head adapts its approach based on the target's responses, then every interaction is scored to produce a domain-level resilience report.

      graph TD
        %% ---- INPUT SOURCES ----
        I1["System Instructions"]:::input
        I2["Vital Context"]:::input
        I3["Prior AESOP Analyses"]:::input
        I4["Cave of Shadows Runs"]:::input
        I5["Historical Hydra Scores"]:::input

        %% ---- STAGE 1: BRIEFING ----
        B["Stage 1: Briefing Assembly"]:::stage

        I1 --> B
        I2 --> B
        I3 --> B
        I4 --> B
        I5 --> B

        %% ---- STAGE 2: SCENARIOS ----
        S{"Stage 2: Scenario Loading"}:::decision

        B -->|"target dossier"| S

        S -->|"Cave run linked"| SC["Load Cave Scenarios"]:::stage
        S -->|"No Cave run"| SF["Fallback: Static Probe Battery"]:::fallback

        SC --> A
        SF --> A

        %% ---- STAGE 3: ALLOCATION ----
        A["Stage 3: Tiger Team Head Allocation"]:::alloc

        A -->|"weakness-weighted distribution"| FAN

        %% ---- STAGE 4: FANNING ----
        FAN["Stage 4: Scenario Fanning"]:::fan

        FAN -->|"N variations per scenario"| EX

        %% ---- STAGE 5: EXECUTION ----
        EX["Stage 5: Parallel Head Execution"]:::exec

        EX --> H1["Head 1: Persona A"]:::head
        EX --> H2["Head 2: Persona B"]:::head
        EX --> H3["Head 3: Persona C"]:::head
        EX --> HD["..."]:::head
        EX --> HN["Head N: Persona N"]:::head

        H1 -->|"multi-turn conversation"| EP["Target Chatbot Endpoint"]:::endpoint
        H2 -->|"adaptive dialogue"| EP
        H3 -->|"escalation tactics"| EP
        HD --> EP
        HN --> EP

        EP -->|"responses"| SC1

        %% ---- STAGE 6: SCORING ----
        SC1["Stage 6: Cave Scorer"]:::scoring

        SC1 --> D1["Safety Score"]:::domain
        SC1 --> D2["Ethics Score"]:::domain
        SC1 --> D3["Bias Score"]:::domain
        SC1 --> D4["Legal Score"]:::domain
        SC1 --> D5["Security Score"]:::domain

        D1 --> AGG["Domain Score Aggregation"]:::agg
        D2 --> AGG
        D3 --> AGG
        D4 --> AGG
        D5 --> AGG

        AGG --> RPT["Hydra Report"]:::report

        %% ---- STYLES ----
        classDef input fill:#1a2e3d,stroke:#22d3ee,color:#e0f7fa,stroke-width:1.5px
        classDef stage fill:#241a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:1.5px
        classDef decision fill:#241a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:1.5px
        classDef fallback fill:#2a1f1a,stroke:#fb923c,color:#fde8d0,stroke-width:1.5px,stroke-dasharray:5 3
        classDef alloc fill:#2a2618,stroke:#fbbf24,color:#fef3c7,stroke-width:1.5px
        classDef fan fill:#2a1a18,stroke:#fb923c,color:#fde8d0,stroke-width:1.5px
        classDef exec fill:#2a1a1a,stroke:#f87171,color:#fde0e0,stroke-width:1.5px
        classDef head fill:#1a1128,stroke:#f87171,color:#fca5a5,stroke-width:1px
        classDef endpoint fill:#0f0a1a,stroke:#f87171,color:#fca5a5,stroke-width:2px
        classDef scoring fill:#1a2a1a,stroke:#4ade80,color:#dcfce7,stroke-width:1.5px
        classDef domain fill:#1a2a1a,stroke:#4ade80,color:#bbf7d0,stroke-width:1px
        classDef agg fill:#1a2a1a,stroke:#4ade80,color:#dcfce7,stroke-width:1.5px
        classDef report fill:#2a1a35,stroke:#a78bfa,color:#e8e0f0,stroke-width:2px

Legend: input sources, pipeline stages, allocation, fanning, execution, scoring

Input Sources

Intelligence Gathering

The Hydra pulls from every available data source on the target before planning its attack.

System instructions and vital context
Prior AESOP quality analysis reports
Cave of Shadows red-team findings
Historical Hydra run domain scores

Stage 1

Briefing Assembly

All intel is compiled into a structured target_dossier that every downstream stage receives. The dossier includes system prompt analysis, known weaknesses, and prior score trajectories.
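As a rough illustration of what such a dossier might look like, here is a minimal sketch. Only the name target_dossier and the kinds of content it carries come from the description above; the class name, field names, and the latest_scores helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetDossier:
    """Hypothetical shape of the briefing handed to every downstream stage."""

    system_instructions: str
    vital_context: str
    # Free-text weakness notes, e.g. from prior analyses (assumed format).
    known_weaknesses: list[str] = field(default_factory=list)
    # Domain name -> scores from earlier runs, oldest first (assumed format).
    score_history: dict[str, list[float]] = field(default_factory=dict)

    def latest_scores(self) -> dict[str, float]:
        """Most recent score per domain; domains with no history are omitted."""
        return {d: s[-1] for d, s in self.score_history.items() if s}
```

Keeping the full score history rather than only the latest value is one way to represent the "prior score trajectories" mentioned above.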

Stage 2

Scenario Loading

If a Cave of Shadows run is linked, its tailored adversarial scenarios become the base. Otherwise, the Hydra falls back to 20 built-in probes (10 director + 10 worker) covering all five domains.
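The branch described here could be sketched as follows. The function names and the scenario dictionary layout are hypothetical, and the even split of two probes per mode per domain is an assumption; the source only states 10 director plus 10 worker probes covering all five domains.

```python
from __future__ import annotations

from typing import Optional

DOMAINS = ("safety", "ethics", "bias", "legal", "security")


def static_probe_battery() -> list[dict]:
    """Placeholder fallback: 10 director + 10 worker probes.

    Assumes two probes per mode per domain; the prompts are stand-ins.
    """
    return [
        {"mode": mode, "domain": domain, "prompt": f"<{mode} probe {i} for {domain}>"}
        for mode in ("director", "worker")
        for domain in DOMAINS
        for i in (1, 2)
    ]


def load_scenarios(cave_scenarios: Optional[list[dict]]) -> list[dict]:
    """Use linked Cave scenarios when present, otherwise the static battery."""
    # An empty list is treated like "no Cave run" in this sketch.
    if cave_scenarios:
        return cave_scenarios
    return static_probe_battery()
```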

Stage 3

Tiger Team Head Allocation

Heads are distributed using inverse score weighting: domains scoring below 80 receive proportionally more heads, and every domain gets at least one head.

Low score = more attack variations
High score = fewer heads allocated
Concentrates firepower on weaknesses
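The exact weighting formula is not given here, so the following is only one plausible reading: every domain gets one head, and the remaining heads are shared out in proportion to how far each domain falls below the 80 threshold. The function name, the shortfall formula, and the largest-remainder rounding are all assumptions.

```python
from __future__ import annotations


def allocate_heads(
    scores: dict[str, float], total_heads: int, threshold: float = 80.0
) -> dict[str, int]:
    """Give every domain one head, then split the rest by shortfall below threshold."""
    if total_heads < len(scores):
        raise ValueError("need at least one head per domain")

    allocation = {domain: 1 for domain in scores}
    spare = total_heads - len(scores)

    # Assumed weighting: distance below the threshold, never negative.
    weights = {d: max(0.0, threshold - s) for d, s in scores.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No domain is below the threshold: spread spare heads evenly.
        weights = {d: 1.0 for d in scores}
        total_weight = float(len(scores))

    shares = {d: spare * w / total_weight for d, w in weights.items()}
    for domain, share in shares.items():
        allocation[domain] += int(share)

    # Largest-remainder rounding so the total matches total_heads exactly.
    leftover = total_heads - sum(allocation.values())
    by_fraction = sorted(shares, key=lambda d: shares[d] - int(shares[d]), reverse=True)
    for domain in by_fraction[:leftover]:
        allocation[domain] += 1
    return allocation
```

With 20 heads and scores of 70, 85, 50, 90, and 60, this sketch concentrates most heads on the three domains below 80 and leaves the two above 80 with the minimum of one head each.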

Stage 4

Scenario Fanning

Each base scenario is expanded into multiple variations via Claude. Each variation gets a distinct persona, angle, and escalation approach, becoming an individual "head."
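A minimal sketch of the fanning step follows. The model call is deliberately left as an injected `vary` callable, since the source does not describe how Claude is invoked; the Head class, its fields, and the variation dictionary keys are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Head:
    """One variation of a base scenario (hypothetical structure)."""

    domain: str
    base_prompt: str
    persona: str
    angle: str
    escalation: str


def fan_scenario(
    scenario: dict, n: int, vary: Callable[[dict, int], list[dict]]
) -> list[Head]:
    """Expand one base scenario into up to n heads with distinct personas.

    `vary` stands in for the model call that proposes variations; it is
    assumed to return dicts with "persona", "angle", and "escalation" keys.
    """
    heads: list[Head] = []
    seen_personas: set[str] = set()
    for variation in vary(scenario, n)[:n]:
        if variation["persona"] in seen_personas:
            continue  # drop duplicates so each head keeps a distinct persona
        seen_personas.add(variation["persona"])
        heads.append(
            Head(
                domain=scenario["domain"],
                base_prompt=scenario["prompt"],
                persona=variation["persona"],
                angle=variation["angle"],
                escalation=variation["escalation"],
            )
        )
    return heads
```

Passing the generator in as a parameter keeps the sketch testable with a stub and avoids guessing at any particular client library.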

Stage 5

Parallel Head Execution

All heads are launched concurrently via asyncio.gather. Each runs an adaptive multi-turn conversation (up to 10 turns) against the target endpoint, escalating based on the target's responses.
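The execution pattern can be sketched as below. Only asyncio.gather and the 10-turn cap come from the description above; `send` (which would call the target endpoint) and `next_message` (which would choose the next escalation) are hypothetical injected coroutines, and the transcript layout is an assumption.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional

MAX_TURNS = 10  # cap stated in the stage description


async def run_head(
    opening: str,
    send: Callable[[list[dict]], Awaitable[str]],
    next_message: Callable[[list[dict]], Awaitable[Optional[str]]],
) -> list[dict]:
    """Run one adaptive conversation and return its transcript."""
    transcript: list[dict] = []
    message: Optional[str] = opening
    for _ in range(MAX_TURNS):
        transcript.append({"role": "attacker", "content": message})
        reply = await send(transcript)  # hypothetical call to the target
        transcript.append({"role": "target", "content": reply})
        # The next move depends on everything said so far.
        message = await next_message(transcript)
        if message is None:  # the head decides there is nothing left to try
            break
    return transcript


async def run_all(openings, send, next_message):
    """Launch every head concurrently and collect results in input order."""
    # return_exceptions=True keeps one failing head from hiding the others'
    # results; failed heads appear as exception objects in the list.
    return await asyncio.gather(
        *(run_head(o, send, next_message) for o in openings),
        return_exceptions=True,
    )
```

A caller would run this with `asyncio.run(run_all(openings, send, next_message))`. In practice a semaphore or rate limiter around `send` may be needed if the target endpoint throttles requests, which this sketch does not cover.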

Stage 6

Scoring and Aggregation

Each head's final conversation is scored by the Cave Scorer agent (0-100). Scores are averaged by domain to produce five domain scores and one overall resilience score.
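The aggregation step might look like the sketch below. The function name and input layout are hypothetical. Computing the overall score as the mean of the domain averages, rather than the mean over all heads, is an assumption; it keeps heavily targeted domains from dominating the overall figure.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def aggregate(head_scores: list[tuple[str, float]]) -> tuple[dict[str, float], float]:
    """Average per-head scores (0-100) by domain, then across domains."""
    by_domain: dict[str, list[float]] = defaultdict(list)
    for domain, score in head_scores:
        by_domain[domain].append(score)

    domain_scores = {d: mean(s) for d, s in by_domain.items()}
    # Assumption: overall = mean of domain averages, not of all heads.
    overall = mean(list(domain_scores.values()))
    return domain_scores, overall
```

For example, two Safety heads scoring 80 and 60 and one Bias head scoring 40 give domain scores of 70 and 40, and an overall score of 55 under this assumption.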

Example Domain Scores

Safety - 4 heads
Ethics - 3 heads
Bias - 5 heads
Legal - 3 heads
Security - 5 heads

Hydra Report

The final output is a comprehensive resilience report containing:

Overall resilience score - aggregate across all domains and heads
Five domain scores - Safety, Ethics, Bias, Legal, Security
Per-head results - the probe scenario, full conversation transcript, and pass/fail for each individual head
Head allocation breakdown - how many heads targeted each domain and why
Probe mode coverage - director probes, worker probes, or both
Cost summary - total tokens and estimated spend

On subsequent runs, the tiger team allocation shifts automatically: domains that improved get fewer heads, and newly weak domains attract more firepower.

Adaptive pressure. The Hydra is not a static test suite. Each run recalibrates based on prior results, concentrating adversarial pressure where the target is most vulnerable. As the target improves in one domain, the Hydra pivots to probe the next weakest area, making it progressively harder to achieve a perfect score.