Eight Dimensions Deep
The Meta-Recursion of AI Building AI · AI Strategy · April 2026
We passed "AI as a tool" a long time ago
Most organizations talk about AI as a thing you deploy. We use AI to design the deployment process itself. Every tool in our pipeline is AI-powered, AI-designed, or both. The result is a recursive stack where each dimension's output feeds the next dimension's input, and many of those dimensions are invisible to anyone who isn't looking for them. At every transition, a human is driving: seeing the nuance, applying systems thinking, making the judgment calls that AI cannot make for itself. This redesigns how work moves, how decisions get made, and where judgment lives. The philosophical questions (what is the purpose, what counts as knowledge, who decides, what are the ethics) are native to the system's decision-making at every dimension.
All eight dimensions, surface to depth
- Dimension 1 is the genesis. Before any of this pipeline existed, Charlie used Claude Code to build Agent Factory, a single platform with four capabilities: Discovery, Build, Eval, and Repair
- Dimensions 2-7 are the recursion. Each dimension's output feeds the next, with human-AI partnership at every transition and human checkpoints between each stage
- Dimension 8 is the long game. After deployment, AI monitors for drift in production against human governance standards, feeding findings back into the repair loop
How Each Dimension Feeds the Next
graph TD
D1["D1: Human + AI Build Platform\nCharlie + Claude Code build\nAgent Factory"] --> H01{{"Charlie reviews platform"}}
H01 --> D2["D2: Human Architects AI\nCharlie + Claude Code\ndesign the pipeline"]
D2 --> H23{{"Charlie reviews methodology"}}
H23 --> D3["D3: AI Extracts Knowledge\nAF Discovery interviews humans"]
D3 --> H34{{"Charlie reviews PRD"}}
H34 --> D4["D4: AI Builds AI\nAgent Factory generates\ninstructions from PRD"]
D4 --> H45{{"Charlie reviews instructions"}}
H45 --> D5["D5: AI Tests AI Under Human Criteria\nAF Eval personas converse\nwith the live agent"]
D5 --> D6["D6: AI Judges AI\nAF Eval scores conversations\nagainst rubric"]
D6 --> H67{{"Charlie reviews findings"}}
H67 --> D7["D7: AI Repairs AI\nAgent Factory repair\nfixes instructions"]
D7 -->|"improved agent"| D4
D7 -.->|"when quality bar met"| DEPLOY["Agent deployed\nto production"]
DEPLOY --> D8["D8: AI Monitors for Drift\nAF Eval evaluates live\nconversations ongoing"]
D8 -->|"drift detected"| H89{{"Human reviews findings"}}
H89 -->|"triggers repair"| D7
classDef genesis fill:#be123c22,stroke:#be123c,stroke-width:2px
classDef architect fill:#b4530922,stroke:#b45309,stroke-width:2px
classDef extract fill:#0f766e22,stroke:#0f766e,stroke-width:2px
classDef build fill:#1d4ed822,stroke:#1d4ed8,stroke-width:2px
classDef evaluate fill:#15803d22,stroke:#15803d,stroke-width:2px
classDef judge fill:#b91c1c22,stroke:#b91c1c,stroke-width:2px
classDef repair fill:#4338ca22,stroke:#4338ca,stroke-width:2px
classDef monitor fill:#b4530922,stroke:#b45309,stroke-width:2px
classDef deploy fill:#15803d22,stroke:#15803d,stroke-width:2px,stroke-dasharray:5 5
classDef hitl fill:#f5f5f411,stroke:#78716c,stroke-width:1px,stroke-dasharray:3 3
class D1 genesis
class D2 architect
class D3 extract
class D4 build
class D5 evaluate
class D6 judge
class D7 repair
class D8 monitor
class DEPLOY deploy
class H01,H23,H34,H45,H67,H89 hitl
The genesis: before the pipeline existed, a human directed AI to build it
Before any of this pipeline existed, a human used AI to build it. Charlie used Claude Code to build Agent Factory: a unified platform with four modes:
- Discovery: AI-driven requirements interviews that extract what humans can't articulate alone
- Build: multi-phase engine that generates agent instructions from structured PRDs
- Eval: bespoke evaluation design including bias, ethics, compliance, and safety testing
- Repair: finding-to-fix pipeline that iterates on instructions based on eval findings
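The four modes chain together: each one consumes the previous mode's artifact. A minimal sketch of that ordering in Python (the `Mode` enum and `next_mode` helper are illustrative, not Agent Factory's real API):

```python
from enum import Enum

class Mode(Enum):
    DISCOVERY = "discovery"  # extract requirements into a PRD
    BUILD = "build"          # generate agent instructions from the PRD
    EVAL = "eval"            # design and run bespoke evaluations
    REPAIR = "repair"        # turn eval findings into instruction fixes

# The pipeline is an ordered chain of artifact transformations.
PIPELINE = [Mode.DISCOVERY, Mode.BUILD, Mode.EVAL, Mode.REPAIR]

def next_mode(current: Mode):
    """Return the mode that consumes this mode's output, if any."""
    i = PIPELINE.index(current)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else None
```

Repair returning `None` here is deliberate: in practice its output loops back into Build, which the ouroboros section below covers.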
The human decides what to build; AI designs how to build it
Charlie works with Claude Code to architect what capabilities Agent Factory needs, what process each mode should follow, and how they connect. The 6-stage Discovery interview methodology was designed in this collaboration. The Build phase structure was designed here. The Eval rubric framework was designed here. The human provides strategic direction and domain judgment; AI translates that into executable methodology.
AI pulls out what humans can't articulate alone, then humans review the result
Agent Factory Discovery interviews humans through a structured 6-stage conversation to extract requirements they wouldn't otherwise articulate. It probes for edge cases, surfaces contradictions, captures process maps, and produces a PRD that no human wrote alone. The AI is acting as a requirements engineer: a role that usually takes years of practice to develop.
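A stage-gated interview like this can be sketched as a simple progression rule: the AI keeps probing within a stage until its questions are answered, then advances. The stage names below are hypothetical; the source only says Discovery runs a 6-stage conversation.

```python
# Hypothetical stage names for a 6-stage requirements interview.
STAGES = ["context", "goals", "process_map", "edge_cases",
          "contradictions", "prd_synthesis"]

def next_stage(stage: str, stage_complete: bool) -> str:
    """Advance only once the current stage is complete;
    otherwise keep probing within the same stage."""
    if not stage_complete:
        return stage
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```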
AI reads AI's interpretation of human knowledge, then writes instructions for another AI, with human sign-off
Agent Factory Build takes the PRD (an AI-generated document from Dimension 3) and produces agent instructions. It applies Glean best practices, structures decision trees, builds coaching moves, and generates Glean-compatible JSON. The input is AI output. The process is AI. The output is instructions for AI.
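The shape of that transformation, sketched minimally: a structured PRD in, serialized instructions out. The field names here are illustrative assumptions, not Glean's actual schema.

```python
import json

def build_instructions(prd: dict) -> str:
    """Transform a structured PRD (itself AI-generated) into agent
    instructions serialized as JSON. Field names are hypothetical."""
    doc = {
        "role": prd["agent_role"],
        "decision_tree": [
            {"when": r["condition"], "then": r["action"]}
            for r in prd.get("rules", [])
        ],
        "coaching_moves": prd.get("coaching_moves", []),
    }
    return json.dumps(doc, indent=2)
```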
AI designs bespoke evals from human-approved artifacts, then runs them
Agent Factory Eval reviews the PRD, instructions, and KB docs to design a bespoke evaluation structure. It generates synthetic personas with specific goals, knowledge gaps, and difficulty tiers that have real, multi-turn conversations with the live Glean agent.
- Bespoke test design: AI analyzes the agent's domain and creates evaluation scenarios tailored to its specific use case
- Bias, ethics, compliance, and safety: mandatory test categories included in every evaluation, embedded from the start
- Synthetic personas: AI-generated users who push back, give vague answers, contradict themselves, and test edge cases
- Perfect repeatability: systematic coverage across every scenario tier, reproducible across versions
A separate AI evaluating whether the evaluation was any good
Agent Factory Eval reads the Dimension 5 conversation transcripts and scores them against its bespoke rubric across five scoring dimensions: goal completion, process quality, conversational skill, red flag detection, and output quality. Each criterion has anchored descriptions of what a 1, a 5, and a 10 look like. The scoring AI was not involved in the conversation; it is a third party reading a transcript of two other AIs talking.
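Anchored rubrics can be represented as plain data, which is what makes the scoring auditable. A minimal sketch, with hypothetical criteria and anchor text (the real rubric has 16 criteria across the five dimensions):

```python
# Each criterion carries anchored descriptions for scores 1, 5, and 10,
# so a judge (human or AI) scores against the same fixed reference points.
RUBRIC = {
    "goal_completion": {1: "goal abandoned", 5: "partially met", 10: "fully met"},
    "red_flag_detection": {1: "missed all flags", 5: "caught obvious flags",
                           10: "caught subtle flags"},
}

def dimension_score(criterion_scores: dict) -> float:
    """Aggregate per-criterion 1-10 scores into a dimension score."""
    for name, score in criterion_scores.items():
        assert name in RUBRIC and 1 <= score <= 10, f"unanchored score: {name}"
    return sum(criterion_scores.values()) / len(criterion_scores)
```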
Human reviews findings and decides what to fix; AI executes the repair
Agent Factory Repair takes the scored findings from Dimension 6 and makes targeted instruction changes. The repair AI reads AI-generated scores of AI-evaluated conversations with an AI agent, then modifies that agent's AI-generated instructions. Every fix traces to a specific finding. Every finding traces to a scored conversation. Every conversation was AI evaluating AI.
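That traceability requirement (fix to finding, finding to scored conversation) is checkable mechanically. A sketch assuming hypothetical record shapes, not Agent Factory's real data model:

```python
def fix_is_traceable(fix: dict, findings: dict, conversations: dict) -> bool:
    """A fix is valid only if it traces to a finding, and that
    finding traces to a scored conversation."""
    finding = findings.get(fix.get("finding_id"))
    if finding is None:
        return False
    return finding.get("conversation_id") in conversations
```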
After deployment, the human sets governance standards; AI watches for degradation
Agent Factory Eval doesn't stop at pre-deployment testing. After the agent goes live, it evaluates real conversations to detect prompt drift, quality degradation, and emerging gaps. The monitoring is continuous, but the governance is human-defined:
- Prompt drift detection: identifying when agent behavior diverges from the approved instruction set over time
- Quality benchmarking: scoring live conversations against the same bespoke rubric used in pre-deployment eval
- Improvement opportunities: surfacing patterns in real user interactions that reveal gaps the original PRD didn't anticipate
- Bias, ethics, compliance, and safety: the same mandatory test categories from D5, now applied to production behavior
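One common way to operationalize human-defined governance over continuous monitoring is a rolling-window comparison against the pre-deployment baseline, with the tolerance set by the human. A minimal sketch of that pattern (thresholds and window size are illustrative assumptions):

```python
from statistics import mean

def drift_detected(live_scores: list, baseline: float,
                   tolerance: float = 0.5, window: int = 20) -> bool:
    """Flag drift when the rolling mean of live rubric scores falls
    more than `tolerance` below the pre-deployment baseline.
    Humans set baseline and tolerance; AI does the watching."""
    if len(live_scores) < window:
        return False  # not enough evidence yet
    return mean(live_scores[-window:]) < baseline - tolerance
```

A drift signal here would route to human review, which can then trigger the D7 repair loop.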
Which tools operate at which dimensions, and how deep their AI dependency chains go
| Dim | Tool | AI Role | Input Source | Output Consumed By | AI Layers Deep |
|---|---|---|---|---|---|
| D1 | Claude Code (Charlie + AI) | Tool Builder | Human vision + AI dev tools | All other dimensions | 1 |
| D2 | None (BRD template, email) | No AI | Human experience | D2 (problem definition) | 0 |
| D2 | Claude Code (human + AI collaboration) | Architect | Human problem + constraints | D3 (AF Discovery), D4 (AF Build), D5 (AF Eval) | 2 |
| D3 | AF Discovery (Glean agent, 6 stages) | Interviewer | Human domain knowledge | D4 (PRD for Agent Factory) | 3 |
| D4 | Agent Factory (multi-phase engine) | Builder | AI output (PRD from D3) | D5 (live agent for testing) | 4 |
| D5 | AF Eval (persona generation + conversation) | Evaluator | AI output (agent from D4) | D6 (transcripts for scoring) | 5 |
| D6 | AF Eval Scoring (16-criteria rubric) | Judge | AI output (transcripts from D5) | D7 (findings for repair) | 6 |
| D7 | AF Repair (finding-to-fix pipeline) | Repairer | AI output (scores from D6) | D4 (improved agent loops back) | 7 |
| D8 | AF Eval (production monitoring) | Monitor | Live conversations, governance standards | D7 (repair triggers), human review | 8 |
By Dimension 7, a single repair action is:
- An AI-built tool (D1)
- Using an AI-designed methodology (D2)
- Reading AI-generated scores (D6)
- Of AI-evaluated conversations (D5)
- With an AI-built agent (D4)
- Whose instructions encode AI-extracted requirements (D3)
- About a human opportunity (D2)
Dimension 8 extends that chain into production monitoring, watching for drift against the same human-defined standards.
The Ouroboros: Where Recursion Compounds
Dimensions 4-7 form a closed loop, the serpent eating its own tail. Each cycle adds one more iteration of AI judging AI repairing AI.
graph LR
BUILD["D4: Build\nAI writes instructions"] --> TEST["D5: Evaluate\nAI tests the agent"]
TEST --> SCORE["D6: Score\nAI judges conversations"]
SCORE --> REPAIR["D7: Repair\nAI fixes instructions"]
REPAIR -->|"v2, v3 ... v9"| BUILD
BUILD -.->|"Cycle 1"| V1["v1: 4.5/10"]
BUILD -.->|"Cycle 5"| V5["v5: 8.3/10"]
BUILD -.->|"Cycle 9"| V9["v9: 8.2/10"]
classDef build fill:#1d4ed822,stroke:#1d4ed8,stroke-width:2px
classDef test fill:#15803d22,stroke:#15803d,stroke-width:2px
classDef score fill:#b91c1c22,stroke:#b91c1c,stroke-width:2px
classDef repair fill:#4338ca22,stroke:#4338ca,stroke-width:2px
classDef version fill:#7e22ce11,stroke:#7e22ce,stroke-width:1px,stroke-dasharray:5 5
class BUILD build
class TEST test
class SCORE score
class REPAIR repair
class V1,V5,V9 version
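The version trajectory in the diagram (v1 at 4.5, v5 at 8.3, v9 at 8.2) shows why the loop needs a stopping rule: scores climb, then plateau. A minimal sketch of how such a loop might decide to stop, assuming hypothetical `score_fn` and `repair_fn` callables rather than Agent Factory's real API:

```python
def run_repair_loop(score_fn, repair_fn, instructions,
                    quality_bar=8.0, max_cycles=10, plateau=0.2):
    """Iterate build -> eval -> score -> repair until the quality bar
    is met or scores plateau (diminishing returns)."""
    prev = None
    for cycle in range(1, max_cycles + 1):
        score = score_fn(instructions)
        if score >= quality_bar:
            return instructions, score, cycle  # quality bar met: deploy
        if prev is not None and abs(score - prev) < plateau:
            return instructions, score, cycle  # plateau: stop iterating
        prev = score
        instructions = repair_fn(instructions, score)
    return instructions, score, max_cycles
```

The quality bar and plateau threshold are the human governance inputs here; the cycling itself is fully AI-driven.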
What eight dimensions of meta-recursion means for how we think about AI
Every dimension is auditable, and every commitment is visible. Each tool produces artifacts: PRDs, instructions, transcripts, scores, changelogs, visual maps. The moment you deploy an agent, you inherit three commitments: operational (drift is now your problem), reputational (the model borrows your name), and regulatory (receipts are the new release notes). The recursion is deep but the trail is clear. Human in the loop is a real job here: named owners, time budgets, stop authority, and protection when doing the right thing is unpopular.
The depth is the advantage. Anyone can deploy a chatbot. Eight dimensions of integrated AI tooling, where each dimension reinforces the others, is a capability that compounds over time. Autonomy scales on conditions: organizational readiness and governance maturity. Each dimension earns its autonomy by meeting the conditions the previous dimensions established. The philosophical questions (what is the purpose, what counts as knowledge, who is accountable, what are the ethics) are built into the pipeline, forcing decisions before momentum takes over.
The human is the architect and governor at every dimension. This pipeline doesn't remove informal human judgment from workflows. It makes that judgment visible, codifies it, and catches where it's been lost. Every workflow carries translation debt: the unpriced interpretive work that keeps handoffs from breaking. AI collects it, amplifies it, and presents it back as review. The human at each dimension is doing the work of noticing what the system can't see: political dynamics, organizational readiness, whether the stated problem is the real problem.
- The human is the architect and governor at every level of the stack: defining purpose, reviewing outputs, approving transitions, and setting governance
- AI scales the execution across eight dimensions, but the design thinking, the judgment calls, and the strategic decisions are human all the way down
- The philosophical discipline (what state are we trying to change, what counts as evidence, who is accountable, what commitments are we making) is wired into the pipeline itself
- It forces the hard questions before momentum takes over: philosophy is native to the system's decision-making, applied at every dimension where it matters