Eight Dimensions Deep

The Meta-Recursion of AI Building AI · AI Strategy · April 2026

When we build an AI agent, we're operating across multiple simultaneous dimensions of human-AI partnership: a human uses AI to build the platform, then directs AI to architect pipelines, discover requirements, generate instructions, design bespoke evaluations, score quality, repair the agent, and monitor it for drift in production. The recursion runs eight dimensions deep, and the human is the architect and governor at every turn.
In one sentence: a human used AI to build a platform that uses AI to architect pipelines where AI interviews humans to extract requirements that AI converts into instructions for an AI agent that AI evaluates, AI scores, AI repairs, and AI monitors in production.
This isn't a thought experiment. Every dimension described here is a real capability of Agent Factory, a single platform with four integrated functions: Discovery, Build, Eval, and Repair. BuRDy, our Business Requirements Document interview agent, went through all of them.
The Premise

We passed "AI as a tool" a long time ago

Most organizations talk about AI as a thing you deploy. We use AI to design the deployment process itself. Every tool in our pipeline is either AI-powered, AI-designed, or both. The result is a recursive stack where each dimension's output feeds the next dimension's input, and many of those dimensions are invisible to anyone who isn't looking for them. At every transition, a human is driving: seeing the nuance, applying systems thinking, making the judgment calls that AI cannot make for itself. This is redesigning how work moves, how decisions get made, and where judgment lives. The philosophical questions (what is the purpose, what counts as knowledge, who decides, what are the ethics) are native to the system's decision-making at every dimension.

8 dimensions · 5 AI tools in the stack · 4 dimensions where AI talks to AI · 2 humans in the loop
The Stack

All eight dimensions, surface to depth

The Human Opportunity: Employees need help articulating business requirements. They face a blank 15-page BRD template and either submit incomplete documents or skip the process entirely. This is the context that justifies everything below: the reality the dimensions exist to address.
1. Human Uses AI to Build the Platform (Claude Code)
   Charlie uses Claude Code to build Agent Factory: a unified platform with Discovery, Build, Eval, and Repair.
2. Human Uses AI to Architect AI (Charlie + Claude Code)
   Charlie directs AI to design the pipeline, methodology, and quality standards.
3. Human Knowledge via AI Interviewer (AF Discovery)
   Agent Factory Discovery interviews humans to extract what they can't articulate alone.
4. Human-Reviewed PRD Becomes AI Instructions (AF Build)
   Agent Factory Build generates instructions from the human-approved PRD.
5. Human-Defined Standards, AI-Driven Evaluation (AF Eval)
   Agent Factory Eval reviews the PRD, instructions, and KB to design bespoke evals, always including bias, ethics, compliance, and safety.
6. AI Judges Against Human-Set Standards (AF Eval Scoring)
   Agent Factory Eval scores conversations against the bespoke rubric the human defined.
7. Human Approves, AI Repairs (AF Repair)
   Agent Factory Repair iterates on instructions after the human reviews and refines eval findings.
8. Human Governance, AI Monitors for Drift (AF Eval)
   Agent Factory Eval monitors live agent conversations for prompt drift, quality degradation, and improvement opportunities, guided by human governance standards.
  • Dimension 1 is the genesis. Before any of this pipeline existed, Charlie used Claude Code to build Agent Factory, a single platform with four capabilities: Discovery, Build, Eval, and Repair
  • Dimensions 2-7 are the recursion. Each dimension's output feeds the next, with human-AI partnership at every transition and human checkpoints between each stage
  • Dimension 8 is the long game. After deployment, AI monitors for drift in production against human governance standards, feeding findings back into the repair loop

How Each Dimension Feeds the Next

        graph TD
          D1["D1: Human + AI Build Platform\nCharlie + Claude Code build\nAgent Factory"] --> H01{{"Charlie reviews platform"}}
          H01 --> D2["D2: Human Architects AI\nCharlie + Claude Code\ndesign the pipeline"]
          D2 --> H23{{"Charlie reviews methodology"}}
          H23 --> D3["D3: AI Extracts Knowledge\nAF Discovery interviews humans"]
          D3 --> H34{{"Charlie reviews PRD"}}
          H34 --> D4["D4: AI Builds AI\nAgent Factory generates\ninstructions from PRD"]
          D4 --> H45{{"Charlie reviews instructions"}}
          H45 --> D5["D5: AI Tests AI Under Human Criteria\nAF Eval personas converse\nwith the live agent"]
          D5 --> D6["D6: AI Judges AI\nAF Eval scores conversations\nagainst rubric"]
          D6 --> H67{{"Charlie reviews findings"}}
          H67 --> D7["D7: AI Repairs AI\nAgent Factory repair\nfixes instructions"]
          D7 -->|"improved agent"| D4
          D7 -.->|"when quality bar met"| DEPLOY["Agent deployed\nto production"]
          DEPLOY --> D8["D8: AI Monitors for Drift\nAF Eval evaluates live\nconversations ongoing"]
          D8 -->|"drift detected"| H89{{"Human reviews findings"}}
          H89 -->|"triggers repair"| D7

          classDef genesis fill:#be123c22,stroke:#be123c,stroke-width:2px
          classDef architect fill:#b4530922,stroke:#b45309,stroke-width:2px
          classDef extract fill:#0f766e22,stroke:#0f766e,stroke-width:2px
          classDef build fill:#1d4ed822,stroke:#1d4ed8,stroke-width:2px
          classDef evaluate fill:#15803d22,stroke:#15803d,stroke-width:2px
          classDef judge fill:#b91c1c22,stroke:#b91c1c,stroke-width:2px
          classDef repair fill:#4338ca22,stroke:#4338ca,stroke-width:2px
          classDef monitor fill:#b4530922,stroke:#b45309,stroke-width:2px
          classDef deploy fill:#15803d22,stroke:#15803d,stroke-width:2px,stroke-dasharray:5 5
          classDef hitl fill:#f5f5f411,stroke:#78716c,stroke-width:1px,stroke-dasharray:3 3

          class D1 genesis
          class D2 architect
          class D3 extract
          class D4 build
          class D5 evaluate
          class D6 judge
          class D7 repair
          class D8 monitor
          class DEPLOY deploy
          class H01,H23,H34,H45,H67,H89 hitl
      
Dimension 1 - Human Uses AI to Build the Platform

The genesis: before the pipeline existed, a human directed AI to build it

D1 · The Genesis Dimension · Claude Code
Actors: Charlie + Claude Code · Tools: Claude Code (agentic AI development)

Before any of this pipeline existed, a human used AI to build it. Charlie used Claude Code to build Agent Factory, a unified platform with four modes:

  • Discovery: AI-driven requirements interviews that extract what humans can't articulate alone
  • Build: multi-phase engine that generates agent instructions from structured PRDs
  • Eval: bespoke evaluation design including bias, ethics, compliance, and safety testing
  • Repair: finding-to-fix pipeline that iterates on instructions based on eval findings
Input: Vision for AI build process, domain expertise, AI development tools
Output: Agent Factory (Discovery, Build, Eval, Repair), the platform that does everything else
What makes this Dimension 1 / The human role: This is the bootstrap. You can't use AI to build AI agents until you've used AI to build the tools that build AI agents. The human applies design thinking to decide what tools need to exist, how they should compose, and what quality standards to embed. AI writes the code, but the human sees the whole system: the dependencies, the feedback loops, the gaps. Everything downstream inherits these architectural choices.
Dimension 2 - Human Uses AI to Architect AI

The human decides what to build; AI designs how to build it

D2 · Human Uses AI to Architect AI · Charlie + Claude Code
Actors: Charlie + Claude Code · Method: iterative design conversations, methodology creation

Charlie works with Claude Code to architect what capabilities Agent Factory needs, what process each mode should follow, and how they connect. The 6-stage Discovery interview methodology, the Build phase structure, and the Eval rubric framework were all designed in this collaboration. The human provides strategic direction and domain judgment; AI translates that into executable methodology.

Input: Human problem description, domain constraints, strategic direction
Output: Methodology designs, pipeline specifications, tool architectures
What makes this meta / The human role: The human is directing AI to design the methodology that other AI tools will follow. The human brings systems thinking: seeing how the interview stages connect to the build phases, and how those connect to the eval criteria. AI can generate each piece, but the human sees the whole pipeline and ensures the pieces compose. This is where strategy becomes architecture.
Dimension 3 - Human Knowledge via AI Interviewer

AI pulls out what humans can't articulate alone, then humans review the result

D3 · AI as Skilled Interviewer · AF Discovery
Actors: Agent Factory Discovery (Glean agent) + subject matter experts · Method: 6-stage requirements interview

Agent Factory Discovery interviews humans through a structured 6-stage conversation to extract requirements they wouldn't otherwise articulate. It probes for edge cases, surfaces contradictions, captures process maps, and produces a PRD that no human wrote alone. The AI is acting as a requirements engineer: a role that usually takes years of practice to develop.

Input: Human domain knowledge (unstructured, in someone's head)
Output: Structured PRD + process map
What makes this meta / The human role: The tool doing the interviewing was designed by Dimension 2's human-AI collaboration, and built by Dimension 1. The interview methodology itself is an AI artifact. The human reviews every PRD that comes out, checking for nuance that AI can extract but can't judge: political dynamics, organizational readiness, whether the stated problem is the real problem. The human checkpoint between D3 and D4 is where design thinking catches what structured interviews miss.
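A stage-gated interview loop of this shape can be sketched in a few lines. The stage names, the quality threshold, and the probe budget below are hypothetical placeholders, not AF Discovery's actual 6-stage methodology; the point is the structure, where each stage is allowed follow-up probes until the answer is specific enough to advance.

```python
# Hypothetical stage names for illustration; the real six stages belong
# to AF Discovery's own methodology.
STAGES = ["context", "goals", "process_map", "edge_cases",
          "contradiction_check", "summary_confirm"]

def run_interview(ask, answer_quality) -> dict:
    """Advance through interview stages, re-probing each stage until the
    reply is specific enough. `ask(stage, probe_count)` poses the stage's
    question; `answer_quality(reply)` stands in for the interviewer's
    judgment of how usable the reply is (0.0-1.0)."""
    prd = {}
    for stage in STAGES:
        probes = 0
        reply = ask(stage, probes)
        # Follow up (bounded) when the answer is vague or incomplete.
        while answer_quality(reply) < 0.7 and probes < 3:
            probes += 1
            reply = ask(stage, probes)
        prd[stage] = reply          # best answer captured for the PRD
    return prd
```

In the real system both callbacks would be LLM calls; here they are left abstract so the gating logic is visible on its own.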
Dimension 4 - Human-Reviewed PRD Becomes AI Instructions

AI reads AI's interpretation of human knowledge, then writes instructions for another AI, with human sign-off

D4 · AI Builds AI Instructions · Agent Factory
Actors: Agent Factory engine (multi-phase pipeline) · Method: PRD analysis, knowledge base extraction, instruction generation

Agent Factory Build takes the PRD (an AI-generated document from Dimension 3) and produces agent instructions. It applies Glean best practices, structures decision trees, builds coaching moves, and generates Glean-compatible JSON. The input is AI output. The process is AI. The output is instructions for AI.

Input: PRD (written by AI in D3), knowledge base, best practices
Output: Agent instructions (Glean JSON), decision trees, coaching prompts
What makes this meta / The human role: Four dimensions of AI are stacked: D1 built Agent Factory. D2 designed its methodology. D3 produced the PRD. D4 reads that PRD and writes instructions for yet another AI (the Glean agent). The human reviews the generated instructions before they go live, catching edge cases in the coaching logic, validating that the decision tree maps to real business scenarios, and ensuring the agent's personality matches the brand. AI generates; the human validates against reality.
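The three phases named above (PRD analysis, knowledge base extraction, instruction generation) can be caricatured as a small pipeline. Every key, phase, and heuristic here is a hypothetical stand-in for Agent Factory Build's real engine, which generates full Glean-compatible JSON:

```python
def build_instructions(prd: dict, kb: list) -> dict:
    """Toy sketch of a multi-phase build: analyze the PRD, pull relevant
    KB entries, then assemble an instruction document."""
    # Phase 1: PRD analysis (hypothetical keys)
    analysis = {
        "goals": prd.get("goals", []),
        "edge_cases": prd.get("edge_cases", []),
    }
    # Phase 2: knowledge base extraction (naive keyword match as a stand-in
    # for real retrieval)
    relevant_kb = [doc for doc in kb
                   if any(g.lower() in doc.lower() for g in analysis["goals"])]
    # Phase 3: instruction generation
    return {
        "system_prompt": f"Agent goals: {', '.join(analysis['goals'])}",
        "decision_tree": {case: "escalate" for case in analysis["edge_cases"]},
        "knowledge": relevant_kb,
        "version": 1,
    }
```

The structure is what matters: the input is itself AI output (the D3 PRD), and the output is instructions for yet another AI.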
Dimension 5 - Human-Defined Standards, AI-Driven Evaluation

AI designs bespoke evals from human-approved artifacts, then runs them

D5 · AI Tests AI Under Human Criteria · Agent Factory Eval
Actors: AF Eval-generated personas + live Glean agent · Method: goal-driven multi-turn conversations (15-50 turns)

Agent Factory Eval reviews the PRD, instructions, and KB docs to design a bespoke evaluation structure. It generates synthetic personas with specific goals, knowledge gaps, and difficulty tiers that have real, multi-turn conversations with the live Glean agent.

  • Bespoke test design: AI analyzes the agent's domain and creates evaluation scenarios tailored to its specific use case
  • Bias, ethics, compliance, and safety: mandatory test categories included in every evaluation, embedded from the start
  • Synthetic personas: AI-generated users who push back, give vague answers, contradict themselves, and test edge cases
  • Perfect repeatability: systematic coverage across every scenario tier, reproducible across versions
Input: Agent instructions (from D4), scenario definitions, persona profiles
Output: Full conversation transcripts (AI evaluating AI through simulated interaction)
What makes this meta / The human role: AI is testing AI using scenarios derived from the same knowledge that built the agent. The system is evaluating its own output. The human defined the evaluation methodology, chose what to measure, designed the difficulty tiers, and can write custom scenarios for edge cases that matter to the business. The rubric criteria are anchored to what IS leadership actually needs in a BRD. The human brought the judgment; AI scales it.
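A goal-driven persona conversation can be pictured as a minimal harness. The names (`Persona`, `run_eval_conversation`) and the stopping logic are invented for illustration, not AF Eval's actual implementation; what the sketch shows is the multi-turn loop with a goal, a turn budget, and a persona that decides when it's done:

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """AI-generated synthetic user for an eval conversation (hypothetical)."""
    goal: str
    difficulty: str                                 # e.g. "easy", "adversarial"
    behaviors: list = field(default_factory=list)   # e.g. "vague", "contradicts"

def run_eval_conversation(persona, agent_reply, persona_reply, max_turns=50):
    """Drive a multi-turn conversation between a persona and the live agent,
    stopping when the persona's goal is met or the turn budget runs out.
    Both callbacks would be LLM calls in practice."""
    transcript = [("persona", persona.goal)]
    for _ in range(max_turns):
        transcript.append(("agent", agent_reply(transcript)))
        user_msg, done = persona_reply(persona, transcript)
        transcript.append(("persona", user_msg))
        if done:                        # persona judged its goal complete
            break
    return transcript
```

The transcript, not a pass/fail verdict, is the output: scoring happens separately in the next dimension.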
Dimension 6 - AI Judges AI Conversations (against human-defined standards)

A separate AI reads the transcripts and judges conversations it took no part in

D6 · AI Scores AI · AF Eval Scoring
Actors: Agent Factory Eval scoring engine · Method: 16-criteria rubric, 5 scoring dimensions, anchored scoring

Agent Factory Eval reads the Dimension 5 conversation transcripts and scores them against its bespoke rubric across 5 scoring dimensions. It assesses goal completion, process quality, conversational skill, red flag detection, and output quality. Each criterion has anchored descriptions of what a 1, 5, and 10 look like. The scoring AI was not involved in the conversation: a third party reading a transcript of two other AIs talking.

Input: Conversation transcripts (AI-vs-AI from D5), rubric definitions
Output: Scored evaluations, weakness patterns, suite reports
What makes this meta / The human role: Three simultaneous AI roles: the agent being judged, the persona who tested it, and the evaluator reading their conversation. The human defined the rubric framework and the method for arriving at it. The scoring criteria, the anchored descriptions of what a 1 vs. a 10 looks like, the weighting across dimensions: all human design decisions. AI judges the conversation against standards the human set. The judgment chain: human designed the rubric framework, human+AI created the criteria, AI tested the agent, AI scored the test.
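The anchored-scoring idea can be sketched in a few lines. Everything here is hypothetical (the criterion names, weights, and anchor text are invented, and only three of the sixteen criteria are shown); the shape to notice is that each criterion carries human-written anchors for what a 1, 5, and 10 look like, and per-criterion scores from the AI judge are weight-aggregated into a suite score:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    dimension: str     # one of the 5 scoring dimensions
    weight: float      # human-chosen weighting
    anchors: dict      # human-written descriptions for scores 1, 5, 10

# Hypothetical slice of a 16-criterion rubric.
RUBRIC = [
    Criterion("goal_completion", "Goal Completion", 2.0,
              {1: "goal abandoned", 5: "partial BRD", 10: "complete, approved BRD"}),
    Criterion("probing_depth", "Process Quality", 1.0,
              {1: "accepts first answer", 5: "some follow-ups", 10: "surfaces contradictions"}),
    Criterion("red_flag_detection", "Red Flag Detection", 1.5,
              {1: "misses all flags", 5: "flags some", 10: "flags and escalates all"}),
]

def suite_score(scores: dict) -> float:
    """Weighted average of per-criterion scores (1-10) from the AI judge."""
    total_weight = sum(c.weight for c in RUBRIC)
    return sum(scores[c.name] * c.weight for c in RUBRIC) / total_weight
```

An LLM judge would supply the per-criterion scores after reading a D5 transcript; the aggregation itself is just a weighted average, and the anchors are what keep the judge's 1-10 scale consistent across transcripts.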
Dimension 7 - Human Approves, AI Repairs

Human reviews findings and decides what to fix; AI executes the repair

D7 · AI Repairs AI · Agent Factory Repair
Actors: Agent Factory repair engine · Method: finding-to-fix traceability, targeted instruction patches

Agent Factory Repair takes the scored findings from Dimension 6 and makes targeted instruction changes. The repair AI reads AI-generated scores of AI-evaluated conversations with an AI agent, then modifies that agent's AI-generated instructions. Every fix traces to a specific finding. Every finding traces to a scored conversation. Every conversation was AI evaluating AI.

Input: Suite reports + scored findings (from D6), current instructions (from D4)
Output: Patched instructions, repair changelog, version increment
What makes this meta / The human role: The repair loop is the deepest operational recursion: D4 built the instructions, D5 tested them, D6 scored them, D7 repairs them, and the improved instructions go back to D5 for retesting. Four distinct AI operations in a closed loop. The human reviews the findings before repair, deciding which issues are real, which are artifacts of the eval setup, and what priority to assign. The human also decides when the quality bar is met and the loop can stop. AI does the surgery; the human reads the chart and approves the treatment plan.
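The evaluate-score-repair cycle reads naturally as a loop with a human-set stopping condition. The function names and the fixed `quality_bar` below are illustrative assumptions; in the real pipeline the stop decision is a human judgment on reviewed findings, not a bare threshold:

```python
def repair_loop(instructions, run_evals, score, repair,
                quality_bar=8.0, max_cycles=9):
    """Sketch of the D5 -> D6 -> D7 loop over instruction versions.
    `run_evals` tests the agent (D5), `score` judges the transcripts (D6),
    and `repair` patches the instructions (D7)."""
    history = []
    for version in range(1, max_cycles + 1):
        transcripts = run_evals(instructions)        # D5: AI tests AI
        findings = score(transcripts)                # D6: AI judges AI
        history.append((version, findings["suite_score"]))
        if findings["suite_score"] >= quality_bar:   # human stop decision,
            break                                    # modeled as a threshold
        instructions = repair(instructions, findings)  # D7: AI repairs AI
    return instructions, history
```

A `max_cycles` cap matters as much as the bar: the v5-to-v9 plateau in the Ouroboros diagram is exactly the case where more cycles stop buying quality and a human should call it.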
Dimension 8 - Human Governance, AI Monitors for Drift

After deployment, the human sets governance standards; AI watches for degradation

D8 · Ongoing Monitoring and Drift Detection · AF Eval
Actors: Agent Factory Eval + human governance oversight · Method: ongoing conversation evaluation, drift analysis, quality benchmarking

Agent Factory Eval doesn't stop at pre-deployment testing. After the agent goes live, it evaluates real conversations to detect prompt drift, quality degradation, and emerging gaps. The monitoring is continuous, but the governance is human-defined:

  • Prompt drift detection: identifying when agent behavior diverges from the approved instruction set over time
  • Quality benchmarking: scoring live conversations against the same bespoke rubric used in pre-deployment eval
  • Improvement opportunities: surfacing patterns in real user interactions that reveal gaps the original PRD didn't anticipate
  • Bias, ethics, compliance, and safety: the same mandatory test categories from D5, now applied to production behavior
Input: Live agent conversations, human governance standards, pre-deployment eval baselines
Output: Drift reports, quality trend analysis, repair triggers sent back to D7
What makes this Dimension 8 / The human role: The monitoring loop extends the recursion past deployment. AI watches AI in production, scoring it against the same human-defined standards from D5-D6. When drift is detected, the human reviews the findings and decides whether to trigger a repair cycle (back to D7) or update the governance standards themselves. The human sets the quality bar, the evaluation criteria, and the threshold for action. AI scales the vigilance; the human owns the judgment.
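A minimal drift check against the pre-deployment baseline could look like the sketch below. The `tolerance` and `window` parameters stand in for human governance decisions, and the function is an assumption-laden illustration, not AF Eval's implementation; live conversations are assumed to have already been scored with the same rubric used in D6:

```python
from statistics import mean

def detect_drift(live_scores, baseline, tolerance=0.5, window=20):
    """Flag drift when the rolling mean of live conversation scores falls
    more than `tolerance` below the pre-deployment baseline. Both the
    tolerance and the window size are human governance settings."""
    if len(live_scores) < window:
        return None                          # not enough production data yet
    rolling = mean(live_scores[-window:])
    return {
        "rolling": rolling,
        "drifted": rolling < baseline - tolerance,  # triggers human review
    }
```

A positive `drifted` flag is a trigger for human review, not an automatic repair: the human decides whether it warrants a D7 cycle or a change to the standards themselves.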
The Tool Recursion Map

Which tools operate at which dimensions, and how deep their AI dependency chains go

| Dim | Tool | AI Role | Input Source | Output Consumed By | AI Layers Deep |
|-----|------|---------|--------------|--------------------|----------------|
| D1 | Claude Code (Charlie + AI) | Tool Builder | Human vision + AI dev tools | All other dimensions | 1 |
| D2 | None (BRD template, email) | No AI | Human experience | D2 (problem definition) | 0 |
| D2 | Claude Code (human + AI collaboration) | Architect | Human problem + constraints | D3 (AF Discovery), D4 (AF Build), D5 (AF Eval) | 2 |
| D3 | AF Discovery (Glean agent, 6 stages) | Interviewer | Human domain knowledge | D4 (PRD for Agent Factory) | 3 |
| D4 | Agent Factory (multi-phase engine) | Builder | AI output (PRD from D3) | D5 (live agent for testing) | 4 |
| D5 | AF Eval (persona generation + conversation) | Evaluator | AI output (agent from D4) | D6 (transcripts for scoring) | 5 |
| D6 | AF Eval Scoring (16-criteria rubric) | Judge | AI output (transcripts from D5) | D7 (findings for repair) | 6 |
| D7 | AF Repair (finding-to-fix pipeline) | Repairer | AI output (scores from D6) | D5 (improved agent loops back) | 7 |
| D8 | AF Eval (production monitoring) | Monitor | Live conversations, governance standards | D7 (repair triggers), human review | 8 |
Read the "AI Layers Deep" column. By Dimension 7, the repair engine is operating 7 layers of AI dependency deep:
  • An AI-built tool (D1)
  • Using an AI-designed methodology (D2)
  • Reading AI-generated scores (D6)
  • Of AI-evaluated conversations (D5)
  • With an AI-built agent (D4)
  • Whose instructions encode AI-extracted requirements (D3)
  • About a human opportunity (D2)

Dimension 8 extends that chain into production monitoring, watching for drift against the same human-defined standards.

The Ouroboros: Where Recursion Compounds

Dimensions 4-7 form a closed loop, the serpent eating its own tail. Each cycle adds one more iteration of AI judging AI repairing AI.

        graph LR
          BUILD["D4: Build\nAI writes instructions"] --> TEST["D5: Evaluate\nAI tests the agent"]
          TEST --> SCORE["D6: Score\nAI judges conversations"]
          SCORE --> REPAIR["D7: Repair\nAI fixes instructions"]
          REPAIR -->|"v2, v3 ... v9"| BUILD

          BUILD -.->|"Cycle 1"| V1["v1: 4.5/10"]
          BUILD -.->|"Cycle 5"| V5["v5: 8.3/10"]
          BUILD -.->|"Cycle 9"| V9["v9: 8.2/10"]

          classDef build fill:#1d4ed822,stroke:#1d4ed8,stroke-width:2px
          classDef test fill:#15803d22,stroke:#15803d,stroke-width:2px
          classDef score fill:#b91c1c22,stroke:#b91c1c,stroke-width:2px
          classDef repair fill:#4338ca22,stroke:#4338ca,stroke-width:2px
          classDef version fill:#7e22ce11,stroke:#7e22ce,stroke-width:1px,stroke-dasharray:5 5

          class BUILD build
          class TEST test
          class SCORE score
          class REPAIR repair
          class V1,V5,V9 version
      
What This Means

What eight dimensions of meta-recursion means for how we think about AI

For Governance

Every dimension is auditable, and every commitment is visible. Each tool produces artifacts: PRDs, instructions, transcripts, scores, changelogs, visual maps. The moment you deploy an agent, you inherit three commitments: operational (drift is now your problem), reputational (the model borrows your name), and regulatory (receipts are the new release notes). The recursion is deep but the trail is clear. Human in the loop is a real job here: named owners, time budgets, stop authority, and protection when doing the right thing is unpopular.

For Strategy

The depth is the advantage. Anyone can deploy a chatbot. Eight dimensions of integrated AI tooling, where each dimension reinforces the others, is a capability that compounds over time. Autonomy scales on conditions: organizational readiness and governance maturity. Each dimension earns its autonomy by meeting the conditions the previous dimensions established. The philosophical questions (what is the purpose, what counts as knowledge, who is accountable, what are the ethics) are built into the pipeline, forcing decisions before momentum takes over.

For the Team

The human is the architect and governor at every dimension. This pipeline doesn't remove informal human judgment from workflows. It makes that judgment visible, codifies it, and catches where it's been lost. Every workflow carries translation debt: the unpriced interpretive work that keeps handoffs from breaking. AI collects it, amplifies it, and presents it back as review. The human at each dimension is doing the work of noticing what the system can't see: political dynamics, organizational readiness, whether the stated problem is the real problem.

Every dimension is a human-AI partnership.
  • The human is the architect and governor at every level of the stack: defining purpose, reviewing outputs, approving transitions, and setting governance
  • AI scales the execution across eight dimensions, but the design thinking, the judgment calls, and the strategic decisions are human all the way down
  • The philosophical discipline (what state are we trying to change, what counts as evidence, who is accountable, what commitments are we making) is wired into the pipeline itself
  • It forces the hard questions before momentum takes over: philosophy is native to the system's decision-making, applied at every dimension where it matters