The Architecture Nobody Is Selling

The Best Agent Is Not An Agent At All.

Everyone is building multi-agent orchestration systems. One company quietly realized the answer was already there — in a folder on your computer.

The Context Wall

To make AI useful, you need to route it to the right place — with the right instructions, tools, and data. This is the context wall. Everyone is solving it the hard way.

❌ The Wrong Way
Agent Frameworks
Teams spend months writing Python orchestration code — custom harnesses, routing graphs, vector databases — only to watch a model update break half of it.
LangChain · LangGraph · Semantic Kernel · AutoGen · CrewAI
✓ The Right Way
The File Tree
A folder with a markdown file and a script. One coding agent reads it, gets full context, executes tasks, spawns sub-agents, and manages memory — zero orchestration code.
SKILL.md · script.py · data/ · Claude Code

Every Agent Needs Only Three Things

Strip away every framework. Any agent ever built is just a system for routing a model to the right combination of these three things — every time.

1. 📋 Instructions → SKILL.md
What the agent should do, how to behave, what rules it follows.
2. 🔧 Tools → tools.py / MCP servers
What the agent can actually do: APIs, code execution, database queries.
3. 🗄️ Data → data/ folder
What the agent knows: company docs, history, config, reference knowledge.
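
As a rough illustration of the mapping above, the sketch below gathers the three ingredients from one skill folder. The `Skill` dataclass, the `load_skill` name, and the exact layout (a `SKILL.md`, any `*.py` scripts, and an optional `data/` directory) are assumptions made for this example, not something the source specifies.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Skill:
    instructions: str   # contents of SKILL.md
    tools: list[str]    # script file names found next to SKILL.md
    data: list[str]     # reference file names found under data/


def load_skill(folder: Path) -> Skill:
    """Collect instructions, tools, and data from one skill folder (assumed layout)."""
    instructions = (folder / "SKILL.md").read_text(encoding="utf-8")
    tools = sorted(p.name for p in folder.glob("*.py"))
    data_dir = folder / "data"
    data = sorted(p.name for p in data_dir.iterdir()) if data_dir.is_dir() else []
    return Skill(instructions, tools, data)
```

Nothing here calls a model. The point is only that the three ingredients can be read straight off the directory listing.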

"You can map every single AI agent out there to a simple file tree. Not approximately. Exactly."

— Core Thesis

Old Way vs Right Way

The same outcome — a multi-workflow AI system — built two different ways.

⚠️ Agent Framework Approach
Three separate agent deployments

One per workflow. Each has its own infrastructure, dependencies, failure modes.

Python orchestration code

Thousands of lines that break on model updates.

Brittle to model updates

Every new release potentially breaks the orchestration layer.

Scaling = more infrastructure

10 workflows means 10 agent deployments.

File Tree Approach
One agent, unlimited workflows

Claude Code reads whatever skill folder it's pointed at. Zero extra infrastructure.

Zero orchestration code

Markdown and Python. Anyone can read, edit, and debug without a framework.

Model updates make it better

New capabilities condense to simpler instructions. Files never break.

Scaling = adding folders

Thousands of parallel instances from one foundation.
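
To make "scaling = adding folders" concrete, here is a minimal sketch that treats any folder containing a `SKILL.md` as a skill and discovers them by walking the tree. The function name `discover_skills` and the convention that `SKILL.md` marks a skill folder are assumptions for illustration.

```python
from pathlib import Path


def discover_skills(root: Path) -> dict[str, Path]:
    """Map each skill's folder (relative to root) to its path.

    Assumption: a skill is any folder that holds a SKILL.md file.
    """
    skills = {}
    for marker in sorted(root.rglob("SKILL.md")):
        skills[marker.parent.relative_to(root).as_posix()] = marker.parent
    return skills
```

Under this convention, adding a workflow means creating a new folder. The discovery code itself does not change.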

Build at the Right Layer

Most developers build at the bottom two layers of the stack, the model and the agent runtime, where the ground shifts fastest.

⚠ Model Layer
Raw intelligence and reasoning. Changes constantly.
AI Companies — don't touch
⚡ Agent Runtime
VS Code / Cursor + Claude Code. Already built.
You — barely touch
📄 Skill / Task
SKILL.md + scripts. Atomic unit of work.
You — build simply
🗂 Workflow
Folder of skill folders. One per department or role.
You — organize freely
🤖 Sub-Agent
Nested folders, spawned dynamically by the agent itself.
Agent — auto-generates
The insight: You're never racing against AI progress because you don't live at the layer that changes.

Model Updates Help You

Instead of breaking your system, each model improvement lets you simplify it.

Scenario 01
Model gains new capability
❌ Old: Orchestration layer needs refactoring.
✓ New: Update two lines in SKILL.md.
Scenario 02
Competitor ships your feature
❌ Old: Your agent framework is commoditized.
✓ New: That workflow becomes a subfolder with one tool call.
Scenario 03
You switch AI providers
❌ Old: Entire SDK surface changes. Months of migration.
✓ New: Point a different agent at the same folders.

The most powerful AI deployment tool is already on your computer

Nobody is selling it. It's called a folder.

Free · Already Built · Infinitely Scalable · Model-Agnostic · Never Obsolete
Academic Research · arXiv 2603.16021 · MIT License
Interpretable Context Methodology (ICM)
Jake Van Clief & David McDermott — Eduba / University of Edinburgh · March 2026
Abstract: Current approaches to AI agent orchestration typically involve building multi-agent frameworks that manage context passing, memory, error handling, and step coordination through code. These frameworks work well for complex, concurrent systems. But for sequential workflows where a human reviews output at each step, they introduce engineering overhead the problem does not require. This paper presents a method that replaces framework-level orchestration with filesystem structure — numbered folders as stages, plain markdown files as context, local scripts for mechanical tasks. Open source under the MIT license.
Section 01
The Problem Being Solved
Frameworks like LangChain and AutoGen are genuinely good tools — for concurrent, dynamic systems. But for sequential, human-reviewed workflows, they introduce enormous accidental complexity.
Problem | What It Costs
Changing step order | Requires editing orchestration code and redeploying
Modifying a prompt | Requires finding it buried in agent configuration
Inspecting intermediate state | Requires adding logging, dashboards, or tracing infrastructure
Handing off to a colleague | Requires documenting environment, dependencies, and setup
Non-developer making changes | Often impossible without developer involvement

"For sequential workflows, you're using a hammer designed for a different nail."

— ICM Paper, Section 1
Section 02
The Central Insight
The paper's core observation is elegant and counterintuitive. You don't need a coordination framework — you need a folder structure. The "coordination logic" that LangChain puts in Python objects and message arrays, ICM puts in file names, folder hierarchy, and markdown contracts.
Framework Approach
Coordination in Code
Agents as objects. State as in-memory variables. Coordination as function calls. Opaque by default, requires developers to modify. Breaks when models update.
ICM Approach
Coordination in Folders
The filesystem IS the orchestrator. Coordination logic lives in file names, folder hierarchy, and markdown contracts. Plain text is the universal interface. Inspectable by anyone.
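
A minimal sketch of what "coordination in folders" could look like in practice: the stage order is recovered from numeric folder prefixes, and each stage's input is simply the previous stage's output folder. The naming pattern (`01-research`, `02-script`) and the `output` subfolder name are assumptions for illustration, not taken from the paper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming convention: "<number>-<name>" or "<number>_<name>".
STAGE_PATTERN = re.compile(r"^(\d+)[-_](.+)$")


def ordered_stages(workspace: Path) -> list[Path]:
    """Return stage folders sorted by numeric prefix; other entries are ignored."""
    stages = []
    for child in workspace.iterdir():
        match = STAGE_PATTERN.match(child.name)
        if child.is_dir() and match:
            stages.append((int(match.group(1)), child))
    return [path for _, path in sorted(stages, key=lambda pair: pair[0])]


def stage_input(stages: list[Path], index: int) -> Optional[Path]:
    """A stage reads only the previous stage's output folder; the first has none."""
    return stages[index - 1] / "output" if index > 0 else None
```

Reordering the pipeline then amounts to renaming folders. No code has to be edited or redeployed.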

"This is philosophically aligned with Unix's 1970s insight: the power comes not from any individual program, but from how they're connected — and plain text files are the universal connective tissue."

— ICM Paper, Section 2
Section 03
Intellectual Lineage
The paper is unusually well-grounded in CS history. Each theoretical reference maps directly to a concrete ICM design decision — this is not a hack, it's a synthesis.
1978
Unix Pipeline Philosophy
McIlroy
"Do one thing well." "Output of one is input of another." "Plain text as universal interface." The foundational architecture of composable systems.
→ Each ICM stage does one thing. Output folders feed next stages. Everything is markdown.
1979
Make / Build Systems
Feldman
Files are both the artifacts of work AND the coordination mechanism. No separate orchestration layer needed when the filesystem tracks what's been produced.
→ ICM stages have explicit Input tables in their contracts, just like Make dependency declarations.
1986
Multi-Pass Compilers
Aho, Sethi, Ullman
A compiler does multiple passes. Tokenize → Parse → Analyze → Optimize → Generate. Each pass reads the prior output and transforms it into an intermediate representation.
→ ICM does the same with content. Research → Script → Animation. Each stage is a pass. This unlocks incremental recompilation theory.
1972
Information Hiding
Parnas / Dijkstra
Systems should be decomposed so each module hides its internal decisions. "Address one thing at a time."
→ Each ICM stage hides its processing from the next. Stage 2 only sees Stage 1's output file, not how it was produced.
1984
Literate Programming
Knuth
Programs should be written primarily for humans to read. The instruction and the documentation should be the same artifact — not separate documents.
→ ICM's CONTEXT.md files are simultaneously agent instructions AND documentation. Reading them tells you exactly what the pipeline does.
1991
"Worse is Better"
Gabriel
Systems prioritizing simplicity of implementation tend to survive and spread. Easier to port, understand, and improve than feature-complete but complex alternatives.
→ ICM trades framework flexibility for portability. A folder of markdown files can be zipped, emailed, or handed to a non-developer.
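
The Make and compiler entries above suggest an incremental-rebuild rule: a stage only needs to rerun when its inputs are newer than its outputs. The sketch below shows that rule using file modification times. The helper names and the idea of comparing whole folders are assumptions for illustration; the paper only points to the analogy.

```python
from pathlib import Path


def newest_mtime(folder: Path) -> float:
    """Latest modification time of any file under folder, or 0.0 if there is none."""
    if not folder.is_dir():
        return 0.0
    times = [p.stat().st_mtime for p in folder.rglob("*") if p.is_file()]
    return max(times, default=0.0)


def needs_rerun(input_dir: Path, output_dir: Path) -> bool:
    """Make-style rule: rerun when output is missing or any input is newer than it."""
    out_time = newest_mtime(output_dir)
    if out_time == 0.0:
        return True
    return newest_mtime(input_dir) > out_time
```

Timestamps are a coarse signal. A content hash would be sturdier, but the modification-time check is what classic Make uses.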
Section 04
The Architecture In Depth
The heart of ICM is a five-layer context hierarchy. Every agent at every stage loads context from exactly these layers — structured so the model receives already-organized context, not a mixed dump.
Layer 0: CLAUDE.md (~800 tokens)
Structural / routing: "Where am I?" Identity of the workspace. Top-level orientation for the agent.
Layer 1: CONTEXT.md (~300 tokens)
Structural / routing: "Where do I go?" Workspace structure, stage sequence, and navigation map.
Layer 2: Stage CONTEXT.md (200–500 tokens)
Structural / routing: "What do I do?" The current stage's specific task, process, and output requirements.
Layer 3: Reference Material (500–2k tokens)
Content: "What rules apply?" voice.md, design-system.md, conventions.md. Static across runs. Model internalizes as constraints.
Layer 4: Working Artifacts (tokens vary)
Content: "What am I working with?" research-output.md, script-draft.md. Changes every run. Model transforms as input.
Layer 3 vs Layer 4: The Critical Distinction

When you mix rules with per-run content in an undifferentiated context window, the model has to sort them itself. ICM separates them structurally before the model ever sees them.

Layer 3: The Factory (Reference)
Changes per run? No
Examples: voice.md, design-system.md
Model should: Internalize as constraints
Analogy: The recipe
Layer 4: The Product (Working)
Changes per run? Yes
Examples: research-output.md, script-draft.md
Model should: Transform as input
Analogy: The ingredients

ICM keeps each stage at 2,000–8,000 focused tokens. The monolithic alternative reaches 30,000–50,000 tokens, most of it irrelevant to the current stage. This is prevention, not compression — the irrelevant tokens are never loaded in the first place.

— Citing Liu et al., "Lost in the Middle," 2024
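
One way to picture the layered loading is a function that concatenates layers 0 through 4 in order, labels each one, and refuses to build a context over the per-stage budget. This is a sketch under stated assumptions: the `reference/` and `input/` folder names, the four-characters-per-token estimate, and the function names are all hypothetical, and 8,000 is the upper end of the range quoted above.

```python
from pathlib import Path

TOKEN_BUDGET = 8_000  # upper end of the per-stage range quoted in the text


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about four characters per token."""
    return len(text) // 4


def build_stage_context(workspace: Path, stage: str,
                        reference: list[str], artifacts: list[str]) -> str:
    """Concatenate layers 0-4 in order, labelling each so rules and inputs stay apart."""
    stage_dir = workspace / stage
    layers = [
        ("Layer 0 workspace identity", [workspace / "CLAUDE.md"]),
        ("Layer 1 workspace routing", [workspace / "CONTEXT.md"]),
        ("Layer 2 stage contract", [stage_dir / "CONTEXT.md"]),
        ("Layer 3 reference", [workspace / "reference" / n for n in reference]),
        ("Layer 4 working input", [stage_dir / "input" / n for n in artifacts]),
    ]
    parts = []
    for title, paths in layers:
        for path in paths:
            body = path.read_text(encoding="utf-8")
            parts.append(f"[{title}: {path.name}]\n{body}")
    context = "\n\n".join(parts)
    if estimate_tokens(context) > TOKEN_BUDGET:
        raise ValueError(f"{stage}: context exceeds {TOKEN_BUDGET} estimated tokens")
    return context
```

Only the files named for the current stage are ever read. That is the "prevention, not compression" idea in miniature.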
Section 05
What Practitioners Actually Do
Empirical observation from 33 practitioners reveals a U-shaped intervention pattern. High editing at the start and end, lower in the middle. This isn't complacency — it's appropriate calibration.
Human Edit Rate by Pipeline Stage
Stage 1 (Research): ~92%
Stage 2 (Script): ~30%
Stage 3 (Middle): ~30%
Final (Production): ~78%
Stage 1: Directional editing — narrowing from broad possibilities. Creative human judgment.
Middle: Well-constrained stages — clear inputs + strong reference = narrow error space.
Final: Alignment editing — checking output against earlier decisions. Closer to debugging.
Section 06
Where ICM Works and Doesn't
The paper is admirably honest. ICM isn't replacing frameworks across the board — it targets a large, common, underserved class of workflows that existing tools over-engineer.
✓ ICM Works For
Sequential workflows — step 2 genuinely follows step 1
Human-reviewed workflows — a person checks each step
Repeatable workflows — same pipeline, different inputs each run (weekly reports, video production, course development)
Non-developer operators — the markdown interface means anyone can modify stage behavior
✗ ICM Does NOT Work For
Real-time multi-agent collaboration — agents needing tight communication loops (AutoGen is right here)
High-concurrency systems — many users hitting the same pipeline simultaneously
Complex automated branching — mid-pipeline automated decisions require scripting that turns ICM into a framework anyway
Dynamic, unpredictable workflows — if you don't know the stages in advance, you can't define the folders in advance
Section 07
Observability as Side Effect
One of the most important points in the paper: observability isn't a feature you add — it's a structural consequence. You cannot make the system opaque. There's nothing to hide.
A
🔍
Inherently Inspectable
Every intermediate output is a plain file in a predictable folder. Open it in any text editor. No dashboards, no logging infrastructure, no tracing setup required.
B
⚖️
Regulatory Alignment
For high-risk AI systems, the EU AI Act requires human oversight and record-keeping. ICM's review gates and plain-file outputs provide staged review points and an audit trail as a byproduct of the architecture, with no extra tooling.
C
🧠
Rudin's Principle
Stop building opaque systems and trying to explain them after the fact. Build systems that are inherently interpretable. ICM is a direct implementation of this principle.
Section 08
Future Directions
Section 6 of the paper extends the multi-pass compiler analogy into specific proposed tooling — applying decades of compiler theory to AI workflow debugging.
🔗
Output Provenance Identifiers
Embed markers in stage outputs that link back to the source instruction — like debug symbols in compiled binaries. Trace a wrong phrase in Stage 3 all the way back to the specific line in voice.md that caused it.
✔️
Cross-Stage Trace Verification
A Verify section in stage contracts that checks current output against earlier stage outputs. Already prototyped as an "audit file" catching timing and alignment errors between Stage 2 and Stage 3.
⏸️
Breakpoints in Markdown
Pause execution mid-stage to verify the agent interpreted a constraint correctly before continuing. Semantic debugging — inspect the agent's current interpretation, not just its final output.
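
The provenance idea above can be sketched with a simple marker embedded in each output line. The marker format used here (an HTML comment such as `<!-- src: voice.md#L12 -->`) and the `tag` and `trace` helpers are hypothetical; the paper proposes the concept but this format is an assumption made for the example.

```python
import re

# Hypothetical marker format: an HTML comment naming the source file and line.
MARKER = re.compile(r"<!--\s*src:\s*(?P<file>[^#\s]+)#L(?P<line>\d+)\s*-->")


def tag(text: str, source_file: str, line: int) -> str:
    """Append a provenance marker to one line of stage output."""
    return f"{text} <!-- src: {source_file}#L{line} -->"


def trace(output: str, phrase: str) -> list[tuple[str, int]]:
    """Return (file, line) sources for every output line containing phrase."""
    hits = []
    for out_line in output.splitlines():
        if phrase in out_line:
            hits += [(m["file"], int(m["line"])) for m in MARKER.finditer(out_line)]
    return hits
```

Because the markers are comments, they stay invisible in rendered markdown but survive in the plain file, much as debug symbols sit alongside compiled code.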
The Edit-Source Principle
Editing output is patching the binary

Two kinds of edits exist: creative edits (genuine human value — correct to edit output) and diagnostic edits (you tighten the same paragraph every run — this is a bug in the source contract). The proposed direction: track output edits across runs, surface recurring patterns, and suggest source-level changes. Workspaces that improve with use.
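
A rough sketch of how recurring diagnostic edits might be surfaced: compare each run's generated output with the human-edited version, collect the lines the reviewer removed or changed, and report those that recur across runs. The function names and the two-run threshold are assumptions; only the standard-library `difflib.ndiff` is relied on.

```python
from collections import Counter
from difflib import ndiff


def removed_lines(generated: str, edited: str) -> set[str]:
    """Non-empty lines the reviewer deleted or changed in one run."""
    diff = ndiff(generated.splitlines(), edited.splitlines())
    return {line[2:].strip() for line in diff
            if line.startswith("- ") and line[2:].strip()}


def recurring_edits(runs: list[tuple[str, str]], min_runs: int = 2) -> list[str]:
    """Lines removed in at least min_runs runs: candidates for a source-contract fix."""
    counts = Counter()
    for generated, edited in runs:
        counts.update(removed_lines(generated, edited))
    return sorted(line for line, seen in counts.items() if seen >= min_runs)
```

A line that is cut once is probably a creative edit. A line that is cut every run points at the contract that keeps producing it.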

Section 09
Key Nuances & Tensions
The paper is unusually self-aware. It names five tensions any serious practitioner should understand before adopting ICM.
Tension 01
Simplicity vs. Capability
ICM explicitly trades capability for simplicity — no concurrent execution, no complex branching, no real-time coordination. The tradeoff is only acceptable if your workflows fall within ICM's scope.
Tension 02
Self-Reported Empirical Data
The U-shaped intervention pattern and the 30-of-33 practitioner claims come from self-reported conversations in an invite-only community. The paper acknowledges this. These are hypotheses worth testing, not established facts.
Tension 03
Model Agnosticism vs. Specific Implementation
ICM claims model-agnosticism, but all testing was done on Claude Opus/Sonnet. The 5-layer hierarchy was likely tuned to how Claude handles context. Whether it generalizes to GPT-4o, Gemini, or Llama is an open empirical question.
Tension 04
Output Editing vs. Source Improvement
The review gate design encourages editing output. But editing output without updating the source means the pipeline doesn't improve over runs. The tooling to close this loop doesn't exist yet.
Tension 05
Growing Context Windows
As models handle 200K+ tokens without degradation, the engineering argument for scoped context loading weakens. But the human-interaction arguments remain: even if the model handles 50K tokens equally well, the practitioner still cannot review a 50K-token context to catch errors. Observability and editability are human concerns, not just model concerns.
Section 10
The Complete Picture
The philosophical claim: the filesystem is a coordination primitive that has been systematically underused in AI system design. What if files are the universal interface?
PROBLEM: Sequential AI workflows → frameworks add unnecessary complexity

INSIGHT: Filesystem IS the orchestrator (Unix, 1970s, still works)

SOLUTION: ICM
├── Numbered folders = stage sequence
├── CONTEXT.md files = stage contracts
├── Layer 3/4 split = rules vs. inputs
├── Review gates = human control points
└── Output folders = handoff points

THEORETICAL GROUNDING:
Unix pipelines → composability
Make → files as coordination
Compilers → multi-pass transforms
Literate programming → self-documenting
Context engineering → focused windows

RESULT:
✓ No framework code
✓ No server infrastructure
✓ Editable by non-developers
✓ Observable by default
✓ Version-controllable
✓ Portable as a zip file
✗ No concurrency
✗ No complex branching
✗ Not for dynamic workflows

FUTURE:
→ Semantic debugging (trace output to source)
→ Edit-source principle (improve pipeline over time)
→ Cross-stage verification