Autonomy Is Not a Switch
Best for: Engineering managers, platform leads, and CTOs who are asked “how autonomous should our agents be?” and need a sizing model—not a vendor slogan.
Use outside Forge: High. The ladder framing applies to any team running unattended or semi-attended agent loops; Forge links are optional depth at the end.
Summary
Teams still talk about agent delivery as if autonomy were binary: either the human steers every turn, or the agent “owns” the feature. That framing breaks down the moment you try to govern merge, release, architecture, or multi-repo work. A progressive ladder—with a declared unit of delivery, fixed boundaries, and cumulative human gates—is a more honest way to scope what unattended runs may attempt.
The question behind the question
When a lead asks “how autonomous should we be?”, they are usually mixing three different decisions:
- Interactive vs unattended — Is a human in the loop on every model turn, or is the run allowed to continue while they are away?
- Scope of change — One function, a multi-file fix, an end-to-end flow, or a cross-repo feature?
- Evidence bar — What must pass before the change counts as done—tests only, multi-file proof, E2E, ADR, release gate?
A single on/off switch cannot answer all three. That is why Forge documents an L0–L8 execution ladder: not because nine is a magic number, but because meaningful, testable progression needs named rungs with explicit gates.
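To make the three decisions concrete, here is a minimal sketch of a run declaration that records each one as its own field. The names (`Attendance`, `Scope`, `RunDeclaration`, `describe`) are hypothetical and chosen for illustration; they are not Forge types, and the enum values simply echo the examples in the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Attendance(Enum):
    INTERACTIVE = "interactive"  # a human is in the loop on every model turn
    UNATTENDED = "unattended"    # the run may continue while the human is away


class Scope(Enum):
    FUNCTION = "one function"
    MULTI_FILE = "multi-file fix"
    END_TO_END = "end-to-end flow"
    CROSS_REPO = "cross-repo feature"


@dataclass(frozen=True)
class RunDeclaration:
    """Hypothetical record: one field per decision, not one boolean."""
    attendance: Attendance
    scope: Scope
    evidence: frozenset  # evidence that must pass, e.g. {"tests", "E2E"}


def describe(run: RunDeclaration) -> str:
    """Render the three decisions side by side for a review or audit note."""
    evidence = ", ".join(sorted(run.evidence)) or "none declared"
    return f"{run.attendance.value} / {run.scope.value} / evidence: {evidence}"


# Both runs count as "autonomous" under an on/off switch, yet they differ
# in scope and in the evidence they owe before counting as done.
small_fix = RunDeclaration(Attendance.UNATTENDED, Scope.FUNCTION, frozenset({"tests"}))
slice_run = RunDeclaration(Attendance.UNATTENDED, Scope.END_TO_END, frozenset({"tests", "E2E"}))

print(describe(small_fix))
print(describe(slice_run))
```

A boolean would collapse `small_fix` and `slice_run` into the same value; keeping the axes separate is what lets a named rung attach a gate to each of them.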
What each rung actually means
At a high level:
| Band | Plain-language unit | Human gate (short) |
|---|---|---|
| L0 | Suggestions only | Continuous steering |
| L1 | One function or contract-bound change | Approve branch/merge |
| L2 | Multi-file change-set, no rearchitecture | Acceptance criteria + merge |
| L3 | End-to-end use-case slice in one app | Intent in; review out |
| L4+ | Feature, subsystem, product increment, multi-platform | ADR, go/no-go, strategic checkpoints |
Higher levels add gates. They never remove lower-level ones. Claiming L3 behavior without L2-style evidence is how teams end up with “autonomous” demos that cannot survive audit.
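The cumulative rule can be sketched as a lookup that accumulates gates from the lowest unattended band up to the one being claimed. This is an illustration, not the Forge implementation: the gate strings paraphrase the short table above, the split of each row into "gates added at this band" is an assumed reading, and L0 is left out on the assumption that continuous steering describes interactive use, not a gate an unattended run inherits.

```python
# Gates ADDED at each band (hypothetical split of the table's short column).
GATES_ADDED = {
    "L1": ["approve branch/merge"],
    "L2": ["acceptance criteria"],
    "L3": ["intent in", "review out"],
    "L4+": ["ADR", "go/no-go", "strategic checkpoints"],
}
ORDER = ["L1", "L2", "L3", "L4+"]


def required_gates(band: str) -> list:
    """Return every gate owed at `band`: its own plus all lower bands'."""
    idx = ORDER.index(band)  # raises ValueError for an unknown band
    gates = []
    for lower in ORDER[: idx + 1]:
        gates.extend(GATES_ADDED[lower])
    return gates


def missing_gates(band: str, passed: set) -> list:
    """List the gates still open before a run may claim `band`."""
    return [gate for gate in required_gates(band) if gate not in passed]


# A run that shows only L3-style review still owes the L1 and L2 gates.
print(missing_gates("L3", {"intent in", "review out"}))
```

Because `required_gates` only ever extends the list, a higher band cannot drop a lower band's gate, which is the property an audit would check for.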
Defined, demonstrated, and vision
Not every rung is implemented the same way today. Forge separates three postures explicitly:
- Defined — policy and gates are documented (the full L0–L8 table).
- Demonstrated — green PoC runs exist with machine evidence (today: L1–L3 in the Forge Dark Factory reference loop).
- Vision — policy and planned building blocks only (L4–L8); no claim of production-ready unattended delivery at those levels yet.
That honesty matters for executives evaluating vendor claims. “We support autonomous development” is meaningless without naming which rung, what stays fixed, and which gate still requires a person.
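One way to keep a claim tied to its rung is to encode the three postures and look them up by level. The sketch below is hypothetical (`Posture`, `posture`, and `has_machine_evidence` are illustrative names, not Forge APIs) and hard-codes the ranges stated above: L1–L3 demonstrated, L4–L8 vision. Treating L0 as defined only is an assumption, since the list above does not place it in either of the other two groups.

```python
from enum import Enum


class Posture(Enum):
    DEFINED = "defined"            # policy and gates are documented
    DEMONSTRATED = "demonstrated"  # green PoC runs exist with machine evidence
    VISION = "vision"              # policy and planned building blocks only


def posture(level: int) -> Posture:
    """Return the strongest posture stated in the text for L0..L8."""
    if not 0 <= level <= 8:
        raise ValueError(f"unknown level: L{level}")
    if 1 <= level <= 3:
        return Posture.DEMONSTRATED
    if level >= 4:
        return Posture.VISION
    return Posture.DEFINED  # assumption: L0 is documented but not in the PoC set


def has_machine_evidence(level: int) -> bool:
    """True only for rungs the text describes as demonstrated."""
    return posture(level) is Posture.DEMONSTRATED


for level in (0, 2, 5):
    print(f"L{level}: {posture(level).value}")
```

A statement such as "we support autonomous development" could then be checked against a specific level instead of being accepted as a blanket claim.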
How this complements “AI-first, human-gated”
Forge’s AI-first, human-gated principle says agents should do footwork while humans keep decision rights. The autonomy ladder is the sizing companion: it tells you how much footwork an unattended run may attempt before it must stop for a human.
Interactive Cursor planning is not the same as an L2 change-set campaign. Wizard planning modes in Forge Lenses are not the same as runtime execution levels unless you wire them that way. Conflating those namespaces is a common source of over-claiming.
What we do not claim
- No compliance-ready autonomy — the ladder is engineering governance, not certification.
- No unsupervised push/deploy — merge and release remain human-gated unless your org automates them under separate policy.
- No “fully autonomous” L4–L8 today — those levels are vision with explicit ADR, go/no-go, and strategic checkpoints.
- No escalation-free runs. Escalation is expected, especially for architecture, security, and ambiguity; a low escalation rate is a goal, not a day-one guarantee.
Go deeper
- Autonomy levels (Blueprints policy) — canonical L0–L8 table and assay rules
- Platform autonomy hub — implementation readiness matrix and per-level building-block architecture
- Bounded execution examples — real L1–L3 runs with evidence
- AI-first, human-gated — principle this post sizes for execution
Related: The New Bottleneck Is Verification, Not Coding · Governance Is Becoming a Performance Function