July 2, 2026

Autonomy Is Not a Switch

Best for: Engineering managers, platform leads, and CTOs who are asked “how autonomous should our agents be?” and need a sizing model—not a vendor slogan.

Use outside Forge: High. The ladder framing applies to any team running unattended or semi-attended agent loops; the Forge links at the end are optional further reading.

Summary

Teams still talk about agent delivery as if autonomy were binary: either the human steers every turn, or the agent “owns” the feature. That framing breaks down the moment you try to govern merge, release, architecture, or multi-repo work. A progressive ladder—with a declared unit of delivery, fixed boundaries, and cumulative human gates—is a more honest way to scope what unattended runs may attempt.

The question behind the question

When a lead asks “how autonomous should we be?”, they are usually mixing three different decisions:

Interactive vs unattended — Is a human in the loop on every model turn, or is the run allowed to continue while they are away?
Scope of change — One function, a multi-file fix, an end-to-end flow, or a cross-repo feature?
Evidence bar — What must pass before the change counts as done: tests only, multi-file proof, E2E, an ADR, a release gate?
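To make the split concrete, here is a minimal Python sketch, assuming a team records each of the three decisions as its own field rather than one flag. The names `Scope`, `AutonomyDecision`, and the evidence labels are hypothetical, not part of Forge.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    # Scope values paraphrase the question above; names are hypothetical.
    FUNCTION = "one function"
    MULTI_FILE = "multi-file fix"
    E2E_FLOW = "end-to-end flow"
    CROSS_REPO = "cross-repo feature"


@dataclass(frozen=True)
class AutonomyDecision:
    """One answer per decision instead of a single on/off flag (illustrative)."""
    unattended: bool          # decision 1: may the run continue without a human?
    scope: Scope              # decision 2: how large a change may it attempt?
    evidence: frozenset[str]  # decision 3: what must pass before it counts as done?


# Two requests that a binary "autonomous: yes" flag would treat as identical.
quick_fix = AutonomyDecision(True, Scope.FUNCTION, frozenset({"tests"}))
cross_repo = AutonomyDecision(True, Scope.CROSS_REPO, frozenset({"tests", "e2e", "adr"}))
```

Both requests are unattended, yet they differ in scope and evidence bar; a single switch would collapse them into the same answer.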

A single on/off switch cannot answer all three. That is why Forge documents an L0–L8 execution ladder: not because there is anything magic about nine rungs, but because meaningful, testable progression needs named rungs with explicit gates.

What each rung actually means

At a high level:

Band	Plain-language unit	Human gate (short)
L0	Suggestions only	Continuous steering
L1	One function or contract-bound change	Approve branch/merge
L2	Multi-file change-set, no rearchitecture	Acceptance criteria + merge
L3	End-to-end use-case slice in one app	Human supplies intent; human reviews result
L4+	Feature, subsystem, product increment, multi-platform	ADR, go/no-go, strategic checkpoints

Higher levels add gates. They never remove lower-level ones. Claiming L3 behavior without L2-style evidence is how teams end up with “autonomous” demos that cannot survive audit.
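The cumulative rule can be sketched in a few lines of Python. This is a minimal illustration, not Forge's implementation: the gate names are hypothetical paraphrases of the table, "Approve branch/merge" is split into two gates, the whole L4+ band is collapsed into one `L4` entry, and L0 is assumed to stand apart because continuous steering is interactive rather than something an unattended run can satisfy.

```python
from enum import IntEnum


class Level(IntEnum):
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4  # stands in for the whole L4+ band in this sketch


# Gates each rung introduces, paraphrased from the table (names are hypothetical).
NEW_GATES: dict[Level, frozenset[str]] = {
    Level.L0: frozenset({"continuous_steering"}),
    Level.L1: frozenset({"approve_branch", "approve_merge"}),
    Level.L2: frozenset({"acceptance_criteria"}),
    Level.L3: frozenset({"intent_in", "review_out"}),
    Level.L4: frozenset({"adr", "go_no_go", "strategic_checkpoint"}),
}


def required_gates(level: Level) -> frozenset[str]:
    """All gates for `level`: its own plus those of every rung from L1 upward.

    L0 is treated as interactive and kept apart from the unattended rungs.
    """
    if level == Level.L0:
        return NEW_GATES[Level.L0]
    gates: set[str] = set()
    for rung in range(Level.L1, level + 1):
        gates |= NEW_GATES[Level(rung)]
    return frozenset(gates)


def missing_gates(level: Level, passed: set[str]) -> frozenset[str]:
    """Gates still open for a run that claims `level`."""
    return required_gates(level) - passed
```

A run that claims L3 and shows only intent and review evidence, `missing_gates(Level.L3, {"intent_in", "review_out"})`, still owes the L1 and L2 gates (`approve_branch`, `approve_merge`, `acceptance_criteria`). That is the "L3 behavior without L2-style evidence" case in executable form.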

Defined, demonstrated, and vision

Not every rung is at the same maturity today. Forge distinguishes three postures explicitly:

Defined — policy and gates are documented (the full L0–L8 table).
Demonstrated — green PoC runs exist with machine evidence (today: L1–L3 in the Forge Dark Factory reference loop).
Vision — policy and planned building blocks only (L4–L8); no claim of production-ready unattended delivery at those levels yet.
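One way to keep claims honest is to derive them from a posture lookup instead of from marketing copy. The Python sketch below assumes the posture ranges stated above (L1–L3 demonstrated, L4–L8 vision, L0 documented and interactive); `Posture`, `posture`, and `check_claim` are hypothetical names, and the ranges would need updating as more rungs are demonstrated.

```python
from enum import Enum


class Posture(Enum):
    DEFINED = "defined"            # policy and gates documented
    DEMONSTRATED = "demonstrated"  # green PoC runs with machine evidence
    VISION = "vision"              # policy and planned building blocks only


def posture(level: int) -> Posture:
    """Posture for rung `level` (0 to 8), mirroring the current status."""
    if not 0 <= level <= 8:
        raise ValueError(f"no rung L{level} on the L0-L8 ladder")
    if 1 <= level <= 3:
        return Posture.DEMONSTRATED
    if level >= 4:
        return Posture.VISION
    return Posture.DEFINED  # L0: documented, interactive, outside the demonstrated range


def check_claim(level: int) -> str:
    """Turn a vague 'autonomous' claim into a rung-specific statement."""
    p = posture(level)
    if p is Posture.DEMONSTRATED:
        return f"L{level}: demonstrated with evidence; human gates still apply"
    return f"L{level}: {p.value} only; no production-ready unattended claim"
```

Asking a vendor to fill in this table for their own product is a quick way to see which rung they actually mean.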

That honesty matters for executives evaluating vendor claims. “We support autonomous development” is meaningless without naming which rung, what stays fixed, and which gate still requires a person.

How this complements “AI-first, human-gated”

Forge’s AI-first, human-gated principle says agents should do footwork while humans keep decision rights. The autonomy ladder is the sizing companion: it tells you how much footwork an unattended run may attempt before it must stop for a human.

Interactive Cursor planning is not the same as an L2 change-set campaign. Wizard planning modes in Forge Lenses are not the same as runtime execution levels unless you wire them that way. Conflating those namespaces is a common source of over-claiming.

What we do not claim

No compliance-ready autonomy — the ladder is engineering governance, not certification.
No unsupervised push/deploy — merge and release remain human-gated unless your org automates them under separate policy.
No “fully autonomous” L4–L8 today — those levels remain vision, and their policy still requires explicit ADR, go/no-go, and strategic checkpoints.
Escalation is expected — especially for architecture, security, and ambiguity; a low escalation rate is a goal, not a day-one guarantee.
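The boundaries above can be read as stop conditions for an unattended run. A minimal Python sketch, assuming a run can describe its next step as an action plus the areas it touches; `StepContext`, `must_stop_for_human`, and the action and topic labels are hypothetical, not a Forge API. The `org_automates` field models the "separate policy" exception for merge and release.

```python
from dataclasses import dataclass, field


@dataclass
class StepContext:
    """What an unattended run knows about its next step (fields are hypothetical)."""
    action: str                                           # e.g. "edit", "merge", "release"
    touches: set[str] = field(default_factory=set)        # e.g. {"architecture"}
    ambiguous: bool = False                               # are the requirements unclear?
    org_automates: set[str] = field(default_factory=set)  # actions automated under separate org policy


HUMAN_GATED_ACTIONS = {"merge", "release"}
ESCALATION_TOPICS = {"architecture", "security"}


def must_stop_for_human(ctx: StepContext) -> list[str]:
    """Reasons the run should pause and escalate; an empty list means continue."""
    reasons: list[str] = []
    if ctx.action in HUMAN_GATED_ACTIONS and ctx.action not in ctx.org_automates:
        reasons.append(f"{ctx.action} is human-gated")
    for topic in sorted(ctx.touches & ESCALATION_TOPICS):
        reasons.append(f"touches {topic}")
    if ctx.ambiguous:
        reasons.append("requirements are ambiguous")
    return reasons
```

Returning a list of reasons rather than a bare boolean keeps the escalation auditable: the human who picks up the run sees why it stopped.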

Go deeper

Autonomy levels (Blueprints policy) — canonical L0–L8 table and assay rules
Platform autonomy hub — implementation readiness matrix and per-level building-block architecture
Bounded execution examples — real L1–L3 runs with evidence
AI-first, human-gated — principle this post sizes for execution
