July 2, 2026

How Much Should an Agent Change Without You?

Best for: Product and engineering leads scoping agent work, writing acceptance criteria for unattended runs, and aligning planning tools (Wizard, Cursor) with execution policy.

Use outside Forge: High. The sizing scenarios are stack-agnostic; Forge links provide a worked ladder at the end.

Summary

“Let the agent fix it” is not a requirement. It is a scope statement missing its unit of delivery. Before you launch an unattended loop, answer: what may change, what must stay fixed, and where a human still decides.

Forge encodes those answers in autonomy levels (L0–L8). You do not need all eight rungs on day one—you need the right rung for the task.
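The three questions above can be written down as data before a run starts. The following is a minimal sketch, not Forge's schema: `ScopeStatement`, its field names, and the glob-based matching are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass(frozen=True)
class ScopeStatement:
    """Hypothetical record of the three answers; not a Forge type."""
    may_change: Tuple[str, ...]       # glob patterns the agent may edit
    must_stay_fixed: Tuple[str, ...]  # glob patterns that are off limits
    human_decides: Tuple[str, ...]    # named points where a person decides

    def allows(self, path: str) -> bool:
        # A "fixed" pattern always wins over a "may change" pattern.
        if any(fnmatchcase(path, p) for p in self.must_stay_fixed):
            return False
        return any(fnmatchcase(path, p) for p in self.may_change)

    def is_complete(self) -> bool:
        # A scope with no editable paths or no human decision point
        # has not answered all three questions.
        return bool(self.may_change) and bool(self.human_decides)


scope = ScopeStatement(
    may_change=("src/parser/*",),
    must_stay_fixed=("src/parser/api.py",),
    human_decides=("merge",),
)
```

Here `scope.allows("src/parser/lexer.py")` is true, while the fixed `src/parser/api.py` and anything outside `src/parser/` are rejected.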

Four common scenarios

Scenario A — Fix one function to a known contract

Example: A failing unit test points at one method; signature and architecture are fixed.

Fit: L1 (Function). One patch unit, typically one file. Gate: passing tests, then a human approves the merge.

Mis-size as: L2 or L3 “because AI is smart.” That adds proof burden without benefit.

Scenario B — Repair a defect across two or three files

Example: Broken link in README.md and a stale placeholder in a nested doc—no architecture change.

Fit: L2 (Change-set). Multiple patch units; assay must show ≥2 distinct files changed.

Mis-size as: L1 when the acceptance criteria explicitly require multi-file proof. The run may pass locally but fail the assay, because it cannot honestly show changes in two or more files.
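The "two or more distinct files" requirement in Scenario B is easy to check mechanically. This is a sketch under assumptions: the function names are hypothetical, and the input is taken to be a list of changed paths such as the output of `git diff --name-only`. It is not the Forge assay itself.

```python
from pathlib import PurePosixPath


def distinct_files_changed(changed_paths):
    """Normalize paths so './README.md' and 'README.md' count once."""
    return {str(PurePosixPath(p)) for p in changed_paths if p.strip()}


def passes_change_set_assay(changed_paths, minimum=2):
    """True when at least `minimum` distinct files were changed."""
    return len(distinct_files_changed(changed_paths)) >= minimum
```

Normalizing before counting matters: a run that reports the same file under two spellings should not pass a multi-file check.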

Scenario C — Deliver a user-visible flow in one app

Example: Scanner logic, docs copy, and a UI banner must all work; E2E script proves the flow.

Fit: L3 (Use-case slice). Cross-layer and E2E evidence required. Gate: intent and acceptance criteria on the way in, review on the way out.

Mis-size as: L2 when the task is truly end-to-end—teams ship partial fixes without flow proof.

Scenario D — Ship a feature across modules or repos

Example: New capability spanning services, submodule bumps, stacked PRs, release decision.

Fit: L4+ (a vision today, not a demonstrated capability). Policy defines ADR and release gates; the Platform Campaign and Fleet building blocks are planned, and unattended L4 has not been demonstrated.

Mis-size as: L3 “because it is one product”—multi-repo and release scope exceed use-case slice.
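The four scenarios reduce to a short decision rule, checked from the widest scope down. The sketch below is one way to express it; `Task`, its fields, and `suggest_level` are hypothetical names, and the thresholds are read directly from Scenarios A through D rather than from any Forge policy file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description used only for sizing."""
    files_touched: int            # distinct files the fix must change
    needs_e2e_proof: bool         # a user-visible flow must be proven end to end
    spans_repos_or_release: bool  # multiple modules/repos or a release decision


def suggest_level(task: Task) -> str:
    # Check the widest scope first so a multi-repo feature is never
    # undersized as a single-app slice.
    if task.spans_repos_or_release:
        return "L4+"  # Scenario D: human gates required
    if task.needs_e2e_proof:
        return "L3"   # Scenario C: use-case slice
    if task.files_touched >= 2:
        return "L2"   # Scenario B: change-set
    return "L1"       # Scenario A: one function


print(suggest_level(Task(files_touched=1, needs_e2e_proof=False,
                         spans_repos_or_release=False)))
```

The rule only suggests a rung. Whether the gates and evidence for that rung actually exist is a separate question.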

Planning autonomy vs execution autonomy

Forge Lenses Wizard captures planning-time modes (l0_analyst … l3_goal_autopilot). Those names overlap L0–L3 but serve a different layer:

Wizard enum	Planning meaning
`l0_analyst`	Read-only analysis
`l1_drafter`	Drafts for your review
`l2_stage_autopilot`	Multi-step work inside a stage
`l3_goal_autopilot`	Goals across stages with checkpoints

Wizard policy informs prompts. It does not silently grant runtime autonomy. If Wizard says l2_stage_autopilot but your unattended driver has no multi-file assay, you are not honestly running L2 execution.

Full mapping: Platform autonomy hub.
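To illustrate why a Wizard mode does not grant runtime autonomy on its own, the sketch below derives the execution level from what the driver can actually prove. The evidence labels (`tests`, `multi_file_assay`, `e2e`), the cumulative requirement per level, and the use of the Wizard mode as a ceiling are all assumptions made for this example; the authoritative mapping is the one in the Platform autonomy hub.

```python
# Planning modes from the Wizard table, ranked by how much they plan for.
WIZARD_PLANNING_RANK = {
    "l0_analyst": 0,
    "l1_drafter": 1,
    "l2_stage_autopilot": 2,
    "l3_goal_autopilot": 3,
}

# Assumed evidence each execution level must be able to produce.
REQUIRED_EVIDENCE = {
    1: {"tests"},
    2: {"tests", "multi_file_assay"},
    3: {"tests", "multi_file_assay", "e2e"},
}


def honest_execution_level(wizard_mode, driver_evidence):
    """Highest level the driver can back with evidence, capped by the plan."""
    ceiling = WIZARD_PLANNING_RANK[wizard_mode]
    available = set(driver_evidence)
    level = 0
    for candidate in range(1, ceiling + 1):
        if REQUIRED_EVIDENCE[candidate] <= available:
            level = candidate
        else:
            break  # a missing proof stops the climb
    return level
```

With `l2_stage_autopilot` and a driver that only runs tests, the function returns 1: the plan asked for L2, but the run can only honestly claim L1.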

Interactive vs unattended

L0 (Assisted) is continuous human steering—typical Cursor turn-by-turn work with cost-aware planning. That is not a failed L1; it is the correct level when you have not declared an unattended target.

Move up the ladder only when you have gates and evidence to match.

Resource honesty

On a small local model profile, fully cloud-free autonomy above L1 is often unrealistic. L2–L3 frequently need ROI-gated escalation to a larger model or a human at pivots. L4 and above require explicit human gates regardless of model size.

Declare the level you can resource and review, not the level that sounds impressive in a slide.
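The escalation rule described in this section can be sketched as a small routing function. Everything here is hypothetical: the function name, the `at_pivot` flag, and the simple benefit-versus-cost comparison standing in for "ROI-gated" are assumptions for illustration, not a Forge policy.

```python
def choose_executor(level, at_pivot, expected_benefit, escalation_cost):
    """Pick who handles the next step, given the declared level.

    level: declared autonomy level as an integer (0-8)
    at_pivot: True when the run has reached a decision point
    expected_benefit / escalation_cost: same unit, e.g. dollars or minutes
    """
    if level >= 4:
        return "human_gate"    # explicit human gate regardless of model size
    if level <= 1 or not at_pivot:
        return "local_model"   # small local profile is assumed to be enough
    if expected_benefit > escalation_cost:
        return "larger_model"  # escalation pays for itself
    return "human"             # otherwise hand the pivot to a person
```

For example, an L2 run at a pivot with an expected benefit of 10 and an escalation cost of 3 routes to the larger model; with the numbers reversed it routes to a human.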

What we do not claim

No one-size-fits-all level — the ladder exists to prevent over-claiming, not to force L3 everywhere.
No unattended L4–L8 in production today — size those efforts with human gates and Campaign/Fleet planning; do not assume PoC enforcement.
No replacement for product judgment — acceptance criteria and intent remain human-owned at L3 and below.

Go deeper

Autonomy Is Not a Switch — why the ladder exists
Autonomy levels (policy)
Platform autonomy hub — per-level architecture pages
Bounded execution examples
