AI Coding Agents
The Core Workflow
Human decides → AI executes → human reviews. Skipping the first step produces well-written code that is conceptually wrong. Skipping the last step ships code nobody verified.
- AI writes code. Humans own the consequences. Bugs, security flaws, and regressions do not transfer to the AI.
- The goal is not speed. The goal is trustable output.
Always Start with Plan Mode
Plan mode is the baseline for every change — even bug fixes. Letting the AI agent plan before acting surfaces related files, edge cases, and patterns you might miss. Plan mode is cheap and fast. Skipping it is not worth the risk.
- Bug fix → plan mode → fix.
- Small feature → plan mode → build.
- Large feature → write a spec first → plan mode for each task → build.
The variable is how much spec work comes before the plan. Plan mode itself is always the starting point.
Rules Files Are the Key
Project instructions (CLAUDE.md, rules files) define how code should be structured, named, and organized. Well-defined rules produce consistent output across sessions and models.
- Without rules, each generation drifts. Variable names change. Error handling varies. The codebase becomes harder to navigate.
- Dedicate most time to defining correct abstraction rules. This is the highest-leverage work when using AI coding agents.
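A rules file can be short; the highest-leverage entries encode naming, structure, and error handling. A sketch of what a CLAUDE.md fragment might contain — every identifier and convention here is invented for illustration:

```markdown
## Conventions
- Name modules by domain (`billing/`, `auth/`), not by layer.
- All errors flow through the shared `AppError` type; never throw raw strings.
- New endpoints follow the handler → service → repository pattern.
- Tests live next to the code: `invoice.ts` → `invoice.test.ts`.
- Never add a dependency without an entry in `docs/dependencies.md`.
```

Each entry is a decision made once, applied to every generation — the opposite of re-explaining conventions per prompt.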
What AI Handles Well (Accidental Complexity)
- Boilerplate and scaffolding
- Repetitive patterns that follow established conventions
- Code generation within clear rules
- Test generation for well-defined contracts
What AI Struggles With (Essential Complexity)
- Business logic boundaries
- Architecture tradeoffs
- Decisions requiring understanding of the broader system
- Domain modeling where the “right” answer depends on context
Decide the essential part first. Then let AI execute.
Autonomy Levels
Match autonomy to risk and clarity:
- High autonomy: simple, well-defined tasks with clear conventions and good test coverage.
- Tight oversight: complex tasks with ambiguous requirements, cross-module changes, or new patterns.
- As conventions mature and coverage grows, safe autonomy increases.
The Harness: Guides + Sensors
Agent = Model + Harness. The model is the AI. The harness is the environment — rules files (guides) and automated checks (sensors). When output is wrong, fix the harness, not the prompt.
- Guides (feedforward): rules files, CLAUDE.md, conventions. Steer the agent before it acts.
- Sensors (feedback): linters, type checkers, tests wired into the agent loop. Check the agent after it acts.
Wire sensors into the agent’s working loop: lint after every edit, type check incrementally, run related tests before finishing. The agent self-corrects before a human sees the output.
When the agent can’t fix a problem after 3 attempts, circuit breakers escalate to a human instead of looping forever. Every escalation reveals a harness gap to fix.
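The sensor loop and circuit breaker can be sketched in a few lines. This is a minimal illustration, not a real agent API — `Sensor`, `run_with_sensors`, and the 3-attempt default are all assumptions standing in for whatever your harness provides:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sensor:
    """A feedback check (linter, type checker, test runner)."""
    name: str
    check: Callable[[str], list[str]]  # returns a list of problems found

def run_with_sensors(edit: str,
                     fix: Callable[[str, list[str]], str],
                     sensors: list[Sensor],
                     max_attempts: int = 3) -> str:
    """Run sensors after an edit and let the agent self-correct.

    After max_attempts failed checks, trip the circuit breaker and
    escalate to a human instead of looping forever.
    """
    attempts = 0
    while True:
        problems = [p for s in sensors for p in s.check(edit)]
        if not problems:
            return edit  # all sensors green: output leaves the loop
        attempts += 1
        if attempts >= max_attempts:
            raise RuntimeError("circuit breaker tripped: escalate to a human")
        edit = fix(edit, problems)  # the agent attempts a self-correction
```

The point of the shape: the human only sees output that passed every sensor, or an explicit escalation — never a silent retry loop.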
See harness-engineering for the full framework.
Observability Is the Safety Net
As engineers read less of the code they ship, observability becomes critical. Without visibility into problems, neither AI nor humans can act.
- Structured logs with consistent identifiers for filtering.
- Log AI agent inputs and outputs for debugging.
- Monitor production behavior of AI-generated code like any critical system.
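A minimal sketch of what "structured logs with consistent identifiers" can look like — the field names (`event`, `request_id`) and the helper are assumptions, not an established schema:

```python
import json
import sys
import time
import uuid

def log_agent_event(event: str, request_id: str, **fields) -> str:
    """Emit one JSON line keyed by a consistent request_id for filtering."""
    record = {"ts": round(time.time(), 3), "event": event,
              "request_id": request_id, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line, file=sys.stderr)
    return line

# One identifier ties the agent's input and output together,
# so a single filter recovers the whole exchange later.
rid = str(uuid.uuid4())
log_agent_event("agent_input", rid, prompt="fix the login redirect bug")
log_agent_event("agent_output", rid, files_changed=2, tests_passed=True)
```

Because every line is valid JSON with the same keys, standard log tooling can slice by `request_id` without bespoke parsing.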
The Day-Night Shift
Severity-gated review enables unsupervised parallel execution:
- Day — prepare specs, run severity-gated review with adversarial interrogation (Red Team + Pre-mortem), iterate until verdict is GO
- Night — agents execute multiple features in parallel from reviewed, trusted specs
- Morning — evaluate results with implementation review (code follows contracts?)
This only works when planning output is trusted enough for unsupervised execution. Gate the plan, not just the code. Wrong decisions in specs become wrong code by morning. See severity-gated-review for the full framework.
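"Gate the plan, not just the code" reduces to a simple predicate before any overnight run. A hypothetical sketch — the function, the spec names, and the findings counter are invented; the `GO` verdict comes from the review process described above:

```python
def ready_for_night_run(verdict: str, open_critical_findings: int) -> bool:
    """Only specs that passed severity-gated review run unsupervised."""
    return verdict == "GO" and open_critical_findings == 0

# End of day: decide which reviewed specs agents may execute overnight.
specs = [
    ("auth-refresh", "GO", 0),
    ("billing-migration", "NO-GO", 2),  # red-team findings still open
]
night_queue = [name for name, verdict, findings in specs
               if ready_for_night_run(verdict, findings)]
```

Anything that fails the gate waits for another daytime review pass rather than becoming wrong code by morning.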
Decision Criteria
When deciding agent autonomy level: evaluate the task’s convention coverage, test coverage, and blast radius. Well-covered, low-risk tasks can run with high autonomy. New patterns, cross-module changes, or ambiguous requirements need tight oversight. When in doubt, start with oversight and relax as confidence grows.
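The paragraph above can be expressed as a decision function. The thresholds here are illustrative assumptions, not calibrated values — the shape (default to oversight, earn autonomy) is the point:

```python
def autonomy_level(convention_coverage: float,
                   test_coverage: float,
                   blast_radius_modules: int) -> str:
    """Map a task's risk profile to an autonomy level.

    convention_coverage / test_coverage in [0, 1];
    blast_radius_modules counts modules the change touches.
    """
    if (convention_coverage >= 0.8 and test_coverage >= 0.8
            and blast_radius_modules <= 1):
        return "high autonomy"
    # When in doubt, start here and relax as confidence grows.
    return "tight oversight"
```

Making the criteria explicit — even with rough thresholds — beats deciding autonomy per task by gut feel, and the thresholds can tighten or relax as conventions and coverage mature.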
Anti-patterns
- Letting agents run without rules files — produces inconsistent code that compounds into maintenance burden.
- Trusting agent output without review — AI does not understand consequences. Humans must verify.
- Over-specifying in prompts instead of rules files — prompt context is session-specific. Rules files persist.
- Outsourcing the thinking, not just the coding — AI can be a thinking partner, but judgment and decisions must stay human.
- Full automation from spec alone — generating a detailed spec and expecting AI to implement a large feature end-to-end without iteration is waterfall with AI. Build, discover, revise.
- Skipping plan mode — jumping straight to implementation misses context the agent would surface during planning. Always plan first.
- No sensors in the agent loop — guides without sensors means the agent can’t self-correct. Wire linters and type checkers into the feedback cycle.
- No circuit breakers — agents retrying the same error endlessly waste tokens and time. After 3 identical failures, escalate.