Fitness Functions - Rodrigo Ramirez

A fitness function is any automated check that protects an architectural property. Unit tests verify what code does. Fitness functions verify how code is structured — module boundaries, layer direction, coupling, contracts.

The Missing Layer

The testing pyramid has a gap. Linters catch style problems. Unit tests catch logic errors. Integration tests catch broken contracts. Nothing catches structural decay: boundary erosion, layer violations, coupling growth.

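Fitness functions fill that gap. They sit at the top of the escalation ladder (convention → documentation → linter → fitness function). Each rung is harder to ignore than the one before it, and a fitness function wired into the build cannot be silently bypassed.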
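
As an illustration, a structural fitness function for layer direction might look like the sketch below. The layer names and their order in LAYERS are assumptions made for this example, as is the rule that a layer may only import from layers listed before it. It relies only on Python's standard ast module.

```python
import ast

# Hypothetical layer order (an assumption for this sketch):
# a layer may import from layers listed before it, never after it.
LAYERS = ["domain", "application", "infrastructure", "api"]


def imported_modules(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" statements only; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names


def layer_violations(layer: str, source: str) -> list[str]:
    """List imports that point from `layer` to a layer ranked above it."""
    rank = {name: i for i, name in enumerate(LAYERS)}
    return sorted(
        f"{layer} -> {target}"
        for target in imported_modules(source)
        if target in rank and rank[target] > rank[layer]
    )
```

In a real project the same function would be run over every file in a layer's directory, and any non-empty result would fail the build.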

Architectural Dimensions

Architecture spans multiple dimensions. Each can degrade independently. Select fitness functions based on which dimensions are critical to the project.

Dimension	What degrades without it	What to check
Structure	Boundaries blur, coupling grows	Import direction, circular dependencies, coupling thresholds
Contracts	API changes break clients	Breaking change detection, schema-to-code drift
Data	Migrations destroy data	Destructive operation blocking, entity-schema consistency
Maintainability	Files grow, complexity hides	Complexity limits, file size caps, dead export detection
Performance	Bundles bloat, queries slow	Bundle size budgets, query cost limits, response time thresholds
Security	Secrets leak, endpoints unprotected	Secret scanning, CVE detection, auth enforcement
Reliability	Errors cascade, timeouts missing	Error boundary requirements, timeout enforcement
Observability	Incidents take hours to diagnose	Structured log format, required trace spans

Most teams only cover maintainability (via linters). Everything else is trust and manual review.
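
To make one row of the table concrete, the Structure dimension lists circular dependencies as something to check. A minimal sketch of such a check follows, assuming the module dependency graph has already been extracted into a dictionary. The graph format and the function name find_cycle are hypothetical.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of modules, or None if the graph is acyclic.

    `graph` maps a module name to the modules it depends on.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in graph.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # `dep` is already on the current path, so the path loops back to it.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state.get(node, UNVISITED) == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

The recursive depth-first search is adequate for a sketch. A very deep dependency graph would need an iterative version to stay within Python's recursion limit.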

The Ratchet Pattern

Architecture quality should only go up:

Clean up a module
Add it to an enforcement list (violations become errors)
New modules are enforced from creation
The list grows. Quality never decreases.

Never write fitness functions for the codebase you want — write them for the one you have. Clean up first, then enforce. The function is a lock, not a goal.

For migrations: migrated modules go on the enforcement list, where violations are errors. Unmigrated modules get warnings only. As the migration progresses, the enforcement list grows.
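
One possible encoding of the ratchet is sketched below. The module names, the Violation type, and the idea of tracking known legacy modules in a second set (so that new modules are enforced by default) are all assumptions of this example, not a prescribed implementation.

```python
from dataclasses import dataclass

# Hypothetical lists for this sketch.
ENFORCED = {"billing", "accounts"}     # cleaned-up modules: violations are errors
LEGACY = {"reports", "notifications"}  # not yet cleaned up: violations are warnings


@dataclass(frozen=True)
class Violation:
    module: str
    message: str


def is_enforced(module: str) -> bool:
    """A module is enforced if it was cleaned up, or if it is new (not known legacy)."""
    return module in ENFORCED or module not in LEGACY


def ratchet(violations: list[Violation]) -> tuple[list[Violation], list[Violation]]:
    """Split violations into errors (enforced modules) and warnings (legacy modules)."""
    errors = [v for v in violations if is_enforced(v.module)]
    warnings = [v for v in violations if not is_enforced(v.module)]
    return errors, warnings


def exit_code(violations: list[Violation]) -> int:
    """Fail only when an enforced module has a violation."""
    errors, _ = ratchet(violations)
    return 1 if errors else 0
```

Cleaning up a module then amounts to moving its name from LEGACY to ENFORCED. The check never needs to be loosened to make progress.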

Scope

Atomic — single check, single property. Fast. “This file has a lint error.”
Holistic — combines signals across the system. “All boundaries respected AND tests pass AND contracts valid.”

Both are needed. Atomic catches individual violations. Holistic catches emergent issues.
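
The relationship between the two scopes can be sketched as composition: a holistic check runs several atomic checks and passes only if all of them pass. The Result type and the holistic function are hypothetical names introduced for this example.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    name: str
    passed: bool


# An atomic check verifies one property and returns one result.
AtomicCheck = Callable[[], Result]


def holistic(name: str, checks: list[AtomicCheck]) -> tuple[Result, list[Result]]:
    """Run every atomic check; the combined result passes only if all of them pass.

    Returns the combined result plus the individual results, so a failure
    can still be traced back to the atomic check that caused it.
    """
    results = [check() for check in checks]
    return Result(name, all(r.passed for r in results)), results
```

Every atomic check runs even after one fails, so a single run reports all failing properties rather than just the first.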

Timing

Timing	When	What it catches
Triggered	On file edit, commit, or PR	Individual violations as they happen
Continuous	Daily or weekly schedule	Gradual decay — dead code, dependency drift, convention erosion
Temporal	Pre-release or quarterly	Accumulated risk — security audits, architecture reviews

All three are needed. Triggered catches mistakes. Continuous catches drift. Temporal catches risk.
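
One way to keep all three timings visible is a small registry that records when each check runs and reports any timing that has no checks. The check names and the registry layout below are assumptions made for illustration.

```python
from enum import Enum


class Timing(Enum):
    TRIGGERED = "triggered"    # on file edit, commit, or PR
    CONTINUOUS = "continuous"  # daily or weekly schedule
    TEMPORAL = "temporal"      # pre-release or quarterly


# Hypothetical registry mapping check names to the timing they run at.
REGISTRY = {
    "import-direction": Timing.TRIGGERED,
    "dead-exports": Timing.CONTINUOUS,
    "dependency-drift": Timing.CONTINUOUS,
    "security-audit": Timing.TEMPORAL,
}


def checks_for(timing: Timing) -> list[str]:
    """Names of the checks scheduled for the given timing, sorted alphabetically."""
    return sorted(name for name, t in REGISTRY.items() if t is timing)


def uncovered_timings() -> list[Timing]:
    """Timings that currently have no checks registered."""
    return [t for t in Timing if not checks_for(t)]
```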

Why AI Makes This Critical

AI amplifies whatever patterns exist. One boundary violation becomes the pattern the agent copies for every new file. Without fitness functions, a single shortcut propagates at machine speed.

AI can’t review its own architecture. Generated code can pass linters and tests while still violating structural rules, because nothing checks those rules unless a fitness function does.

The cost of not enforcing compounds. When multiple AI agents generate code in parallel, every unenforced rule produces violations in every session.

Connection to Severity-Gated Review

When a fitness function fails, it becomes a finding in the severity-gated review. The fitness function detects the violation. The severity framework classifies it (CRITICAL / MAJOR / MINOR) and gates progress (NO-GO / CONDITIONAL / GO). Fitness functions are the automated sensors; severity gates are the judgment layer. See severity-gated-review.
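
The hand-off from sensor to judgment layer can be sketched as follows. The severity assigned to each check, the default for unknown checks, and the rule that maps the worst severity to a gate outcome are assumptions of this example. The actual rules belong to the severity-gated review and may differ.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Hypothetical classification of fitness function failures.
SEVERITY_BY_CHECK = {
    "secret-in-repo": Severity.CRITICAL,
    "circular-dependency": Severity.MAJOR,
    "file-size-cap": Severity.MINOR,
}


def gate(failed_checks: list[str]) -> str:
    """Map failed fitness functions to a gate decision.

    Assumed rule: any CRITICAL finding is NO-GO, any MAJOR finding is
    CONDITIONAL, otherwise GO. Unknown checks are treated as MAJOR.
    """
    if not failed_checks:
        return "GO"
    worst = max(SEVERITY_BY_CHECK.get(name, Severity.MAJOR) for name in failed_checks)
    if worst == Severity.CRITICAL:
        return "NO-GO"
    if worst == Severity.MAJOR:
        return "CONDITIONAL"
    return "GO"
```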

Decision Criteria

When choosing which dimensions to protect: ask “if this degrades silently, does the system fail in a way that matters?” If yes, add a fitness function.

When deciding scope and timing: use atomic + triggered for fast per-edit feedback. Use holistic + triggered for CI validation. Use atomic + continuous for drift detection. Use manual review + temporal for what can’t be automated.

When timing enforcement: don’t build fitness functions for problems you don’t have yet. Identify the current constraint. Build for it. Evolve when the constraint changes.

Anti-patterns

Aspirational fitness functions — encoding what you wish the codebase looked like. Breaks everything on day one.
Only maintainability coverage — linters check style but not architecture. Most structural decay is invisible to linters.
No continuous checks — triggered checks catch per-edit violations but miss gradual drift.
All-or-nothing enforcement — either enforce everything (breaks existing code) or nothing (allows regression). The ratchet pattern is the middle path.
Fitness functions without cleanup — enforcement without migration work creates noise, not quality.