Fitness Functions
A fitness function is any automated check that protects an architectural property. Unit tests verify what code does. Fitness functions verify how code is structured — module boundaries, layer direction, coupling, contracts.
The Missing Layer
The testing pyramid has a gap. Linters catch style. Unit tests catch logic. Integration tests catch contracts. Nothing catches structural decay — boundary erosion, layer violations, coupling growth.
Fitness functions fill that gap. They sit at the top of the escalation ladder (convention → documentation → linter → fitness function) and cannot be bypassed.
Architectural Dimensions
Architecture spans multiple dimensions. Each can degrade independently. Select fitness functions based on which dimensions are critical to the project.
| Dimension | What degrades without it | What to check |
|---|---|---|
| Structure | Boundaries blur, coupling grows | Import direction, circular dependencies, coupling thresholds |
| Contracts | API changes break clients | Breaking change detection, schema-to-code drift |
| Data | Migrations destroy data | Destructive operation blocking, entity-schema consistency |
| Maintainability | Files grow, complexity hides | Complexity limits, file size caps, dead export detection |
| Performance | Bundles bloat, queries slow | Bundle size budgets, query cost limits, response time thresholds |
| Security | Secrets leak, endpoints unprotected | Secret scanning, CVE detection, auth enforcement |
| Reliability | Errors cascade, timeouts missing | Error boundary requirements, timeout enforcement |
| Observability | Incidents take hours to diagnose | Structured log format, required trace spans |
Most teams only cover maintainability (via linters). Everything else is trust and manual review.
The Ratchet Pattern
Architecture quality should only go up:
- Clean up a module
- Add it to an enforcement list (violations become errors)
- New modules are enforced from creation
- The list grows. Quality never decreases.
Never write fitness functions for the codebase you want — write them for the one you have. Clean up first, then enforce. The function is a lock, not a goal.
For migrations: migrated modules go on an allowlist (errors). Unmigrated modules get warnings. As migration progresses, the allowlist grows.
Scope
- Atomic — single check, single property. Fast. “This file has a lint error.”
- Holistic — combines signals across the system. “All boundaries respected AND tests pass AND contracts valid.”
Both are needed. Atomic catches individual violations. Holistic catches emergent issues.
Timing
| Timing | When | What it catches |
|---|---|---|
| Triggered | On file edit, commit, or PR | Individual violations as they happen |
| Continuous | Daily or weekly schedule | Gradual decay — dead code, dependency drift, convention erosion |
| Temporal | Pre-release or quarterly | Accumulated risk — security audits, architecture reviews |
All three are needed. Triggered catches mistakes. Continuous catches drift. Temporal catches risk.
Why AI Makes This Critical
AI amplifies whatever patterns exist. One boundary violation becomes the pattern the agent copies for every new file. Without fitness functions, a single shortcut propagates at machine speed.
AI can’t review its own architecture. Code passes linters and tests but structural properties go unchecked unless fitness functions check them.
The cost of not enforcing becomes exponential. Multiple AI agents generating code in parallel — every unenforced rule produces violations in every session.
Connection to Severity-Gated Review
When a fitness function fails, it becomes a finding in the severity-gated review. The fitness function detects the violation. The severity framework classifies it (CRITICAL / MAJOR / MINOR) and gates progress (NO-GO / CONDITIONAL / GO). Fitness functions are the automated sensors; severity gates are the judgment layer. See severity-gated-review.
Decision Criteria
When choosing which dimensions to protect: ask “if this degrades silently, does the system fail in a way that matters?” If yes, add a fitness function.
When deciding scope: use atomic + triggered for fast per-edit feedback. Use holistic + triggered for CI validation. Use atomic + continuous for drift detection. Use manual + temporal for what can’t be automated.
When timing enforcement: don’t build fitness functions for problems you don’t have yet. Identify the current constraint. Build for it. Evolve when the constraint changes.
Anti-patterns
- Aspirational fitness functions — encoding what you wish the codebase looked like. Breaks everything on day one.
- Only maintainability coverage — linters check style but not architecture. Most structural decay is invisible to linters.
- No continuous checks — triggered checks catch per-edit violations but miss gradual drift.
- All-or-nothing enforcement — either enforce everything (breaks existing code) or nothing (allows regression). The ratchet pattern is the middle path.
- Fitness functions without cleanup — enforcement without migration work creates noise, not quality.