Observability - Rodrigo Ramirez

Observability is a critical foundation, not an afterthought. Without visibility into problems, neither AI nor humans can act. In an AI-coding world where engineers see less of the produced code, observability is the safety net.

Always Managed Services

Use managed observability services. Engineers should focus on the product, not building monitoring infrastructure. Logging, alerting, and tracing are solved problems.

Structured Logs

Include consistent filterable identifiers in every log: providerCode, propertyId, userId, etc.
Use domain language in log identifiers: providerCode, roomTypeCode, ratePlanCode, not internal database IDs.
Keep the same base data on related logs — consistent identifiers across related entries enable correlation.
Be specific in log messages. “Property has no room_type in DB” is better than “not found.”
Do not duplicate what the framework provides — base location, timestamp, etc. are automatic.
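
The guidelines above can be sketched with Python's standard logging module. This is a minimal illustration, not a prescribed implementation: the logger name "sync", the helpers build_logger and with_context, and the values "ACME" and "P-100" are all hypothetical. The timestamp is left out on purpose, on the assumption that the framework or managed service adds it.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so every field is filterable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Domain identifiers bound through with_context() below.
            **getattr(record, "context", {}),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Hypothetical setup helper: one JSON handler writing to `stream`."""
    logger = logging.getLogger("sync")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def with_context(logger: logging.Logger, **context) -> logging.LoggerAdapter:
    """Bind the same base identifiers to every related log entry."""
    return logging.LoggerAdapter(logger, {"context": context})


# Usage: both entries carry providerCode and propertyId, so they correlate.
stream = io.StringIO()
log = with_context(build_logger(stream), providerCode="ACME", propertyId="P-100")
log.warning("Property has no room_type in DB")
log.info("Skipping rate plan sync for property")
print(stream.getvalue())
```

Binding the identifiers once through the adapter keeps related entries consistent without repeating the fields at every call site.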

AI and Observability

Log AI agent inputs and outputs. They are essential for debugging AI-driven flows.
Monitor production behavior of AI-generated code the same way you monitor any critical system.
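
One way to log agent inputs and outputs is a thin wrapper around the agent call. The sketch below assumes the agent is any callable that takes a prompt string and returns a string. The names call_agent_logged, agent_input, agent_output, and agent_failed are hypothetical and not part of any specific agent framework.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("ai_agent")


def call_agent_logged(agent: Callable[[str], str], prompt: str, **identifiers) -> str:
    """Call `agent` and log its input and output with the same identifiers."""
    logger.info(json.dumps({"event": "agent_input", "prompt": prompt, **identifiers}))
    try:
        output = agent(prompt)
    except Exception as exc:
        # Log the failure with the same identifiers, then let it propagate.
        logger.error(json.dumps({"event": "agent_failed", "error": repr(exc), **identifiers}))
        raise
    logger.info(json.dumps({"event": "agent_output", "output": output, **identifiers}))
    return output
```

Because the input and output entries share the same identifiers (for example providerCode), a failing AI-driven flow can be traced from the prompt to the response with a single filter.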

Performance Monitoring

Define what “bad” looks like. Move fast until monitoring shows you need to act. You cannot optimize what you cannot see.

Measure P50/P75 of real user experience, not synthetic benchmarks.
Monitoring is the key — invest in visibility before investing in optimization.
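
As an illustration of defining "bad" against real user data, the sketch below computes P50 and P75 from a list of latency samples and compares P75 to a budget. A managed service would normally compute these percentiles. The function names, the sample values, and the 400 ms budget are assumptions made for the example.

```python
from statistics import quantiles


def latency_percentiles(samples_ms):
    """Return P50 and P75 of real-user latency samples, in milliseconds."""
    # quantiles(n=4) returns the three quartile cut points: P25, P50, P75.
    _, p50, p75 = quantiles(samples_ms, n=4, method="inclusive")
    return {"p50": p50, "p75": p75}


def is_bad(percentiles, p75_budget_ms):
    """'Bad' here means P75 exceeds the agreed budget (an assumed definition)."""
    return percentiles["p75"] > p75_budget_ms


# Hypothetical latencies collected from real user requests.
samples = [120, 150, 180, 200, 240, 300, 900, 1500]
stats = latency_percentiles(samples)
print(stats, "needs attention:", is_bad(stats, p75_budget_ms=400))
```

Writing the budget down as a number turns "move fast until monitoring shows you need to act" into a check that can drive an alert.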

Error Levels

The error level does not depend only on the error type. A ValidationFailed can be an error if it affects a critical flow. A NotFound can be a warning if it is expected. Choose levels based on business impact.
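
A minimal sketch of choosing the level from business impact rather than from the error type. The two flags blocks_critical_flow and expected are hypothetical inputs, and the rule itself is one possible policy, not a fixed standard.

```python
import logging


def choose_level(*, blocks_critical_flow: bool, expected: bool) -> int:
    """Pick a log level from business impact, not from the error type."""
    # Only failures that are expected and do not block a critical flow
    # are downgraded to WARNING; everything else is an ERROR.
    if expected and not blocks_critical_flow:
        return logging.WARNING
    return logging.ERROR


def log_failure(logger, error_type, message, *, blocks_critical_flow, expected, **identifiers):
    """Log a failure at the chosen level and return that level."""
    level = choose_level(blocks_critical_flow=blocks_critical_flow, expected=expected)
    logger.log(level, "%s: %s", error_type, message, extra={"context": identifiers})
    return level


# The error type does not decide the level; the business impact does.
log = logging.getLogger("levels_demo")
validation_level = log_failure(
    log, "ValidationFailed", "Rate plan payload rejected",
    blocks_critical_flow=True, expected=False, providerCode="ACME",
)
not_found_level = log_failure(
    log, "NotFound", "Property has no room_type in DB",
    blocks_critical_flow=False, expected=True, providerCode="ACME",
)
```

Keeping the decision in one function makes the policy easy to review and change as the definition of a critical flow evolves.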

Anti-patterns

Observability as an afterthought — adding logging after production issues appear. Build it into the system design from the start.
Unstructured logs — log messages without filterable identifiers are unsearchable at scale.
Internal IDs in logs — use domain language (providerCode) not database IDs. External consumers of logs need business context.
Same error level for everything — ERROR for all failures makes real errors invisible in the noise.