Observability in Software: A History

From print debugging to LLM-native tracing.

1970s · printf debugging: the original observability · General
The oldest debugging technique: insert print statements, observe output, remove them. No tooling required. Works on any system. Still used daily by every developer regardless of what else is available. The observation model is push (you decide what to print) and synchronous (you read the output as it appears). The problem: you must know in advance what you want to observe.
→ LangChain's set_verbose(True) is a structured version of printf debugging — push model, stdout output, no post-run query API
1990s · Structured logging: from strings to events · General
Log4j (1999) popularised log levels (DEBUG, INFO, WARN, ERROR) and structured log records. Instead of raw strings, you emit events with metadata. The key insight: if you structure your logs at emission time, you can query them later. Log aggregation tools (Splunk, the ELK stack) emerged to query log data at scale. The gap: logs are still text — structured, but not typed or queryable programmatically within the application. A minimal Python sketch of the idea follows below.
→ LlamaIndex's CBEventType enum is structured logging applied to LLM events — typed events, queryable after the run
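A minimal sketch of the idea using Python's stdlib logging. The JsonFormatter below is illustrative, not Log4j, but it shows the shift from raw strings to events with levels and queryable fields.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON event with level and metadata."""
    def format(self, record):
        event = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # extra fields attached at the call site become queryable keys
            **getattr(record, "fields", {}),
        }
        return json.dumps(event)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emit a structured event instead of a raw string.
logger.info("payment processed", extra={"fields": {"order_id": 1234, "latency_ms": 87}})
```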
2010 · Dapper (Google): distributed tracing invented · Distributed
Google's Dapper paper introduced distributed tracing: a way to track a single request as it propagates across multiple services. Key concepts: spans (named, timed operations), traces (a tree of spans for one request), and context propagation (passing trace IDs across service boundaries). Every span has a start time, an end time, and metadata. Zipkin (2012) and Jaeger (2017) brought the model to open source; OpenTelemetry (2019) standardised the instrumentation API. The data model is sketched below.
→ SynapseKit's TraceSpan is Dapper's span concept applied to LLM agent steps — named, timed, with structured metadata
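An illustrative data model in Python (not Dapper's actual format): a span as a named, timed record carrying a trace_id that is handed to its children.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One named, timed operation; children share the parent's trace_id."""
    name: str
    trace_id: str                     # identifies the whole request
    parent_id: Optional[str] = None   # links the span into the trace tree
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    metadata: dict = field(default_factory=dict)

    def child(self, name: str) -> "Span":
        # Context propagation in miniature: carry the trace_id across a boundary.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

root = Span(name="GET /search", trace_id=uuid.uuid4().hex)
rpc = root.child("backend.query")   # same trace_id, parented to the root span
rpc.end = time.time()
root.end = time.time()
```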
2019 · OpenTelemetry: the unified observability standard · Distributed
OpenTelemetry merged OpenCensus and OpenTracing into a single vendor-neutral standard covering traces, metrics, and logs. Instrumentation libraries exist for every major language, and a Collector pipeline routes telemetry to any backend. The promise: instrument once, export anywhere. The reality: significant setup overhead, complex configuration, and a design aimed at microservices at scale rather than single-process LLM applications. A minimal Python setup is sketched below.
→ LLM frameworks largely skip OTel instrumentation; the overhead is disproportionate for single-process agents
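A minimal sketch of plain OTel tracing, assuming the opentelemetry-sdk package is installed; it exports spans to the console rather than to a collector.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Instrument once: install a provider, then emit spans anywhere in the process.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("user.id", "abc-123")
    with tracer.start_as_current_span("call-llm"):
        pass  # nested span inherits the trace context automatically
```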
2023 Q2 · LangChain set_verbose / set_debug: global flag tracing · LLM Era
LangChain introduced set_verbose(True) and set_debug(True) as global flags that enable chain-level logging across all subsequent agent/chain/tool calls. One-line setup. The design is a global side effect: calling set_verbose mutates module-level state and causes all LangChain objects to emit structured output to stdout. It shows thought/action/observation cycles, tool inputs/outputs, and (with set_debug) raw prompts, but does not surface step latency. For timing and cost tracking, LangSmith is required. A minimal usage sketch follows below.
→ Lowest setup friction of any LLM observability approach. Highest production friction due to global state and no structured query API
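A minimal sketch, assuming a recent LangChain version where the flags live in langchain.globals; the chain itself is elided.

```python
from langchain.globals import set_debug, set_verbose

# Global side effect: every subsequent LangChain chain/agent/tool call
# in this process emits its intermediate output.
set_verbose(True)   # chain-level steps: thought/action/observation, tool I/O
set_debug(True)     # everything, including the raw prompts sent to the model

# ... build and run any chain or agent as usual; output appears inline ...
```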
2023 Q3 · LangSmith: LangChain's external observability platform · LLM Era
LangSmith launched as LangChain's answer to production observability: set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY, and every run is automatically uploaded with full step-by-step traces, latency per step, token counts, cost estimates, and the ability to replay any run. Free tier: 5,000 traces/month. Production cost scales with volume. Key tradeoff: complete observability, but it requires an internet connection and sends LLM prompts to an external service — a non-starter for some production environments. The configuration is shown below.
→ The gap between set_verbose(True) and full observability: LangSmith fills it, but at the cost of an external dependency
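A minimal sketch of the configuration. The key value is a placeholder, and setting the variables via os.environ (rather than in the shell or a secrets manager) is purely for illustration.

```python
import os

# Enabling LangSmith is configuration, not code: set two environment
# variables before any chains run in this process.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"

# From this point, every LangChain run is uploaded to LangSmith with
# per-step traces, latency, token counts, and cost estimates.
```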
2023 Q4 · LlamaIndex LlamaDebugHandler: callback-based local tracing · LLM Era
LlamaIndex shipped LlamaDebugHandler as part of its CallbackManager infrastructure. Every LLM call, tool call, retrieval event, and agent step emits typed CBEventType events. After a run, debug_handler.get_event_pairs(CBEventType.LLM) returns all LLM input/output pairs with timestamps. Fully local — no external service required. Token usage and step latency are available via event payloads and timestamps. The query API is the most expressive of any LLM framework's local observability tooling. Setup and a post-run query are sketched below.
→ Best post-run query API locally. Verbose to set up (4 lines). Global state via Settings — same problem as LangChain's flags at a higher level
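A sketch assuming llama-index-core import paths, which have moved between versions; check your installed release if the imports differ.

```python
from llama_index.core import Settings
from llama_index.core.callbacks import (
    CallbackManager,
    CBEventType,
    LlamaDebugHandler,
)

# Setup: create the handler and register it globally via Settings.
debug_handler = LlamaDebugHandler(print_trace_on_end=False)
Settings.callback_manager = CallbackManager([debug_handler])

# ... run a query engine or agent here ...

# Post-run query API: each LLM call as a (start_event, end_event) pair.
for start, end in debug_handler.get_event_pairs(CBEventType.LLM):
    print(start.time, list(end.payload.keys()))
```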
2024 · SynapseKit TracingMiddleware: explicit object tracing · Modern
SynapseKit shipped TracingMiddleware as an explicit object passed to the agent at construction time (middleware=[tracer]). After the run, tracer.spans returns a list of TraceSpan objects, each with name, duration_ms, and metadata. TokenTracer tracks token usage and estimated cost per call. The key design difference: the tracer is not global state. Different agents in the same application can have different tracers, you can write test assertions on specific spans, and you can pass a mock tracer in tests. Roughly 7 lines of setup against LangChain's one-line global flag, but the result is a structured, queryable, testable object. The pattern is sketched below.
→ Only local tracing design that avoids global state. Most amenable to unit testing and production use without external dependencies
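A self-contained sketch of the pattern the card describes (an explicit tracer object holding TraceSpan records with name, duration_ms, and metadata). The class shapes and the span() context manager here are illustrative, not SynapseKit's actual implementation.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TraceSpan:
    name: str
    duration_ms: float
    metadata: Dict = field(default_factory=dict)

class TracingMiddleware:
    """Explicit, per-agent tracer: spans live on this object, not in module state."""
    def __init__(self) -> None:
        self.spans: List[TraceSpan] = []

    @contextmanager
    def span(self, name: str, metadata: Optional[Dict] = None):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append(TraceSpan(name, elapsed_ms, metadata or {}))

tracer = TracingMiddleware()
with tracer.span("llm_call", {"model": "gpt-4o"}):
    time.sleep(0.01)  # stand-in for the real model call

# Plain objects: easy to query after the run and to assert on in tests,
# and two agents can each hold their own tracer without interfering.
assert tracer.spans[0].name == "llm_call"
print(tracer.spans[0].duration_ms, tracer.spans[0].metadata)
```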
2024–2025 · OpenTelemetry for LLMs: semantic conventions standardised · Modern
The OpenTelemetry community published semantic conventions for LLM observability: standard attribute names for model name, token counts, prompt/completion text, and request duration. Tools like Arize Phoenix, Helicone, and Traceloop built on these conventions to provide vendor-neutral LLM observability. The promise: instrument once, export to any OTel-compatible backend. Still nascent — most frameworks don't emit these conventions natively, so wrapper libraries are required. The attribute names are sketched below.
→ The long-term future of LLM observability. Still 1-2 years from being the default choice for new projects
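A sketch using the gen_ai.* attribute names from the (still-evolving) semantic conventions; exact names may change between releases. It assumes a TracerProvider has already been installed as in the earlier OTel example.

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

# Standard attribute names mean any OTel-compatible backend can aggregate
# model, token, and latency data across vendors.
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... make the model call here ...
    span.set_attribute("gen_ai.usage.input_tokens", 412)
    span.set_attribute("gen_ai.usage.output_tokens", 97)
```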
www.engineersofai.com · AI Letters #27 · LLM Showdown Notebook #19