Tracing Design Explorer

Three different answers to the same question: how do you see inside a running agent?

SynapseKit — Tracer Object
LangChain — Global Flags
LlamaIndex — Callback Manager
SynapseKit
TracingMiddleware — explicit object, programmatic access
7/7 features locally
from synapsekit import ReActAgent, CalculatorTool
from synapsekit.observability import TracingMiddleware

tracer = TracingMiddleware(verbose=True)
agent = ReActAgent(llm=llm, tools=[CalculatorTool()], middleware=[tracer], max_iterations=5)
result = await agent.run(query)
tracer.print_summary()
tracer.spans
# → [
#   TraceSpan(name=AGENT_START, duration_ms=0.1, metadata={query: ...}),
#   TraceSpan(name=LLM_CALL, duration_ms=320, metadata={tokens: 84}),
#   TraceSpan(name=TOOL_CALL, duration_ms=2.3, metadata={tool: calculator, result: 1024}),
#   TraceSpan(name=LLM_CALL, duration_ms=280, metadata={tokens: 41}),
#   TraceSpan(name=AGENT_FINISH, duration_ms=0.1, metadata={total_tokens: 125}),
# ]

tracer.token_usage
# → {prompt: 125, completion: 84, total: 209}

tracer.total_latency_ms
# → 602.6
Token usage
tracer.token_usage — structured dict, queryable after run
Step latency
TraceSpan.duration_ms on every event — no external service needed
Agent steps
One TraceSpan per ReAct iteration, fully structured
Tool call args + returns
TOOL_CALL spans include full args dict and return value
Full raw LLM prompt
TracingMiddleware(level="debug") logs complete prompt text
Retrieved documents
RETRIEVAL spans include doc content and relevance scores
Zero-config enable
TracingMiddleware() + the middleware=[tracer] kwarg — no API keys or signups
Key advantage
The only design that avoids global state. Different agents in the same application can have different tracers. Unit testable — assert on specific spans directly in test code.
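Because the tracer is a plain object, the "unit testable" claim is concrete. Here is a sketch of what such a test could look like against the hypothetical SynapseKit API shown above (fake_llm is an assumed stub-model fixture, and span names are assumed to compare as strings):

import pytest
from synapsekit import ReActAgent, CalculatorTool
from synapsekit.observability import TracingMiddleware

@pytest.mark.asyncio
async def test_agent_calls_calculator_once(fake_llm):
    # Each test gets its own tracer — no global state to reset between tests
    tracer = TracingMiddleware()
    agent = ReActAgent(llm=fake_llm, tools=[CalculatorTool()], middleware=[tracer])

    await agent.run("What is 2 ** 10?")

    # Assert directly on the structured spans the tracer collected
    tool_spans = [s for s in tracer.spans if s.name == "TOOL_CALL"]
    assert len(tool_spans) == 1
    assert tool_spans[0].metadata["tool"] == "calculator"
    assert tracer.token_usage["total"] > 0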
Tradeoff
Most lines to enable (7). The middleware=[tracer] wiring is explicit boilerplate you repeat for every agent.
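The repetition can be contained with a small factory that wires the tracer in one place; a sketch against the same hypothetical API:

from synapsekit import ReActAgent
from synapsekit.observability import TracingMiddleware

def traced_agent(llm, tools, **kwargs):
    """Build a ReActAgent with its own TracingMiddleware and return both."""
    tracer = TracingMiddleware(verbose=True)
    agent = ReActAgent(llm=llm, tools=tools, middleware=[tracer], **kwargs)
    return agent, tracer

# agent, tracer = traced_agent(llm, [CalculatorTool()], max_iterations=5)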
LangChain
Global flags — minimum setup, side-effectful output
5/7 features locally
from langchain_core.globals import set_verbose, set_debug

set_verbose(True)   # shows chain I/O, tool calls, agent steps
set_debug(True)     # additionally shows full raw prompts sent to the LLM

# All subsequent chain/agent/tool calls now emit verbose logs.
# No query object — output goes to stdout
# To capture programmatically:
import io, contextlib

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    result = chain.invoke(query)
log_output = buf.getvalue()   # raw string — not structured

# For structured, queryable, per-run data → LangSmith required:
#   LANGCHAIN_TRACING_V2=true
#   LANGCHAIN_API_KEY=ls__...
# → full traces at smith.langchain.com
Token usage
Shown in verbose output as text — not in a queryable object. Full cost tracking needs LangSmith.
Step latency
Not surfaced locally at all. Requires LangSmith for timing data.
Agent steps
set_verbose shows Thought/Action/Observation cycles in output
Tool call args + returns
Verbose output includes tool input and output text
Full raw LLM prompt
set_debug(True) prints complete prompt/completion to stdout
Retrieved documents
Verbose chain output includes retrieved document content
Zero-config enable
set_verbose(True) — one line, no API keys, immediate output
Key advantage
Fewest lines of any framework (3). Works immediately on any existing LangChain agent. One-line debug mode is genuinely useful during development.
Key gap
No step latency locally. No structured query object — output is text to stdout. Global state means every LangChain object in the process is affected. Full production observability requires LangSmith (external service).
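Because the flags are process-global, a common pattern is to restore them right after the call you are debugging. A minimal sketch using langchain_core.globals (agent_executor and query are assumed to already exist):

from langchain_core.globals import get_debug, set_debug

previous = get_debug()   # remember the current global setting
set_debug(True)          # log full prompts/completions for this run
try:
    result = agent_executor.invoke({"input": query})
finally:
    set_debug(previous)  # restore so other chains in the process are unaffected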
LlamaIndex
CallbackManager — typed events, best post-run query API
6/7 features locally
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])
from datetime import datetime
from llama_index.core.callbacks import CBEventType
from llama_index.core.callbacks.schema import TIMESTAMP_FORMAT

# Query specific event types
llm_pairs = debug_handler.get_event_pairs(CBEventType.LLM)
tool_pairs = debug_handler.get_event_pairs(CBEventType.FUNCTION_CALL)
retrieval = debug_handler.get_event_pairs(CBEventType.RETRIEVE)
agent_steps = debug_handler.get_event_pairs(CBEventType.AGENT_STEP)

# LLM inputs/outputs
for start, end in llm_pairs:
    print(start.payload["messages"])   # full prompt
    print(end.payload["response"])     # full completion
    # CBEvent.time is a formatted timestamp string, so parse before subtracting
    started = datetime.strptime(start.time, TIMESTAMP_FORMAT)
    ended = datetime.strptime(end.time, TIMESTAMP_FORMAT)
    print((ended - started).total_seconds())   # latency (seconds)

# Full event list and LLM start/end event pairs
all_events = debug_handler.get_events()
llm_io_pairs = debug_handler.get_llm_inputs_outputs()
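For aggregate token counts without an external service, LlamaIndex's TokenCountingHandler can sit in the same CallbackManager as the debug handler. A sketch (default tokenizer; pass one matching your model if needed):

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler()   # optionally pass tokenizer=... for your model
Settings.callback_manager = CallbackManager([debug_handler, token_counter])

# ... run queries / agent steps ...

print(token_counter.prompt_llm_token_count)       # prompt tokens
print(token_counter.completion_llm_token_count)   # completion tokens
print(token_counter.total_llm_token_count)        # total LLM tokens this session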
Token usage
TokenCountingHandler in the same CallbackManager exposes prompt/completion/total counts (see the sketch above)
Step latency
Timestamps on each start/end event pair; get_event_time_info() gives totals and averages per event type
Agent steps
CBEventType.AGENT_STEP events, one per ReAct iteration
Tool call args + returns
CBEventType.FUNCTION_CALL with full payload
Full raw LLM prompt
CBEventType.LLM event payload includes full prompt text
Retrieved documents
CBEventType.RETRIEVE events with doc content and scores
Zero-config enable
Requires 2 imports + 2 setup lines minimum — not 1-2 lines
Key advantage
Most expressive post-run query API of any framework. Typed CBEventType enum covers every LLM pipeline operation. get_event_pairs() returns structured start/end pairs with full payloads and timestamps.
Tradeoff
Settings.callback_manager is global state — affects all agents in the application. 4 lines to set up. CBEventType API has a learning curve vs SynapseKit's simpler tracer.spans list.
www.engineersofai.com · AI Letters #27 · LLM Showdown Notebook #19