Tracing Design Explorer

Three different answers to the same question: how do you see inside a running agent?

SynapseKit — Tracer Object
LangChain — Global Flags
LlamaIndex — Callback Manager
SynapseKit
TracingMiddleware — explicit object, programmatic access
7/7 features locally
from synapsekit import ReActAgent, CalculatorTool
from synapsekit.observability import TracingMiddleware

tracer = TracingMiddleware(verbose=True)
agent = ReActAgent(llm=llm, tools=[CalculatorTool()], middleware=[tracer], max_iterations=5)
result = await agent.run(query)
tracer.print_summary()
tracer.spans
# → [
#   TraceSpan(name=AGENT_START, duration_ms=0.1, metadata={query: ...}),
#   TraceSpan(name=LLM_CALL, duration_ms=320, metadata={tokens: 84}),
#   TraceSpan(name=TOOL_CALL, duration_ms=2.3, metadata={tool: calculator, result: 1024}),
#   TraceSpan(name=LLM_CALL, duration_ms=280, metadata={tokens: 41}),
#   TraceSpan(name=AGENT_FINISH, duration_ms=0.1, metadata={total_tokens: 125}),
# ]

tracer.token_usage
# → {prompt: 125, completion: 84, total: 209}

tracer.total_latency_ms
# → 602.6
Token usage
tracer.token_usage — structured dict, queryable after run
Step latency
TraceSpan.duration_ms on every event — no external service needed
Agent steps
One TraceSpan per ReAct iteration, fully structured
Tool call args + returns
TOOL_CALL spans include full args dict and return value
Full raw LLM prompt
TracingMiddleware(level="debug") logs complete prompt text
Retrieved documents
RETRIEVAL spans include doc content and relevance scores
Zero-config enable
TracingMiddleware() + the middleware=[tracer] kwarg — no API keys or signups
Key advantage
The only design that avoids global state. Different agents in the same application can have different tracers. Unit testable — assert on specific spans directly in test code.
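Because the tracer is a plain object, the "unit testable" claim is concrete. Here is a sketch of what such a test could look like against the hypothetical SynapseKit API shown above (fake_llm is an assumed stub-model fixture, and span names are assumed to compare as strings):

import pytest
from synapsekit import ReActAgent, CalculatorTool
from synapsekit.observability import TracingMiddleware

@pytest.mark.asyncio
async def test_agent_calls_calculator_once(fake_llm):
    # Each test gets its own tracer — no global state to reset between tests
    tracer = TracingMiddleware()
    agent = ReActAgent(llm=fake_llm, tools=[CalculatorTool()], middleware=[tracer])

    await agent.run("What is 2 ** 10?")

    # Assert directly on the structured spans the tracer collected
    tool_spans = [s for s in tracer.spans if s.name == "TOOL_CALL"]
    assert len(tool_spans) == 1
    assert tool_spans[0].metadata["tool"] == "calculator"
    assert tracer.token_usage["total"] > 0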
Tradeoff
Most lines to enable (7). The middleware=[tracer] wiring is explicit boilerplate you repeat for every agent.
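The repetition can be contained with a small factory that wires the tracer in one place; a sketch against the same hypothetical API:

from synapsekit import ReActAgent
from synapsekit.observability import TracingMiddleware

def traced_agent(llm, tools, **kwargs):
    """Build a ReActAgent with its own TracingMiddleware and return both."""
    tracer = TracingMiddleware(verbose=True)
    agent = ReActAgent(llm=llm, tools=tools, middleware=[tracer], **kwargs)
    return agent, tracer

# agent, tracer = traced_agent(llm, [CalculatorTool()], max_iterations=5)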
LangChain
Global flags — minimum setup, side-effectful output
5/7 features locally
from langchain_core.globals import set_verbose, set_debug

set_verbose(True)   # shows chain I/O, tool calls, agent steps
set_debug(True)     # additionally shows full raw prompts sent to the LLM

# All subsequent chain/agent/tool calls now emit verbose logs.
# No query object — output goes to stdout
# To capture programmatically:
import io, contextlib

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    result = chain.invoke(query)
log_output = buf.getvalue()   # raw string — not structured

# For structured, queryable, per-run data → LangSmith required:
#   LANGCHAIN_TRACING_V2=true
#   LANGCHAIN_API_KEY=ls__...
# → full traces at smith.langchain.com
Token usage
Shown in verbose output as text — not in a queryable object. Full cost tracking needs LangSmith.
Step latency
Not surfaced locally at all. Requires LangSmith for timing data.
Agent steps
set_verbose shows Thought/Action/Observation cycles in output
Tool call args + returns
Verbose output includes tool input and output text
Full raw LLM prompt
set_debug(True) prints complete prompt/completion to stdout
Retrieved documents
Verbose chain output includes retrieved document content
Zero-config enable
set_verbose(True) — one line, no API keys, immediate output
Key advantage
Fewest lines of any framework (3). Works immediately on any existing LangChain agent. One-line debug mode is genuinely useful during development.
Key gap
No step latency locally. No structured query object — output is text to stdout. Global state means every LangChain object in the process is affected. Full production observability requires LangSmith (external service).
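Because the flags are process-global, a common pattern is to restore them right after the call you are debugging. A minimal sketch using langchain_core.globals (agent_executor and query are assumed to already exist):

from langchain_core.globals import get_debug, set_debug

previous = get_debug()   # remember the current global setting
set_debug(True)          # log full prompts/completions for this run
try:
    result = agent_executor.invoke({"input": query})
finally:
    set_debug(previous)  # restore so other chains in the process are unaffected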
LlamaIndex
CallbackManager — typed events, best post-run query API
6/7 features locally
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

debug_handler = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([debug_handler])
from datetime import datetime
from llama_index.core.callbacks import CBEventType
from llama_index.core.callbacks.schema import TIMESTAMP_FORMAT

# Query specific event types
llm_pairs = debug_handler.get_event_pairs(CBEventType.LLM)
tool_pairs = debug_handler.get_event_pairs(CBEventType.FUNCTION_CALL)
retrieval = debug_handler.get_event_pairs(CBEventType.RETRIEVE)
agent_steps = debug_handler.get_event_pairs(CBEventType.AGENT_STEP)

# LLM inputs/outputs
for start, end in llm_pairs:
    print(start.payload["messages"])   # full prompt
    print(end.payload["response"])     # full completion
    # CBEvent.time is a formatted timestamp string, so parse before subtracting
    started = datetime.strptime(start.time, TIMESTAMP_FORMAT)
    ended = datetime.strptime(end.time, TIMESTAMP_FORMAT)
    print((ended - started).total_seconds())   # latency (seconds)

# Full event list and LLM start/end event pairs
all_events = debug_handler.get_events()
llm_io_pairs = debug_handler.get_llm_inputs_outputs()
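For aggregate token counts without an external service, LlamaIndex's TokenCountingHandler can sit in the same CallbackManager as the debug handler. A sketch (default tokenizer; pass one matching your model if needed):

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler()   # optionally pass tokenizer=... for your model
Settings.callback_manager = CallbackManager([debug_handler, token_counter])

# ... run queries / agent steps ...

print(token_counter.prompt_llm_token_count)       # prompt tokens
print(token_counter.completion_llm_token_count)   # completion tokens
print(token_counter.total_llm_token_count)        # total LLM tokens this session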
Token usage
TokenCountingHandler in the same CallbackManager exposes prompt/completion/total counts (see the sketch above)
Step latency
Timestamps on each start/end event pair; get_event_time_info() gives totals and averages per event type
Agent steps
CBEventType.AGENT_STEP events, one per ReAct iteration
Tool call args + returns
CBEventType.FUNCTION_CALL with full payload
Full raw LLM prompt
CBEventType.LLM event payload includes full prompt text
Retrieved documents
CBEventType.RETRIEVE events with doc content and scores
Zero-config enable
Requires 2 imports + 2 setup lines minimum — not 1-2 lines
Key advantage
Most expressive post-run query API of any framework. Typed CBEventType enum covers every LLM pipeline operation. get_event_pairs() returns structured start/end pairs with full payloads and timestamps.
Tradeoff
Settings.callback_manager is global state — affects all agents in the application. 4 lines to set up. CBEventType API has a learning curve vs SynapseKit's simpler tracer.spans list.
www.engineersofai.com · AI Letters #27 · LLM Showdown Notebook #19