7 posts tagged with "LLM"

Large language models - internals, deployment, and production patterns.

AI Letters #27 - Agent Observability: 3 Lines Gets You In, But What Can You Actually See?

· 9 min read
EngineersOfAI
AI Engineering Education

"Three lines to enable tracing in LangChain. Zero lines of latency data when you're done."

Every agent fails eventually. A tool returns nothing. The LLM loops on the same thought. The retrieved documents are all wrong. What separates a two-minute debug from a two-hour one is not how the agent was built - it's how much you can see when it breaks.

Notebook #19 of the LLM Showdown measured one thing: how much can you observe about a running agent without leaving your local environment? No external service. No API key for a tracing platform. No paid tier. Just framework-native observability on the same machine where your code runs.

LangChain enables tracing in the fewest lines. What those lines actually surface is a different question.

Want to Think Like an AI Architect?

Join engineers receiving weekly breakdowns of AI systems, production failures, and architectural decisions.

I Built a Lightweight LLM Framework Because LangChain Frustrated Me - Here's What I Learned

· 15 min read
EngineersOfAI
AI Engineering Education

There's a moment every LLM developer knows. You've got a working prototype. It's elegant, fast, and does exactly what you need. Then you try to deploy it. And suddenly you're debugging a chain inside a runnable inside a callback inside an abstraction that didn't exist six months ago.

That moment happened one too many times. So something else got built.

This is the story of SynapseKit - why it exists, what it does differently, and what 18 (and counting) objective benchmarks against LangChain and LlamaIndex actually revealed.

AI Letters #26 - Multi-Agent Orchestration: 16 vs 19 vs 23 Lines (And Three Completely Different Mental Models)

· 9 min read
EngineersOfAI
AI Engineering Education

"Three frameworks, three different answers to the same question: who decides when one agent hands work to the next?"

A single agent with tools handles most tasks. But some workflows need specialisation - a researcher producing facts, a writer turning facts into prose, a reviewer checking the output. That chain of specialised agents is where the frameworks stop converging and start showing what they actually believe about software design.

Notebook #18 of the LLM Showdown measured the same 2-agent sequential pipeline across SynapseKit, LangChain (via LangGraph), and LlamaIndex. Researcher feeds Writer. Both call an LLM. The orchestrator wires them together. Simple enough that you can count the lines. Complex enough that the design philosophy underneath becomes visible.

The LoC numbers tell part of the story. The orchestration pattern matrix tells the rest.

AI Letters #25 - The Built-in Tool Race: 30 vs 29 vs 12 (And Why the Headline Number Lies)

· 7 min read
EngineersOfAI
AI Engineering Education

"Both SynapseKit and LangChain claim roughly 30 built-in tools. The difference is whether 'built-in' means 'works on install' or 'works after twelve more pip installs'."

Every LLM framework advertises its tool ecosystem. The numbers look impressive in the docs. Then you try to actually use them and discover that half of them require a separate pip install, a third require an API key, and a handful only work on specific operating systems.

Notebook #17 of the LLM Showdown did the audit that benchmarks usually skip: count only what actually ships in the base install, then split by what works with zero configuration versus what needs extra setup. The headline totals are 30, 29, and 12, so the top two are almost identical. The zero-config counts are not.

AI Letters #24 - ReAct Agents: Six Lines vs Nineteen (And What You Lose in Between)

· 8 min read
EngineersOfAI
AI Engineering Education

"Six lines to build a working ReAct agent sounds like a win. It is - until your agent starts looping and you have no idea why."

The ReAct loop is the first pattern every engineer reaches for when they need an agent. Thought, Action, Observation. Repeat until done. It's elegant on paper. In production it breaks in exactly the ways you'd expect: infinite loops, wrong tool selection, hallucinated tool calls that return nothing useful.

The question isn't whether ReAct agents work. It's whether your framework lets you see inside the loop when things go wrong.

Notebook #15 of the LLM Showdown measured three things: lines of code to build a working ReAct agent with two tools, the built-in tool inventory available without writing any tool code, and loop control parameters exposed to the caller. SynapseKit wins on LoC. LangChain wins on observability. LlamaIndex sits in the middle on both. The numbers are not the story. The tradeoff they reveal is.
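
The Thought, Action, Observation loop and the idea of a caller-visible loop control can be sketched in plain Python without any framework. This is a minimal illustration, not the API of SynapseKit, LangChain, or LlamaIndex: `run_react`, `max_steps`, the trace format, and the scripted policy standing in for the LLM are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A policy maps the trace so far to (thought, action, argument).
# The action "finish" ends the loop; its argument is the final answer.
Step = Tuple[str, str, str]
Policy = Callable[[List[dict]], Step]


def run_react(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,  # hypothetical loop-control parameter
) -> Tuple[Optional[str], List[dict]]:
    trace: List[dict] = []
    for _ in range(max_steps):
        thought, action, arg = policy(trace)
        if action == "finish":
            trace.append({"thought": thought, "action": action, "observation": arg})
            return arg, trace
        tool = tools.get(action)
        # An unknown (hallucinated) tool name becomes an observation, not a crash.
        observation = tool(arg) if tool else f"unknown tool: {action}"
        trace.append({"thought": thought, "action": action, "observation": observation})
    # Step budget exhausted: the agent was probably looping.
    return None, trace


def add(arg: str) -> str:
    a, b = arg.split()
    return str(int(a) + int(b))


def scripted_policy(trace: List[dict]) -> Step:
    # Stand-in for an LLM call: call the tool once, then finish.
    if not trace:
        return ("I need the sum.", "add", "2 3")
    return ("I have the result.", "finish", trace[-1]["observation"])


def looping_policy(trace: List[dict]) -> Step:
    # Stand-in for an agent stuck on the same thought.
    return ("Let me try again.", "add", "1 1")


answer, trace = run_react(scripted_policy, {"add": add})
stuck_answer, stuck_trace = run_react(looping_policy, {"add": add}, max_steps=3)
```

Returning the trace alongside the answer is the design choice that matters here: when the step budget runs out, the caller still has every thought, action, and observation to inspect.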

AI Letters #23 - The RAG Scorecard: Six Benchmarks, Three Frameworks, One Clear Pattern

· 8 min read
EngineersOfAI
AI Engineering Education

"Batteries-included beats fully-composable on conciseness every time. Fully-composable beats batteries-included on control every time. You just have to know which problem you're solving."

Six notebooks. Six benchmarks. Three frameworks measured on the same RAG workloads, back to back, reproducible on Kaggle.

Week 1 of the LLM Showdown covered setup overhead: environment spin-up, indexing speed, basic retrieval, reranking, evaluation harnesses, and the Week 1 scorecard. SynapseKit won that one 15–7–8 (SK–LC–LI).

Week 2 went deeper into the RAG stack: PDF ingestion, chunking strategies, BM25 availability, hybrid search with reciprocal rank fusion (RRF), streaming time-to-first-token, and conversation memory. Same methodology: each benchmark awards 3, 2, and 1 points for ranks 1, 2, and 3, with ties split.
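
The 3-2-1 scoring rule can be sketched as a small function. This is one reading of "ties split", assumed here: frameworks with equal results share the average of the points for the places they occupy. The function name, the framework labels `A`, `B`, `C`, and the measured values are illustrative only and are not taken from the notebooks.

```python
POINTS = (3, 2, 1)  # points for rank 1, 2, 3


def score_benchmark(results, higher_is_better=False):
    """Map {framework: measured value} to {framework: points}.

    Assumes three frameworks per benchmark. Tied frameworks split
    the points for the ranks they jointly occupy (an assumption).
    """
    ordered = sorted(results.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    scores = {}
    i = 0
    while i < len(ordered):
        # Find the run of frameworks tied with the one at position i.
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        shared = sum(POINTS[i:j]) / (j - i)
        for name, _ in ordered[i:j]:
            scores[name] = shared
        i = j
    return scores


# Illustrative values only (e.g. lines of code, lower is better).
clear_ranking = score_benchmark({"A": 10, "B": 30, "C": 20})
tied_for_first = score_benchmark({"A": 10, "B": 10, "C": 20})
```

Under this reading every three-way benchmark hands out six points in total, whether or not there is a tie.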

The results are not a surprise if you've been paying attention. But the magnitude of the gap on some dimensions is.
