5 docs tagged with "production-ai-patterns"

Async LLM Calls

Asynchronous LLM call patterns for high-throughput applications - concurrency control with semaphores, producer-consumer queues, token bucket rate limiting, circuit breakers, and async orchestration.

Batch Processing with LLMs

Efficiently processing large document sets with LLM batch APIs - Anthropic Batch API, cost optimization, monitoring, checkpointing, and production patterns for overnight and large-scale LLM workloads.

Cost Optimization Patterns

Practical LLM cost reduction - semantic caching, model routing, prompt compression, Anthropic prompt caching, output length control, cost attribution, and monitoring for production AI systems.

Idempotency and Retries

Making LLM-powered workflows robust with idempotency keys, smart retries, distributed deduplication, workflow state persistence, and failure-tolerant pipeline design for production AI systems.

Streaming Responses

Implementing and optimizing streaming for real-time LLM response delivery - SSE, chunking strategies, backpressure, tool-use streaming, and production patterns for better perceived performance.