14 docs tagged with "llmops"

Dataset Curation for Fine-Tuning

How to build high-quality fine-tuning datasets - sourcing, deduplication, quality filtering, LLM-as-judge scoring, and a complete curation pipeline. Why 5K curated examples beat 500K raw ones.

Evaluation-Driven Development

Building AI systems test-first - write evals before writing prompts. The EDD loop, eval strategies, golden dataset construction, LLM-as-judge calibration, and a full EvalSuite implementation ready for CI integration.

Fine-Tuning Ops

Operationalize LLM fine-tuning at scale - data pipelines, LoRA adapter management, adapter registries, and serving 50 customer-specific adapters efficiently.

Fine-Tuning Pipelines

End-to-end fine-tuning pipeline engineering - from data collection and curation to training, evaluation, and deployment. When to choose fine-tuning vs RAG vs prompt engineering, and how to build a pipeline that is repeatable and production-safe.

LLM CI/CD

CI/CD pipelines for LLM applications - handling non-deterministic outputs with LLM-judge gates, canary deployments with quality monitoring, automated rollback triggers, and a full GitHub Actions implementation.

LLM Evaluation Pipelines

Build automated evaluation pipelines for LLM systems - LLM-as-judge, RAGAS for RAG systems, trajectory evaluation for agents, regression testing, and eval dataset curation.

LLMOps Platforms

Comprehensive guide to LLMOps platforms - LangSmith, Langfuse, W&B Weave, Arize Phoenix, Helicone, and PromptLayer. When to build vs buy, integration patterns, abstraction layers, and production-grade Python examples using the Anthropic SDK.

Module 01: LLMOps - Overview

An overview of LLMOps - the engineering discipline for building, shipping, and operating production LLM applications reliably and at scale.

Module 12 - LLMOps Pipelines

Operationalize LLM-based systems - prompt management, evaluation pipelines, observability, RAG operations, and fine-tuning infrastructure.

Prompt Management

Treat prompts as production artifacts - versioning, registry design, testing frameworks, prompt A/B testing, automated optimization with DSPy, and prompt governance.

Prompt Versioning

Treating prompts as first-class code artifacts - versioning, branching, review gates, A/B testing, and rollback for production LLM prompts. Build a complete prompt registry from scratch.

RAG Pipeline Ops

Operate RAG pipelines in production - index refresh strategies, chunking strategy updates, embedding drift detection, vector database monitoring, and quality tracking.

Token Cost Monitoring

Monitor and control LLM API costs in production - cost-per-request dashboards, budget alerts, token efficiency optimization, cost attribution by feature and user, and anomaly detection.

What is LLMOps

LLMOps defined - the operational discipline for managing LLM-powered applications in production, why it differs from MLOps, and the full lifecycle every AI engineering team must master.