8 docs tagged with "long-context"

Context Compression Techniques

How LLMLingua, AutoCompressors, GIST tokens, and selective compression reduce long contexts to fewer tokens while preserving the information needed to answer queries.

Lost in the Middle - How LLMs Use Long Contexts

The empirical finding that LLMs recall information at the beginning and end of long contexts more reliably than information in the middle, and strategies to mitigate this U-shaped performance curve.

Module 15 Overview - Long Context Strategies

How modern LLMs handle extremely long inputs - from the fundamental O(n²) attention problem to RoPE scaling, context compression, and production engineering for 128K+ context windows.

RAG vs Long Context - When to Use Each

A rigorous cost, latency, and accuracy comparison of retrieval-augmented generation versus long-context stuffing, with decision frameworks for production use cases.

RoPE and ALiBi - Positional Encoding for Long Context

How Rotary Position Embedding encodes relative positions through complex-plane rotations, why ALiBi achieves length extrapolation with linear biases, and why RoPE became the dominant approach for long-context models.

Working with 128K+ Context Windows in Production

A complete production engineering guide for building applications with long-context LLMs - model selection, cost management, prompt structure, multi-turn conversation, and memory-augmented systems.