9 docs tagged with "model-ecosystem"

Code and Math Specialized Models

How domain-specific pre-training and fine-tuning on code and math data produces models that outperform general LLMs on programming and reasoning tasks - and when to use them in production.

HuggingFace Hub and Model Cards

Master the HuggingFace Hub as your primary interface for finding, evaluating, and deploying open-source models. Learn to read model cards, use the Hub API, and navigate 800k+ models efficiently.

LLaMA Family Architecture

A deep dive into Meta's LLaMA model family - from LLaMA 1 through LLaMA 3.3 - covering RoPE embeddings, SwiGLU activation, RMSNorm, grouped query attention, and when to choose each variant.

Mistral and Mixtral Architecture

Mistral 7B's sliding window attention and grouped query attention, and Mixtral 8x7B's Mixture of Experts design - sparse routing, expert selection, and why MoE delivers 70B-class quality at roughly 13B active parameters per token.

Model Licensing and Compliance

Open-source model licenses are not all the same. Learn how Apache 2.0, LLaMA Community, RAIL, and custom licenses differ - what you can and cannot do in production, and how to build a compliance workflow.

Multimodal Open Source Models

How open-source vision-language models work - from CLIP vision encoders and projection layers to LLaVA, InternVL2, and LLaMA 3.2 Vision - and how to deploy them for document understanding, OCR, and visual reasoning in production.

Phi and Small Language Models

Microsoft's Phi model family - the textbook-quality data hypothesis, how 1-4B models can match much larger ones on reasoning tasks, and the design principles behind efficient small language models.

Qwen, DeepSeek, and International Models

Architectural innovations from Alibaba's Qwen and DeepSeek - Multi-head Latent Attention (MLA), DeepSeekMoE, multi-token prediction, and how Chinese labs are advancing open-source LLM research.