Code and Math Specialized Models
How domain-specific pre-training and fine-tuning on code and math data produces models that outperform general LLMs on programming and reasoning tasks - and when to use them in production.
Master the HuggingFace Hub as your primary interface for finding, evaluating, and deploying open-source models. Learn to read model cards, use the Hub API, and navigate 800k+ models efficiently.
A deep dive into Meta's LLaMA model family - from LLaMA 1 through LLaMA 3.3 - covering RoPE embeddings, SwiGLU activation, RMSNorm, grouped query attention, and when to choose each variant.
Mistral 7B's use of sliding window attention and grouped query attention, and Mixtral 8x7B's Mixture of Experts design - sparse routing, expert selection, and why MoE delivers roughly 70B-class quality at about 13B active parameters per token.
Open-source model licenses are not all the same. Learn Apache 2.0, LLaMA Community, RAIL, and custom licenses - what you can and cannot do in production, and how to build a compliance workflow.
The open source LLM landscape - Llama, Mistral, Qwen, Gemma, Phi, model families, model cards, and a framework for choosing the right model for your task.
How open-source vision-language models work - from CLIP vision encoders and projection layers to LLaVA, InternVL2, and LLaMA 3.2 Vision - and how to deploy them for document understanding, OCR, and visual reasoning in production.
Microsoft's Phi model family - the "textbook-quality data" hypothesis, how 1-4B parameter models can match much larger ones on reasoning tasks, and the design principles behind efficient small language models.
Architectural innovations from Alibaba's Qwen and DeepSeek - Multi-head Latent Attention (MLA), DeepSeekMoE, multi-token prediction, and how Chinese labs are advancing open-source LLM research.