11 docs tagged with "computer-vision"

CNN Architectures - AlexNet to ResNet, EfficientNet, and ConvNeXt

The full evolution of CNN architectures from handcrafted features to AlexNet, VGG, GoogLeNet, ResNet, EfficientNet, and ConvNeXt - with the engineering story behind every breakthrough.

Convolutional Neural Networks

From first principles - why CNNs exist, how the convolution operation works, weight sharing, hierarchical feature learning, receptive fields, 1x1 convolutions, and depthwise separable convolutions with PyTorch.

Data Augmentation

Theoretically grounded data augmentation for computer vision - geometric and photometric transforms, CutMix, MixUp, AugMix, RandAugment, Albumentations, and Test-Time Augmentation in production.

Medical Imaging AI

Deep learning for radiology and pathology - CNN architectures, DICOM pipelines, transfer learning from ImageNet to medical domains, and clinical deployment considerations including FDA clearance.

Module 05 - Computer Vision

A comprehensive module on computer vision covering CNNs, modern architectures, object detection, segmentation, data augmentation, and Vision Transformers using PyTorch.

Object Detection: YOLO and R-CNN

Two-stage and one-stage object detection architectures - from sliding windows and R-CNN to Faster R-CNN and YOLOv8, covering FPN, anchor boxes, NMS, IoU, and mAP - with full PyTorch implementations.

Pooling, Strides, and Padding

Why spatial downsampling exists, how max pooling and strided convolutions compare, how padding controls output dimensions, receptive field growth, dilated convolutions, transposed convolutions, and when to use each - with PyTorch examples.

Semantic Segmentation

Pixel-wise classification with FCN, U-Net, DeepLab's atrous convolutions, encoder-decoder architectures, instance segmentation with Mask R-CNN, and a full PyTorch U-Net implementation.

Transfer Learning and Fine-Tuning

How pretrained ImageNet features transfer across domains, why it works, and the complete engineering playbook for fine-tuning in PyTorch - from feature extraction to progressive unfreezing with discriminative learning rates.

Vision Transformers (ViT)

How Vision Transformers apply self-attention to image patches - architecture, patch embeddings, positional encoding, DeiT, Swin Transformer, fine-tuning strategies, and production trade-offs against CNNs.

Visual Search and Product Discovery

Image embedding models for retail visual search, CLIP-based product discovery, FAISS similarity retrieval, multimodal search combining image and text, and the systems behind shop-the-look features.