Computer Vision for Quality Control

Reading time: ~45 min · Interview relevance: Very High · Target roles: ML Engineer, Computer Vision Engineer, Industrial AI Engineer

The Inspector Who Never Blinks

A printed circuit board manufacturer in Shenzhen runs three shifts, 24 hours a day, producing 80,000 PCBs per day. Each board has 1,200 solder joints. A human inspector, examining boards under magnification at six seconds per board, can check maybe 5,000 boards per shift. They catch roughly 85% of defects when fresh - that number drops to 65% in the third hour of a shift and to 50% near the end. Human attention is not designed for this task.

The defects they are looking for are often subtle. A cold solder joint - where the solder did not fully reflow and the connection is mechanically weak - might look visually identical to a good joint at low magnification. A tombstoned component, tilted vertically instead of flat, is obvious. Missing components are usually caught. But micro-cracks in solder joints, insufficient solder coverage, bridging between adjacent pads, and voids inside BGA balls (visible only with X-ray) push the limits of human perception even under ideal conditions.

The stakes are high. A PCB that passes visual inspection but has a marginal solder joint will work fine at room temperature. Under thermal cycling in the field - a car's engine bay goes from -30C to +100C - the marginal joint cracks. The PCB fails. The car goes to the dealership. The recall investigation traces back to the specific production batch. The PCB manufacturer faces a warranty claim, a customer audit, and potentially a lost contract.

This is the domain where computer vision for quality control has moved from academic curiosity to production necessity. Modern vision systems check every single PCB, every solder joint, at inspection rates that match production throughput - 0.3 seconds per board or faster. They never get tired. Their false negative rate on known defect patterns, once trained, is consistent across all three shifts. And unlike human inspectors, their decisions are logged, auditable, and improvable through structured feedback.

The challenge is that "known defect patterns" is doing a lot of work in that sentence. The variety of defects that actually occur in manufacturing is enormous and often unpredictable. A new production line, a new component from a new supplier, a slight change in reflow oven temperature profile - any of these can produce a novel defect type that the model has never seen. Building systems that are both sensitive to known defects and alert to unknown ones is the core technical problem.

Why This Exists

The Limits of Rule-Based Machine Vision

Industrial machine vision has existed since the 1980s. The technology - structured lighting, high-resolution cameras, geometric matching, threshold-based pixel classification - solved the easy inspection problems. Checking that a label is present on a bottle. Verifying that a cap is seated. Measuring the diameter of a machined part. These are solved problems with deterministic solutions.

The hard problems remained unsolved: detecting subtle surface defects on textured materials (woven fabric, brushed aluminum, cast metal), finding micro-cracks in ceramics, inspecting complex geometries where defects appear in unpredictable locations. Rule-based systems require an engineer to hand-code every defect pattern. If a new type of scratch appears on the production line - different angle, different width than the ones in the rule set - the rule misses it.

Deep learning changed this. Instead of coding rules, you show the model examples and let it learn its own rules. The key insight from the 2018-2022 period was that you could train excellent defect detectors with far fewer examples than anyone expected, especially when using anomaly detection approaches that require only normal samples for training.

Why Defect Datasets Are Painful to Build

In natural image classification, a "rare" class might have 1,000 examples. In manufacturing defect detection, a "common" defect type might have 50 examples in the entire production history. Defects are, by definition, rare events. A process with a 0.5% defect rate running at 10,000 parts per day produces 50 defective parts per day - sounds like a lot, but the defects are not uniformly distributed across types. Scratches might dominate. Cracks might appear twice a week.

More painful: collecting defect data requires production operators to identify, segregate, and photograph each defect as it occurs. This is labor-intensive and often deprioritized when the line is running at capacity. The practical implication: any industrial inspection system must work with limited defect data. This is why anomaly detection approaches - which train only on normal samples - are preferred as a starting point.

Historical Context

Machine vision for industrial inspection dates to the early work at MIT in the 1970s on structured light 3D reconstruction. The first commercial systems from companies like Cognex (founded 1981) and ISRA Vision used blob analysis, template matching, and geometric measurement. These systems required expert configuration but worked reliably within their narrow scope.

The watershed moment for AI-based inspection was the release of the MVTec Anomaly Detection (MVTec AD) dataset in 2019 by Paul Bergmann and colleagues at MVTec Software GmbH in Munich. MVTec AD contains 15 categories of industrial objects and textures (bottle, cable, capsule, carpet, grid, hazelnut, leather, metal nut, pill, screw, tile, toothbrush, transistor, wood, zipper), each with high-resolution images of normal samples and multiple defect types with pixel-level annotations. For the first time, researchers had a standardized benchmark for unsupervised anomaly detection on realistic industrial data.

The 2020-2022 period saw rapid progress. PaDiM (2020) showed that features from pretrained CNNs modeled with Gaussians achieved strong anomaly detection. PatchCore (2022) from Roth et al. became the new state of the art by aggregating patch-level features into a memory bank and using nearest-neighbor search for scoring. EfficientAD (2023) and FastFlow (2022) further improved inference speed while maintaining accuracy. The current frontier is multi-class anomaly detection (one model for all product types), zero-shot inspection using vision-language models, and 3D anomaly detection combining RGB with structured light depth data.

Core Concepts

The Defect Taxonomy

Manufacturing defects fall into several categories, each with different visual characteristics:

Surface defects - scratches, dents, pits, oxidation, contamination, discoloration. These alter the local appearance of the surface texture without changing the 3D shape.

Structural defects - cracks, breaks, missing material, delamination. These alter the physical integrity of the part.

Dimensional defects - the part is the right material and surface quality but outside dimensional tolerances. Requires measurement, not just classification.

Foreign object defects - contamination, incorrect components, debris. Requires detecting objects that should not be present.

Assembly defects - components missing, misaligned, or incorrectly oriented in an assembled product.

Each defect type requires a different inspection strategy. Surface defects are best detected with high-resolution 2D vision and carefully controlled lighting (dark field, bright field, coaxial). Structural defects may require multiple illumination angles or 3D depth information. Dimensional defects require calibrated metrology. Foreign objects require knowing what "should" be present and detecting deviations.

Inspection Architecture: Inline vs Offline

Inline inspection happens on the production line, at production speed. The camera and vision system are integrated into the conveyor or robot cell. Each part is imaged as it passes. The vision system has a fixed time budget - if the line runs at 20 parts per minute, the system has 3 seconds per part. The reject signal must be output within that window to activate the reject gate before the part moves on. This imposes strict latency constraints: inference must complete in 200-500 ms to leave time for I/O signaling.

Offline inspection happens in a dedicated inspection station, separate from the production line. Parts are sampled (not 100% inspection) or re-routed there for detailed analysis. There is no latency constraint, so you can run larger models, multiple passes, 3D scanning. The tradeoff: you only catch defects after the fact, and sampling means some defective parts get through.

The choice depends on defect rate, part value, and downstream consequences. For high-value parts (aerospace, medical devices), 100% inline inspection is mandatory. For commodity parts, sampling and offline AQL (Acceptable Quality Level) inspection is standard.

Anomaly Detection: The Unsupervised Approach

Anomaly detection for inspection rests on a simple premise: train a model to understand what normal looks like, then flag anything that deviates from normal. The advantage is that you do not need labeled defect examples - only a collection of normal (defect-free) samples.

PatchCore is the current best practice for production use. The algorithm:

Extract patch-level features from a pretrained CNN backbone (e.g., WideResNet-50) at an intermediate layer where features are both semantically rich and spatially localized.
Aggregate features across all training images into a "memory bank" of normal patch features.
Apply coreset subsampling to reduce the memory bank to a manageable size while preserving coverage.
At inference: extract patch features for the new image, find the nearest neighbor in the memory bank for each patch, use the distance as the anomaly score for that patch location.
The image-level anomaly score is the maximum patch anomaly score.
The patch anomaly scores form an anomaly map - the spatial localization of the defect.

The key insight: a pretrained backbone already knows how to describe image patches. PatchCore does not train a new model - it uses pretrained features directly. This is why it achieves strong results with zero defect examples and can be deployed quickly on new product types.

Supervised Defect Classification

When you have labeled defect examples (even a small number), supervised approaches can achieve higher precision on known defect types:

ResNet / EfficientNet for defect classification - standard image classification. Train on crops of normal and defective regions. Works well when defects are visually distinct and occur in predictable locations.

U-Net for defect segmentation - pixel-level segmentation of defect regions. Requires pixel-level annotations (expensive to create) but produces precise defect maps. Essential when you need to know not just whether a defect exists but exactly where and how large it is.

The hybrid approach used in production: PatchCore for catching unknown defects and providing localization, followed by a classifier on the flagged patches to categorize the defect type. PatchCore catches everything; the classifier provides actionable information to the maintenance team.

Code Examples

1. PatchCore Implementation for Anomaly Detection

"""
PatchCore - the current best practice for industrial anomaly detection.
Paper: "Towards Total Recall in Industrial Anomaly Detection" (Roth et al., 2022)

This implementation uses pretrained WideResNet features.
Install: pip install torch torchvision scikit-learn tqdm faiss-cpu
"""
import numpy as np
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from pathlib import Path
from typing import List, Tuple, Optional
from sklearn.random_projection import SparseRandomProjection
import faiss
from tqdm import tqdm


class PatchCoreFeatureExtractor(nn.Module):
    """
    Extract intermediate features from WideResNet-50.
    We use features from layer2 and layer3, which balance
    semantic richness with spatial resolution.
    """

    def __init__(self, backbone_name: str = "wide_resnet50_2"):
        super().__init__()
        backbone = getattr(models, backbone_name)(pretrained=True)

        # Features from layer2 (stride 8) and layer3 (stride 16)
        self.layer1 = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                     backbone.maxpool, backbone.layer1)
        self.layer2 = backbone.layer2
        self.layer3 = backbone.layer3

        # Freeze - used only as feature extractor, no fine-tuning
        for param in self.parameters():
            param.requires_grad = False

        self.eval()

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Returns (layer2_features, layer3_features)."""
        h = self.layer1(x)
        layer2_out = self.layer2(h)
        layer3_out = self.layer3(layer2_out)
        return layer2_out, layer3_out


def aggregate_patch_features(
    layer2: torch.Tensor,
    layer3: torch.Tensor,
    patch_size: int = 3
) -> Tuple[np.ndarray, Tuple[int, int]]:
    """
    Create neighborhood-aggregated patch features.

    Upsample layer3 to layer2 resolution and concatenate.
    Apply average pooling for local neighborhood aggregation.

    Returns: patch_features (H2*W2, C2+C3), spatial_shape (H2, W2)
    """
    B, C2, H2, W2 = layer2.shape

    # Aggregate neighborhood via average pooling
    l2_pooled = nn.functional.avg_pool2d(
        layer2, kernel_size=patch_size, stride=1, padding=patch_size//2
    )

    # Upsample layer3 to layer2 resolution, then pool
    l3_upsampled = nn.functional.interpolate(
        layer3, size=(H2, W2), mode="bilinear", align_corners=False
    )
    l3_pooled = nn.functional.avg_pool2d(
        l3_upsampled, kernel_size=patch_size, stride=1, padding=patch_size//2
    )

    # Concatenate along channel dimension
    combined = torch.cat([l2_pooled, l3_pooled], dim=1)  # (B, C2+C3, H2, W2)

    # Reshape to (B * H2 * W2, C2+C3) - one feature vector per spatial location
    combined = combined.permute(0, 2, 3, 1)  # (B, H2, W2, C2+C3)
    patch_features = combined.reshape(-1, combined.shape[-1])

    return patch_features.cpu().numpy(), (H2, W2)


class PatchCore:
    """
    Full PatchCore implementation with coreset subsampling and FAISS index.
    """

    def __init__(
        self,
        backbone: str = "wide_resnet50_2",
        coreset_sampling_ratio: float = 0.1,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        self.device = device
        self.feature_extractor = PatchCoreFeatureExtractor(backbone).to(device)
        self.coreset_sampling_ratio = coreset_sampling_ratio
        self.memory_bank = None
        self.feature_dim = None
        self.spatial_shape = None
        self.index = None

        self.transform = T.Compose([
            T.Resize(256),
            T.CenterCrop(224),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])

    def _extract_features(
        self, images: List[Image.Image]
    ) -> Tuple[np.ndarray, Tuple[int, int]]:
        """Extract and aggregate patch features from a list of PIL images."""
        tensors = torch.stack([self.transform(img) for img in images]).to(self.device)

        with torch.no_grad():
            layer2, layer3 = self.feature_extractor(tensors)

        patch_features, spatial_shape = aggregate_patch_features(layer2, layer3)
        return patch_features, spatial_shape

    def fit(self, normal_images: List[Image.Image]):
        """
        Build memory bank from normal training images.

        normal_images: List of PIL Image objects, all defect-free
        """
        print(f"Building memory bank from {len(normal_images)} normal images...")

        all_features = []
        batch_size = 8

        for i in tqdm(range(0, len(normal_images), batch_size), desc="Extracting features"):
            batch = normal_images[i:i+batch_size]
            features, spatial_shape = self._extract_features(batch)
            all_features.append(features)

        all_features = np.concatenate(all_features, axis=0)
        self.spatial_shape = spatial_shape
        self.feature_dim = all_features.shape[1]
        print(f"Total patches: {len(all_features)}, Feature dim: {self.feature_dim}")

        # Coreset subsampling to reduce memory bank size
        n_coreset = max(1, int(len(all_features) * self.coreset_sampling_ratio))
        print(f"Subsampling to {n_coreset} coreset patches...")

        # Random projection speeds up distance computation for coreset selection
        projector = SparseRandomProjection(n_components=128, random_state=42)
        projected = projector.fit_transform(all_features).astype(np.float32)

        selected_indices = self._greedy_coreset(projected, n_coreset)
        self.memory_bank = all_features[selected_indices].astype(np.float32)

        # FAISS index for fast nearest-neighbor search
        self.index = faiss.IndexFlatL2(self.feature_dim)
        self.index.add(self.memory_bank)
        print(f"Memory bank built: {len(self.memory_bank)} patches")

    def _greedy_coreset(
        self, features: np.ndarray, n_select: int
    ) -> np.ndarray:
        """
        Greedy farthest-point sampling.
        Each selected point maximizes minimum distance to already-selected set.
        This ensures the coreset covers the feature space uniformly.
        """
        import numpy.random as npr
        n = len(features)
        selected = [npr.randint(0, n)]
        min_distances = np.full(n, np.inf)

        for _ in tqdm(range(n_select - 1), desc="Coreset sampling", leave=False):
            last = features[selected[-1]:selected[-1]+1]
            distances = np.linalg.norm(features - last, axis=1)
            min_distances = np.minimum(min_distances, distances)
            selected.append(int(np.argmax(min_distances)))

        return np.array(selected)

    def predict(
        self,
        image: Image.Image,
        return_anomaly_map: bool = True
    ) -> Tuple[float, Optional[np.ndarray]]:
        """
        Compute anomaly score for a single image.

        Returns:
            image_score: scalar, higher = more anomalous
            anomaly_map: H x W array of patch anomaly scores
        """
        features, (H, W) = self._extract_features([image])

        # Nearest neighbor distance in memory bank = anomaly score per patch
        distances, _ = self.index.search(features.astype(np.float32), k=1)
        patch_scores = distances[:, 0]

        # Image-level score: max over all patches (any defect counts)
        image_score = float(np.max(patch_scores))

        anomaly_map = None
        if return_anomaly_map:
            from scipy.ndimage import gaussian_filter
            score_map = patch_scores.reshape(H, W)
            # Gaussian smoothing for visualization
            anomaly_map = gaussian_filter(score_map, sigma=4)

        return image_score, anomaly_map

    def compute_threshold(
        self,
        normal_images: List[Image.Image],
        target_fpr: float = 0.05
    ) -> float:
        """
        Compute score threshold at target false positive rate
        by evaluating on held-out normal images.
        """
        scores = []
        for img in tqdm(normal_images, desc="Computing threshold"):
            score, _ = self.predict(img, return_anomaly_map=False)
            scores.append(score)

        threshold = np.percentile(scores, (1 - target_fpr) * 100)
        print(f"Threshold at {target_fpr*100:.1f}% FPR: {threshold:.4f}")
        return threshold

2. U-Net Defect Segmentation

"""
U-Net for semantic segmentation of defect regions.
Requires pixel-level annotations (binary masks: 0=normal, 1=defect).
Best used when you have 50+ annotated defect examples.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset
from pathlib import Path
from PIL import Image
import numpy as np
import torchvision.transforms.functional as TF
import random


class DoubleConv(nn.Module):
    """Two convolution layers with BatchNorm and ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)


class UNet(nn.Module):
    """
    U-Net for binary defect segmentation.
    Input: (B, 3, H, W) RGB image
    Output: (B, 1, H, W) defect probability map (pre-sigmoid logits)
    """

    def __init__(self, in_channels: int = 3, base_features: int = 64):
        super().__init__()
        f = base_features

        # Encoder
        self.enc1 = DoubleConv(in_channels, f)
        self.enc2 = DoubleConv(f, f*2)
        self.enc3 = DoubleConv(f*2, f*4)
        self.enc4 = DoubleConv(f*4, f*8)
        self.pool = nn.MaxPool2d(2)

        # Bottleneck
        self.bottleneck = DoubleConv(f*8, f*16)

        # Decoder with skip connections
        self.up4 = nn.ConvTranspose2d(f*16, f*8, 2, stride=2)
        self.dec4 = DoubleConv(f*16, f*8)

        self.up3 = nn.ConvTranspose2d(f*8, f*4, 2, stride=2)
        self.dec3 = DoubleConv(f*8, f*4)

        self.up2 = nn.ConvTranspose2d(f*4, f*2, 2, stride=2)
        self.dec2 = DoubleConv(f*4, f*2)

        self.up1 = nn.ConvTranspose2d(f*2, f, 2, stride=2)
        self.dec1 = DoubleConv(f*2, f)

        self.out_conv = nn.Conv2d(f, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))

        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))

        return self.out_conv(d1)


class DefectSegmentationDataset(Dataset):
    """
    Dataset for defect segmentation training.

    Directory structure:
        images/  - RGB images of parts
        masks/   - Binary masks (0=normal, 255=defect), same filename as images
    """

    def __init__(
        self,
        image_dir: str,
        mask_dir: str,
        image_size: int = 512,
        augment: bool = True
    ):
        self.image_dir = Path(image_dir)
        self.mask_dir = Path(mask_dir)
        self.image_size = image_size
        self.augment = augment

        self.image_paths = sorted(self.image_dir.glob("*.png")) + \
                           sorted(self.image_dir.glob("*.jpg"))
        print(f"Found {len(self.image_paths)} images")

    def __len__(self):
        return len(self.image_paths)

    def _augment(
        self, image: Image.Image, mask: Image.Image
    ) -> Tuple[Image.Image, Image.Image]:
        """Synchronized augmentation - same transformation on image and mask."""
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        if random.random() > 0.5:
            angle = random.uniform(-30, 30)
            image = TF.rotate(image, angle)
            mask = TF.rotate(mask, angle)
        # Color jitter - image only
        if random.random() > 0.5:
            image = TF.adjust_brightness(image, random.uniform(0.7, 1.3))
            image = TF.adjust_contrast(image, random.uniform(0.7, 1.3))
        return image, mask

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        mask_path = self.mask_dir / img_path.name

        image = Image.open(img_path).convert("RGB")
        mask = Image.open(mask_path).convert("L")

        image = image.resize((self.image_size, self.image_size), Image.BILINEAR)
        mask = mask.resize((self.image_size, self.image_size), Image.NEAREST)

        if self.augment:
            image, mask = self._augment(image, mask)

        image_tensor = TF.to_tensor(image)
        image_tensor = TF.normalize(image_tensor,
                                     mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
        mask_tensor = torch.tensor(np.array(mask), dtype=torch.float32)
        mask_tensor = (mask_tensor > 128).float().unsqueeze(0)  # Binary

        return image_tensor, mask_tensor


def dice_loss(
    pred: torch.Tensor, target: torch.Tensor, smooth: float = 1.0
) -> torch.Tensor:
    """
    Dice loss - better than BCE for class-imbalanced segmentation.
    Defects are small regions; Dice directly optimizes the F1 overlap.
    """
    pred = torch.sigmoid(pred)
    pred_flat = pred.view(-1)
    target_flat = target.view(-1)
    intersection = (pred_flat * target_flat).sum()
    return 1 - (2 * intersection + smooth) / (
        pred_flat.sum() + target_flat.sum() + smooth
    )


def combined_segmentation_loss(
    pred: torch.Tensor, target: torch.Tensor
) -> torch.Tensor:
    """BCE + Dice. Standard for defect segmentation."""
    bce = F.binary_cross_entropy_with_logits(pred, target)
    dice = dice_loss(pred, target)
    return 0.5 * bce + 0.5 * dice

3. Synthetic Defect Generation with Albumentations

"""
Synthetic defect augmentation for rare defect types.

When you have very few real defect images, synthesize more
by pasting defect textures onto normal images.
This technique is standard practice in industrial inspection.
"""
import numpy as np
from PIL import Image
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
import random
from typing import Tuple


def create_scratch_mask(
    width: int,
    height: int,
    min_length: int = 50,
    max_length: int = 200,
    thickness: int = 2
) -> np.ndarray:
    """Generate a random scratch mask as a binary numpy array."""
    mask = np.zeros((height, width), dtype=np.uint8)

    x1 = random.randint(0, width)
    y1 = random.randint(0, height)
    angle = random.uniform(0, 360)
    length = random.randint(min_length, max_length)

    x2 = int(x1 + length * np.cos(np.radians(angle)))
    y2 = int(y1 + length * np.sin(np.radians(angle)))
    cv2.line(mask, (x1, y1), (x2, y2), 255, thickness)

    # Add slight curve for realism
    if random.random() > 0.5:
        mid_x = (x1 + x2) // 2 + random.randint(-20, 20)
        mid_y = (y1 + y2) // 2 + random.randint(-20, 20)
        pts = np.array([[x1, y1], [mid_x, mid_y], [x2, y2]], dtype=np.int32)
        cv2.polylines(mask, [pts], False, 255, thickness)

    return mask


def synthesize_scratch_defect(
    image: np.ndarray,
    darkness: Tuple[float, float] = (0.3, 0.7)
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Add a synthetic scratch to an image.

    Returns: (defect_image, binary_mask)
    The mask can be used as the segmentation ground truth.
    """
    H, W = image.shape[:2]
    mask = create_scratch_mask(W, H)

    dark_factor = random.uniform(*darkness)
    defect_image = image.copy().astype(np.float32)
    defect_image[mask > 0] *= dark_factor
    defect_image = np.clip(defect_image, 0, 255).astype(np.uint8)

    result = image.copy()
    result[mask > 0] = defect_image[mask > 0]

    return result, (mask > 0).astype(np.uint8)


def get_training_augmentation(image_size: int = 512) -> A.Compose:
    """
    Albumentations pipeline for manufacturing inspection training.

    Design:
    - No heavy geometric distortions (parts have fixed orientation on conveyor)
    - Moderate color jitter (lighting is controlled but not perfect)
    - Industrial noise patterns (sensor noise, compression artifacts)
    """
    return A.Compose([
        A.RandomResizedCrop(
            height=image_size, width=image_size,
            scale=(0.8, 1.0), ratio=(0.9, 1.1)
        ),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.2),
        A.RandomRotate90(p=0.3),

        # Lighting variations
        A.OneOf([
            A.RandomBrightnessContrast(
                brightness_limit=0.2, contrast_limit=0.2
            ),
            A.RandomGamma(gamma_limit=(80, 120)),
            A.HueSaturationValue(
                hue_shift_limit=5, sat_shift_limit=20, val_shift_limit=20
            ),
        ], p=0.7),

        # Blur and noise - simulates sensor variation across cameras
        A.OneOf([
            A.GaussianBlur(blur_limit=(3, 5)),
            A.MotionBlur(blur_limit=5),
            A.GaussNoise(var_limit=(10, 50)),
        ], p=0.4),

        A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ToTensorV2()
    ])

4. Threshold Calibration for Pass/Fail Decisions

"""
Calibrating pass/fail thresholds for visual inspection.
The threshold is a business decision, not just a statistical one.
Different defect types have different consequences.
"""
import numpy as np
from sklearn.metrics import (
    precision_recall_curve, roc_auc_score, f1_score, confusion_matrix
)
from typing import Dict


def find_optimal_threshold(
    scores: np.ndarray,
    labels: np.ndarray,
    cost_false_positive: float = 1.0,
    cost_false_negative: float = 50.0,
    verbose: bool = True
) -> Dict:
    """
    Find optimal threshold based on asymmetric cost function.

    In quality control:
    - False positive = good part rejected (lose revenue, rework cost)
    - False negative = bad part shipped (field failure, warranty, safety risk)

    The optimal threshold minimizes: FP * cost_fp + FN * cost_fn

    Args:
        scores: Anomaly scores, higher = more likely defective
        labels: Binary labels (1=defective, 0=normal)
        cost_false_positive: Relative cost of rejecting a good part
        cost_false_negative: Relative cost of shipping a bad part
    """
    precision, recall, thresholds = precision_recall_curve(labels, scores)

    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)

    costs = []
    for thresh in thresholds:
        predicted = (scores >= thresh).astype(int)
        fp = np.sum((predicted == 1) & (labels == 0))
        fn = np.sum((predicted == 0) & (labels == 1))
        costs.append(fp * cost_false_positive + fn * cost_false_negative)

    best_idx = int(np.argmin(costs))
    best_threshold = float(thresholds[best_idx])

    predicted = (scores >= best_threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(labels, predicted).ravel()

    metrics = {
        "threshold": best_threshold,
        "precision": tp / (tp + fp + 1e-10),
        "recall": tp / (tp + fn + 1e-10),
        "f1": f1_score(labels, predicted),
        "false_positive_rate": fp / (fp + tn + 1e-10),
        "false_negative_rate": fn / (fn + tp + 1e-10),
        "auc_roc": roc_auc_score(labels, scores),
        "expected_cost": costs[best_idx],
        "tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn)
    }

    if verbose:
        print("\n--- Optimal Threshold Results ---")
        print(f"Threshold:          {metrics['threshold']:.4f}")
        print(f"Precision:          {metrics['precision']:.3f}  "
              f"({tp}/{tp+fp} alerts are real defects)")
        print(f"Recall:             {metrics['recall']:.3f}  "
              f"({tp}/{tp+fn} defects caught)")
        print(f"False Positive Rate:{metrics['false_positive_rate']:.3f}  "
              f"({fp} good parts rejected)")
        print(f"False Negative Rate:{metrics['false_negative_rate']:.3f}  "
              f"({fn} defects shipped)")
        print(f"ROC AUC:            {metrics['auc_roc']:.3f}")
        print(f"Expected Cost:      {metrics['expected_cost']:.1f} units")

    return metrics

5. Camera Trigger Integration with PLC

"""
Camera trigger integration pattern.
In inline inspection, the camera fires on a hardware trigger
from an encoder or photoelectric sensor, not on a timer.
This ensures the image is captured at the exact moment the part
is centered under the camera.
"""
import time
import threading
import queue
from typing import Callable, Optional
import logging

logger = logging.getLogger(__name__)


class InlineInspectionPipeline:
    """
    Production-grade inline inspection pipeline.

    Architecture:
    - Thread 1 (acquisition): waits for hardware trigger, grabs frame
    - Thread 2 (inference): runs model on grabbed frames
    - Thread 3 (I/O): sends pass/fail signal to PLC
    - All threads communicate via queues for decoupling

    This pipeline processes part N during acquisition of part N+1.
    At 20 parts/min with 200ms inference, this is well within budget.
    """

    def __init__(
        self,
        inference_fn: Callable,      # model.predict(image) -> score
        threshold: float,
        plc_reject_fn: Callable,     # function to trigger reject gate
        max_queue_depth: int = 4
    ):
        self.inference_fn = inference_fn
        self.threshold = threshold
        self.plc_reject_fn = plc_reject_fn

        self.image_queue = queue.Queue(maxsize=max_queue_depth)
        self.result_queue = queue.Queue(maxsize=max_queue_depth)

        self._running = False
        self._threads = []

        # Metrics
        self.parts_inspected = 0
        self.parts_rejected = 0
        self.inference_times_ms = []

    def _acquisition_thread(self, camera):
        """
        Thread 1: acquires images on hardware trigger.
        camera: object with grab_frame() method called after trigger
        """
        logger.info("Acquisition thread started")
        while self._running:
            try:
                # Wait for hardware trigger (blocking call in most SDKs)
                image = camera.grab_frame(timeout_ms=5000)
                if image is not None:
                    part_id = int(time.time() * 1000)
                    self.image_queue.put((part_id, image), block=False)
            except queue.Full:
                logger.warning("Image queue full - dropping frame. "
                               "Inference may be too slow for line speed.")
            except Exception as e:
                logger.error(f"Acquisition error: {e}")

    def _inference_thread(self):
        """Thread 2: runs ML model on queued images."""
        logger.info("Inference thread started")
        while self._running:
            try:
                part_id, image = self.image_queue.get(timeout=1.0)
                t0 = time.perf_counter()

                score, anomaly_map = self.inference_fn(image)
                decision = "REJECT" if score > self.threshold else "PASS"

                elapsed_ms = (time.perf_counter() - t0) * 1000
                self.inference_times_ms.append(elapsed_ms)

                self.result_queue.put((part_id, decision, score, anomaly_map))

            except queue.Empty:
                continue
            except Exception as e:
                logger.error(f"Inference error: {e}")

    def _io_thread(self):
        """Thread 3: sends reject signal to PLC, logs results."""
        logger.info("I/O thread started")
        while self._running:
            try:
                part_id, decision, score, anomaly_map = self.result_queue.get(
                    timeout=1.0
                )

                self.parts_inspected += 1

                if decision == "REJECT":
                    self.parts_rejected += 1
                    self.plc_reject_fn(part_id)
                    logger.info(f"REJECT part {part_id}: score={score:.4f}")
                else:
                    logger.debug(f"PASS part {part_id}: score={score:.4f}")

                # Log to database (async, non-blocking)
                self._log_result_async(part_id, decision, score)

            except queue.Empty:
                continue

    def _log_result_async(self, part_id, decision, score):
        """Non-blocking database logging - never block the I/O thread."""
        def _write():
            pass  # Replace with actual DB write
        threading.Thread(target=_write, daemon=True).start()

    def start(self, camera):
        """Start all pipeline threads."""
        self._running = True
        self._threads = [
            threading.Thread(target=self._acquisition_thread, args=(camera,), daemon=True),
            threading.Thread(target=self._inference_thread, daemon=True),
            threading.Thread(target=self._io_thread, daemon=True),
        ]
        for t in self._threads:
            t.start()
        logger.info("Inspection pipeline started")

    def stop(self):
        """Graceful shutdown."""
        self._running = False
        for t in self._threads:
            t.join(timeout=5.0)
        logger.info(
            f"Pipeline stopped. Inspected: {self.parts_inspected}, "
            f"Rejected: {self.parts_rejected}, "
            f"Reject rate: {self.parts_rejected/max(1,self.parts_inspected)*100:.2f}%"
        )
        if self.inference_times_ms:
            times = np.array(self.inference_times_ms)
            logger.info(
                f"Inference latency - mean: {times.mean():.1f}ms, "
                f"p95: {np.percentile(times, 95):.1f}ms, "
                f"max: {times.max():.1f}ms"
            )

System Architecture

Production Engineering Notes

Camera and Lighting Configuration

The ML model is only as good as the images it receives. Controlled lighting is not optional - it is the foundation. Dark field illumination (light at low angle, camera above) makes surface scratches and bumps appear bright against a dark background - excellent for surface defects on smooth materials. Bright field (coaxial, on-axis) illuminates uniformly - good for color and print defects. Structured light (projected fringe pattern) enables 3D height maps for detecting protrusions and depressions.

Depth of field matters enormously. If your part has Z-height variation of 10mm and your lens has depth of field of 5mm, the edges of the part will be blurry. Use telecentric lenses for parts where dimensional accuracy matters - telecentric optics eliminate perspective distortion and provide consistent magnification across the field.

Trigger synchronization is safety-critical. The camera must fire at exactly the right moment - when the part is stationary or at a known speed and position. A late trigger means a blurred image. An early trigger means the part has not fully entered the frame. Use an encoder-based trigger rather than a timer-based trigger (timers drift with line speed variations).

Throughput and Latency Requirements

Inline inspection has a hard latency requirement: the reject signal must reach the PLC before the part reaches the reject gate. If the gate is 2 meters downstream of the camera at a conveyor speed of 1 m/s, you have 2 seconds. Subtract the PLC cycle time (10-50ms) and the actuator travel time (100-300ms). In practice, you have 1-1.5 seconds for image acquisition plus inference plus I/O.

This means your model must run in under 500ms at most, ideally under 200ms. PatchCore on a 512x512 image with WideResNet-50 backbone runs in approximately 50-100ms on an NVIDIA T4. A U-Net inference is typically 30-80ms. Both fit comfortably in the latency budget for most inline applications.

At higher throughput (120 parts per minute = 1 part every 500ms), implement a pipeline: camera captures part N while GPU is inferring on part N-1. This doubles throughput with no hardware change - implement it with a multithreaded acquisition-inference queue.

Model Drift and Retraining

Production environments change. A new paint supplier changes the surface texture. Lighting LEDs age and their spectrum shifts. A different batch of raw material has a slightly different baseline appearance. Any of these can cause a model trained on historical data to generate increasing false positives on the new "normal."

Monitor model drift by tracking the distribution of anomaly scores on pass decisions over time. If the mean score on pass decisions starts rising week over week, the model is seeing the new normal as slightly anomalous. This is your early warning to collect new normal images and retrain. Implement a sliding window retraining policy: rebuild the PatchCore memory bank every month using the most recent 4 weeks of confirmed-normal images.

:::warning Defect Class Imbalance In production image datasets, defect examples are rare - often less than 1% of all images. If you train a supervised classifier without addressing this imbalance, the model will learn to predict "normal" for everything and achieve 99% accuracy while catching zero defects. Use focal loss, weighted cross-entropy, or oversample the defect class. Always report precision and recall by class, never just overall accuracy. :::

:::danger Overfitting to Known Defect Patterns A supervised model trained on your historical defect library will achieve excellent performance on defect types it has seen. It will miss novel defect types completely. New defect types appear whenever the production process changes: new supplier material, tooling wear, environmental changes. Always run an anomaly detection model (PatchCore or similar) in parallel with your supervised classifier. The anomaly detector catches what the classifier misses. :::

Interview Questions and Answers

Q1: Why do anomaly detection methods like PatchCore outperform supervised classifiers on the MVTec benchmark?

The MVTec AD benchmark evaluates detection of unseen defect types - defects never shown during training. Supervised classifiers fail on unseen defect types by definition: they can only classify into categories they were trained on. PatchCore succeeds because it models what normal looks like and flags deviations - it does not need to know what specific defects look like. Additionally, PatchCore benefits from pretrained ImageNet features that already encode rich texture and shape representations, without requiring any fine-tuning on defect examples. The practical implication: start with anomaly detection to achieve broad coverage of all defect types, then add supervised classification for known types where you need categorization for routing or process feedback.

Q2: How do you handle lighting variation between production batches in a vision inspection system?

Lighting variation is one of the most common reasons inspection systems fail in production. Approaches at different levels. At the hardware level: use LED controllers that maintain constant lumen output over temperature and age, measure ambient light and compensate, use polarized lighting to eliminate specular variation. At the image processing level: apply histogram equalization or CLAHE to normalize the intensity distribution before inference. At the model level: augment training data with brightness and contrast variations. At the monitoring level: track the mean intensity of passing images over time - a systematic drift in mean intensity is a lighting system warning. I prefer hardware-level control first, model-level augmentation second, and monitoring third.

Q3: What is the difference between an anomaly map and a defect segmentation map?

An anomaly map (from PatchCore or similar) is a heat map showing where the image deviates most from training normal images. Each location's value is the distance to the nearest normal patch in feature space - a relative measure without semantic meaning. A defect segmentation map is a binary or multi-class mask where each pixel is classified as a specific defect type (scratch, crack, void). The anomaly map is produced without any defect labels; the segmentation map requires pixel-level annotations. In production, use the anomaly map for initial detection and pass/fail, and the segmentation map for detailed defect characterization - size, location, type - logged for process control.

Q4: A new product type is introduced on the production line. How quickly can you deploy inspection capability?

With PatchCore, you can have a working anomaly detector in one day. The workflow: run the new product through the line for 2-4 hours, collecting 200-500 images of normal parts with consistent lighting. Train PatchCore on these normal images (30 minutes compute). Evaluate on a test set and set the threshold. Deploy to the production vision PC. No defect examples are needed at all. The model catches deviations from normal, which includes all defect types even though it has never seen any of them. For a supervised classifier you need weeks of defect data collection - PatchCore is the right starting point for new products.

Q5: How do you validate that your inspection model is working correctly in production?

Multiple validation layers. First, an online holdout: route 1% of production parts to manual inspection regardless of the model's decision - this measures recall directly. Second, a false positive audit: for every rejected part, a technician confirms whether the rejection was valid - this measures precision. Third, monitor the score distribution of passing parts: if it shifts upward, the model or production conditions are changing. Fourth, periodic challenge samples: introduce known-defective reference parts into the line and verify the model catches them. Fifth, monitor model-level metrics: inference time (a GPU memory issue causes it to spike), input image quality (check that mean brightness and contrast are stable). Weekly review of these metrics, with automated alerts when any metric goes out of range.

Key Takeaways

Visual inspection with AI changes the economics of quality control: 100% inspection at production speed, consistent performance across shifts, and the ability to catch subtle defects that evade human inspection. The technical stack centers on anomaly detection (PatchCore for zero-shot new products) plus supervised segmentation (U-Net for known defect types requiring characterization). The hard problems are lighting control, latency (under 500ms end-to-end for inline use), threshold calibration (balancing false rejects against defect escapes), and model drift as production conditions change. Build the feedback loop from day one - every confirmed defect and every confirmed false positive is training data for the next model version.

The Inspector Who Never Blinks​

Why This Exists​

The Limits of Rule-Based Machine Vision​

Why Defect Datasets Are Painful to Build​

Historical Context​

Core Concepts​

The Defect Taxonomy​

Inspection Architecture: Inline vs Offline​

Anomaly Detection: The Unsupervised Approach​

Supervised Defect Classification​

Code Examples​

1. PatchCore Implementation for Anomaly Detection​

2. U-Net Defect Segmentation​

3. Synthetic Defect Generation with Albumentations​

4. Threshold Calibration for Pass/Fail Decisions​

5. Camera Trigger Integration with PLC​

System Architecture​

Production Engineering Notes​

Camera and Lighting Configuration​

Throughput and Latency Requirements​

Model Drift and Retraining​

Interview Questions and Answers​

Key Takeaways​