14 docs tagged with "systems-programming"

Build Systems and CI/CD for ML

How build systems and CI/CD pipelines keep ML projects reproducible, tested, and safely deployable - covering Make, Bazel, DVC, MLflow, GitHub Actions, and canary deployments.

C and C++ for ML Systems

Learn why C and C++ form the foundation of every major ML framework, and how to read, write, and debug C++ code as an ML systems engineer.

Concurrency Primitives

Master mutexes, condition variables, atomics, lock-free programming, and thread pools - the concurrency building blocks behind every high-throughput ML data pipeline and inference server.

File Descriptors and I/O

File descriptors, the VFS layer, O_* open flags, select/poll/epoll, io module internals, and zero-copy techniques.

Infrastructure as Code for ML

IaC for ML infrastructure - Terraform for GPU clusters on AWS/GCP/Azure, Helm charts for model serving, Pulumi with Python, Ansible for GPU node setup, GitOps with ArgoCD, spot instance handling, and infrastructure cost optimization.

Module 7: Systems Programming for ML Engineers

C++ basics for ML engineers, Python C extensions, Cython, Pybind11, and writing custom PyTorch operators - bridging the gap between Python ML code and high-performance native implementations.

Observability and Logging

Observability for ML systems - structured logging with structlog, distributed tracing with OpenTelemetry, Prometheus metrics for inference servers, Grafana dashboards, ML-specific alerting, and production profiling.

OS Primitives in Python

Processes, signals, environment, users, time, and the os/signal/resource modules - Python's interface to the POSIX operating system.

Serialization and Data Formats

Master serialization formats for ML systems - Protocol Buffers, Apache Arrow, safetensors, Parquet, HDF5, MessagePack, and pickle - with performance benchmarks, security considerations, and schema evolution strategies.

Shared Memory and IPC

POSIX shared memory, pipes, FIFOs, message queues, semaphores, and multiprocessing.shared_memory - Python inter-process communication.

Shell Scripting for ML Workflows

Bash scripting for ML engineers - automating training launches, multi-node coordination, GPU monitoring, checkpoint management, parallel data downloads, and writing robust production-grade shell scripts.

Sockets and Networking

BSD socket API in Python, TCP/UDP from scratch, non-blocking sockets, Unix domain sockets, and building a minimal HTTP server.

System Calls and Linux API

Learn how Linux system calls underpin every ML workload - from dataset loading with mmap to epoll-based inference servers, seccomp sandboxing, and io_uring async I/O.