MLOps Problem List
Reading time: ~40 min | Interview relevance: Critical | Roles: MLOps Engineer, ML Platform Engineer, ML Infrastructure Engineer, Production ML Engineer
It is 2 AM and you get paged: a production model's latency spiked from 50ms to 800ms, recommendations are returning stale results, and the data pipeline feeding features has been silently failing for 6 hours. Your engineering manager asks you: "How do we make sure this never happens again?" If you can design the monitoring, alerting, and reliability infrastructure to answer that question, you are an MLOps engineer.
MLOps sits at the intersection of DevOps, data engineering, and machine learning. This list of 45 problems covers the full scope: CI/CD for ML, model serving, monitoring, pipeline orchestration, and infrastructure at scale.
MLOps Interview Structure
| Round | Duration | What They Test | Weight |
|---|---|---|---|
| System Design | 45-60 min | ML infrastructure architecture | 30-35% |
| Coding | 45-60 min | Python, infra scripting, pipeline code | 20-25% |
| ML Operations Knowledge | 45-60 min | Tooling, best practices, failure modes | 20-25% |
| DevOps / Cloud | 30-45 min | Kubernetes, Docker, cloud services, IaC | 15-20% |
| Behavioral | 30-45 min | Incident response, cross-team collaboration | 10% |
:::tip The MLOps Mindset
MLOps is not about building the best model. It is about making sure the best model runs reliably, scales efficiently, and can be updated safely. Think like an SRE who specializes in ML systems. :::
Section 1: CI/CD for ML (8 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 1 | Design a CI/CD Pipeline for ML Model Deployment | Medium | 35 min | Testing stages, validation gates, rollback strategy | The foundational MLOps system design problem | FAANG, Unicorns |
| 2 | Implement Automated Model Validation Tests | Medium | 25 min | Performance thresholds, data checks, regression tests | Models need testing just like software | All |
| 3 | Design a Canary Deployment Strategy for ML Models | Medium | 30 min | Traffic splitting, metric monitoring, auto-rollback | Safe production rollout of model changes | Google, Meta, Uber |
| 4 | Implement a Model Registry with Versioning | Medium | 30 min | Model metadata, lineage tracking, artifact storage | Track what is in production and how it got there | All |
| 5 | Design a Feature Branch Workflow for ML Experiments | Medium | 25 min | Experiment isolation, reproducibility, merge strategy | ML development needs version control discipline | FAANG, Unicorns |
| 6 | Implement Automated Data Validation in a Pipeline | Medium | 25 min | Schema validation, distribution checks, anomaly detection | Bad data is one of the most common causes of ML system failures | Google, Uber, Databricks |
| 7 | Design a Blue-Green Deployment for Model Serving | Medium | 25 min | Zero-downtime deployment, instant rollback | Minimize risk during model updates | All |
| 8 | Build a Reproducibility System for ML Experiments | Hard | 35 min | Code versioning, data versioning, environment capture | "But it worked on my machine" is not acceptable | AI Labs, FAANG |
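Problem #3 usually starts with how traffic is split. Here is a minimal sketch, assuming each request carries a stable `user_id`. The function names and the `"canary"`/`"production"` labels are hypothetical. Hashing the ID, rather than sampling randomly per request, pins each user to one model. That keeps each user's experience, and the canary's metrics, consistent:

```python
import hashlib

def bucket(user_id: str, num_buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, num_buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def route_request(user_id: str, canary_percent: int) -> str:
    """Route users in the lowest `canary_percent` buckets to the canary model."""
    return "canary" if bucket(user_id) < canary_percent else "production"

# Simulate 10,000 users at a 5% canary; the observed share should be close to 0.05.
users = [f"user-{i}" for i in range(10_000)]
share = sum(route_request(u, 5) == "canary" for u in users) / len(users)
print(f"canary share: {share:.3f}")
```

Buckets are fixed, so ramping from 5% to 25% keeps every existing canary user on the canary and only adds new users. An auto-rollback step can then compare canary metrics against production and set `canary_percent` back to 0 when a guardrail is breached.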
:::warning CI/CD for ML is Not Just CI/CD for Software
ML pipelines have additional dimensions: data versioning, model artifact management, and performance-based validation gates. A model can pass all unit tests and still be a terrible model. Always include data validation and model performance checks. :::
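A data validation gate can be small. Here is a minimal sketch in plain Python, assuming batches arrive as lists of dict records. `EXPECTED_SCHEMA`, `MAX_NULL_RATE`, and `validate_batch` are hypothetical names. Tools such as Great Expectations, TensorFlow Data Validation, or Pandera provide richer versions of the same checks. The CI job fails whenever the returned list is non-empty:

```python
from typing import Any, Optional

# Hypothetical schema for illustration: column -> (type, min, max); None means unbounded.
EXPECTED_SCHEMA: dict[str, tuple[type, Optional[float], Optional[float]]] = {
    "age": (int, 0, 120),
    "income": (float, 0.0, None),
}
MAX_NULL_RATE = 0.01  # assumed tolerance: at most 1% missing values per column

def validate_batch(rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]
    errors: list[str] = []
    for col, (expected_type, lo, hi) in EXPECTED_SCHEMA.items():
        values = [row.get(col) for row in rows]
        null_rate = sum(v is None for v in values) / len(values)
        if null_rate > MAX_NULL_RATE:
            errors.append(f"{col}: null rate {null_rate:.1%} exceeds {MAX_NULL_RATE:.0%}")
        for v in values:
            if v is None:
                continue
            # Stop at the first bad value per column to keep reports short.
            if not isinstance(v, expected_type):
                errors.append(f"{col}: expected {expected_type.__name__}, got {type(v).__name__}")
                break
            if (lo is not None and v < lo) or (hi is not None and v > hi):
                errors.append(f"{col}: value {v} outside [{lo}, {hi}]")
                break
    return errors

good = [{"age": 34, "income": 52000.0}, {"age": 51, "income": 87000.0}]
bad = [{"age": -3, "income": 52000.0}, {"age": 40, "income": None}]
print(validate_batch(good))  # []
print(validate_batch(bad))   # negative age, plus 50% nulls in income
```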
Section 2: Model Serving & Inference (8 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 9 | Design a Low-Latency Model Serving System | Hard | 40 min | Model optimization, batching, caching, load balancing | Serving is where MLOps meets users | FAANG, AI Labs |
| 10 | Implement Model A/B Testing Infrastructure | Medium | 30 min | Traffic routing, metric collection, statistical analysis | Every model change needs experimental validation | FAANG, Big Tech |
| 11 | Design a Multi-Model Serving Architecture | Hard | 35 min | Model registry integration, routing, resource isolation | Production systems serve many models simultaneously | Google, Meta, Amazon |
| 12 | Optimize Inference Latency by 10x | Hard | 35 min | Quantization, distillation, ONNX, TensorRT, batching | Latency optimization is a core MLOps skill | FAANG, AI Labs |
| 13 | Design an Auto-Scaling Strategy for Model Endpoints | Medium | 30 min | Horizontal scaling, request queuing, cold start mitigation | Traffic patterns are bursty; scaling must match | All |
| 14 | Implement a Feature Serving Layer with Consistent Online/Offline Features | Hard | 40 min | Feature store architecture, point-in-time correctness | Online/offline feature skew causes silent failures | Uber, Airbnb, Databricks |
| 15 | Design a GPU Resource Management System | Hard | 35 min | GPU sharing, scheduling, quota management | GPUs are expensive; efficient utilization matters | FAANG, AI Labs |
| 16 | Build a Model Caching Strategy | Medium | 25 min | Result caching, embedding caching, cache invalidation | Caching reduces cost and latency dramatically | All |
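For problem #16, the cache key carries most of the design. Here is a minimal sketch, assuming predictions are deterministic for a given model version and feature set. `PredictionCache` and `fake_model` are hypothetical. Putting the model version in the key means a new deployment stops serving old results without an explicit purge. The TTL bounds staleness for anything that changes outside the key:

```python
import hashlib
import json
import time
from typing import Any, Callable

class PredictionCache:
    """In-process TTL cache for model outputs, keyed by model version + input features."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def make_key(model_version: str, features: dict[str, Any]) -> str:
        # sort_keys makes the key independent of feature ordering
        payload = json.dumps(features, sort_keys=True)
        return hashlib.sha256(f"{model_version}:{payload}".encode("utf-8")).hexdigest()

    def get_or_compute(self, model_version: str, features: dict[str, Any],
                       predict: Callable[[dict[str, Any]], Any]) -> Any:
        key = self.make_key(model_version, features)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: skip the model call
        result = predict(features)  # miss or expired entry
        self._store[key] = (now, result)
        return result

calls = []

def fake_model(features: dict[str, Any]) -> int:
    calls.append(features)
    return features["x"] * 2

cache = PredictionCache(ttl_seconds=60)
cache.get_or_compute("v1", {"x": 3}, fake_model)
cache.get_or_compute("v1", {"x": 3}, fake_model)  # served from cache
cache.get_or_compute("v2", {"x": 3}, fake_model)  # new model version, so a miss
print(len(calls))  # 2
```

This sketch never evicts expired entries. A production version would typically live in a shared store such as Redis and cap its size.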
Section 3: Monitoring & Observability (8 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 17 | Design an ML Model Monitoring Dashboard | Medium | 35 min | Metrics hierarchy, alerting thresholds, visualization | You cannot fix what you cannot see | All |
| 18 | Implement Data Drift Detection | Medium | 25 min | PSI, KS test, feature distribution monitoring | Data changes cause model degradation | All |
| 19 | Implement Model Performance Degradation Detection | Medium | 30 min | Delayed labels, proxy metrics, statistical process control | Catch model issues before users notice | All |
| 20 | Design an Alerting Strategy for ML Systems | Medium | 25 min | Alert hierarchy, severity levels, runbook integration | Too many alerts = alert fatigue = missed incidents | FAANG, Big Tech |
| 21 | Implement Prediction Logging and Audit Trail | Medium | 25 min | Structured logging, sampling strategy, storage optimization | Debugging production issues requires prediction history | All |
| 22 | Design a Root Cause Analysis System for Model Failures | Hard | 35 min | Dependency tracking, bisection, automated diagnostics | Fast RCA = fast recovery | Google, Meta, Uber |
| 23 | Monitor Fairness Metrics in Production | Hard | 30 min | Demographic parity, equalized odds, disparate impact | In regulated domains such as lending and hiring, fairness monitoring can be a compliance requirement | FAANG, Fintech |
| 24 | Design Cost Monitoring for ML Infrastructure | Medium | 25 min | Per-model cost attribution, budget alerts, optimization recommendations | ML infrastructure costs can spiral without visibility | All |
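Problem #18 names PSI. The index compares the binned distribution of a feature in a reference window (for example, training data) with a live window:

PSI = Σ (actual% − expected%) × ln(actual% / expected%)

Here is a minimal sketch, assuming equal-width bins over the reference range (quantile bins are also common) and a small epsilon for empty bins. A widely used rule of thumb reads PSI below 0.1 as little shift, 0.1 to 0.25 as moderate, and above 0.25 as significant. Tune these thresholds per feature:

```python
import math
import random

def psi(expected: list[float], actual: list[float], num_bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the reference `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant reference feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * num_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), num_bins - 1)] += 1  # clamp out-of-range values to edge bins
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
reference = [random.gauss(0, 1) for _ in range(5000)]
same_dist = [random.gauss(0, 1) for _ in range(5000)]
shifted = [random.gauss(0.5, 1) for _ in range(5000)]
print(f"no drift: {psi(reference, same_dist):.3f}")  # expected to be small
print(f"shifted:  {psi(reference, shifted):.3f}")    # expected to be clearly larger
```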
:::tip The Three Pillars of ML Monitoring
- Data quality -- Is the input data what the model expects?
- Model performance -- Is the model still making good predictions?
- System health -- Is the infrastructure performing reliably?
Most MLOps failures happen because teams monitor only system health and ignore data quality and model performance. :::
Section 4: Pipeline Orchestration & Data (7 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 25 | Design an ML Training Pipeline with Airflow/Kubeflow | Medium | 35 min | DAG design, retry logic, dependency management | Orchestrated pipelines are the backbone of ML systems | All |
| 26 | Implement an Automated Retraining Pipeline | Medium | 30 min | Trigger mechanisms, data freshness, validation gates | Models need regular retraining to stay current | All |
| 27 | Design a Data Versioning Strategy for ML | Medium | 25 min | DVC, Delta Lake, snapshot management | Reproducibility requires data versioning | All |
| 28 | Handle Data Pipeline Failures Gracefully | Medium | 25 min | Retry logic, dead letter queues, backfill strategies | Pipelines fail; recovery must be automated | All |
| 29 | Design a Feature Computation Pipeline | Hard | 35 min | Batch vs. streaming features, backfill, consistency | Features are the lifeblood of ML models | Uber, Airbnb, Databricks |
| 30 | Implement Pipeline Idempotency and Exactly-Once Processing | Hard | 30 min | Idempotent operations, deduplication, checkpointing | Non-idempotent pipelines produce incorrect data | All |
| 31 | Design a Multi-Environment Pipeline (Dev/Staging/Prod) | Medium | 25 min | Environment parity, config management, promotion workflow | Code that works in dev must work in prod | All |
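Problems #28 and #30 often come up together, because a retried step must not write its output twice. Here is a minimal sketch, assuming each record carries a unique `id` used as an idempotency key and that retryable failures raise a hypothetical `TransientError`. The in-memory `processed_ids` set and `dead_letters` list stand in for a durable checkpoint store and a real dead letter queue. True exactly-once semantics also require committing the checkpoint atomically with the write, which this sketch does not do:

```python
import random
import time
from typing import Callable

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""

def process_with_retries(
    records: list[dict],
    handler: Callable[[dict], None],
    processed_ids: set[str],
    dead_letters: list[dict],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Apply handler to each record at most once, retrying transient failures."""
    for record in records:
        key = record["id"]  # assumed unique idempotency key on every record
        if key in processed_ids:
            continue  # handled on an earlier run: skipping prevents duplicate writes
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)  # non-transient exceptions propagate so real bugs fail loudly
            except TransientError:
                if attempt == max_attempts:
                    dead_letters.append(record)  # park for inspection and later replay
                else:
                    # Exponential backoff with jitter: roughly 0.5 s, 1 s, 2 s between attempts
                    sleep(base_delay_s * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
            else:
                processed_ids.add(key)  # checkpoint only after success
                break

attempts = {"a": 0}

def flaky_handler(record: dict) -> None:
    if record["id"] == "a":
        attempts["a"] += 1
        if attempts["a"] < 3:
            raise TransientError("timeout")  # fails twice, then succeeds
    elif record["id"] == "poison":
        raise TransientError("always fails")

def no_sleep(seconds: float) -> None:
    pass  # skip real waiting in this demo

done: set[str] = set()
dlq: list[dict] = []
batch = [{"id": "a"}, {"id": "b"}, {"id": "poison"}]
process_with_retries(batch, flaky_handler, done, dlq, sleep=no_sleep)
print(sorted(done), [r["id"] for r in dlq])  # ['a', 'b'] ['poison']
process_with_retries([{"id": "a"}], flaky_handler, done, dlq, sleep=no_sleep)
print(attempts["a"])  # 3: the replayed "a" was skipped, not reprocessed
```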
Section 5: Infrastructure & Scalability (8 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 32 | Design a Distributed Training Infrastructure | Hard | 40 min | Data parallelism, model parallelism, communication patterns | Large models require distributed training | Google, Meta, AI Labs |
| 33 | Containerize an ML Application with Docker | Easy | 20 min | Dockerfile best practices, layer caching, multi-stage builds | Nearly every modern ML deployment starts with a container image | All |
| 34 | Deploy ML Models on Kubernetes | Medium | 30 min | Deployment specs, resource limits, health checks, HPA | K8s is the de facto standard platform for ML deployment | All |
| 35 | Design an Infrastructure-as-Code Setup for ML | Medium | 30 min | Terraform/Pulumi, reproducible infrastructure, state management | Manual infrastructure does not scale | FAANG, Big Tech |
| 36 | Optimize Cloud Costs for ML Workloads | Medium | 30 min | Spot instances, right-sizing, reserved capacity, scheduling | ML infra is expensive; optimization is high-impact | All |
| 37 | Design a Secrets and Configuration Management System for ML | Medium | 25 min | Vault, environment variables, config injection, rotation | API keys, model endpoints, and feature flags need management | All |
| 38 | Implement Efficient Data Loading for Large-Scale Training | Medium | 25 min | Prefetching, parallel loading, data format optimization | Data loading is often the training bottleneck | Google, Meta, AI Labs |
| 39 | Design a Multi-Cloud ML Platform | Hard | 40 min | Portability, vendor lock-in avoidance, unified API | Business requirements sometimes mandate multi-cloud | Big Tech, Enterprise |
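Problems #13 and #34 both involve the Kubernetes Horizontal Pod Autoscaler. Its documented core rule is:

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

Scaling is skipped when the ratio is within a tolerance, which defaults to 0.1. Below is a sketch of that arithmetic, including the min/max bounds an HPA spec sets. The function name and default values are illustrative. The real controller also applies stabilization windows and pod-readiness handling, which are omitted here. For model endpoints, a per-pod metric such as in-flight requests can track load better than CPU:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """HPA-style replica calculation, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't scale (avoids flapping)
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Target 20 in-flight requests per pod; 4 pods currently average 30.
print(desired_replicas(4, current_metric=30, target_metric=20))   # 6
print(desired_replicas(4, current_metric=21, target_metric=20))   # 4 (within tolerance)
print(desired_replicas(4, current_metric=200, target_metric=20))  # 20 (capped at max)
```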
Section 6: Incident Response & Reliability (6 Problems)
| # | Problem | Difficulty | Time | Key Concept | Why It Matters | Company Tags |
|---|---|---|---|---|---|---|
| 40 | Triage a Production Model Outage | Medium | 25 min | Incident response, severity classification, communication | On-call response is a core MLOps responsibility | All |
| 41 | Design a Disaster Recovery Plan for ML Systems | Hard | 35 min | Backup strategies, RTO/RPO, failover, data recovery | Plan for the worst; hope for the best | FAANG, Big Tech |
| 42 | Implement Graceful Degradation for AI Features | Medium | 25 min | Fallback models, cached predictions, feature flags | AI features must fail gracefully, not catastrophically | All |
| 43 | Design an SLA Framework for ML Services | Medium | 25 min | Availability, latency, accuracy SLAs, error budgets | SLAs create accountability and alignment | FAANG, Big Tech |
| 44 | Conduct a Post-Incident Review for an ML System Failure | Medium | 25 min | Blameless retrospective, timeline, action items | Learning from failures prevents future incidents | All |
| 45 | Design a Chaos Engineering Strategy for ML Systems | Hard | 30 min | Failure injection, blast radius, steady-state verification | Test failure modes before they test you | Netflix, Google, Amazon |
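Problem #42 comes down to a fallback chain. Here is a minimal sketch, assuming the primary model is a blocking callable and that a cached lookup and a static default (for example, popular items) are available. `predict_with_fallback` and the module-level `_model_pool` are hypothetical. Returning the tier that answered lets dashboards track how often the system is running degraded:

```python
import concurrent.futures
import time
from typing import Any, Callable, Optional

# Hypothetical shared pool for model calls; size it to the service's concurrency.
_model_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def predict_with_fallback(
    features: dict[str, Any],
    primary: Callable[[dict[str, Any]], Any],
    cached_lookup: Callable[[dict[str, Any]], Optional[Any]],
    default: Any,
    timeout_s: float = 0.1,
) -> tuple[Any, str]:
    """Try the primary model, then a cached prediction, then a static default.

    Returns (prediction, tier) so callers can see which tier answered.
    """
    future = _model_pool.submit(primary, features)
    try:
        return future.result(timeout=timeout_s), "primary"
    except Exception:
        # Timeout or model error: degrade instead of failing the user request.
        # (A real service would also log and count this event.)
        pass
    cached = cached_lookup(features)
    if cached is not None:
        return cached, "cache"
    return default, "default"

def broken_model(features: dict[str, Any]) -> Any:
    raise RuntimeError("model server unavailable")

def slow_model(features: dict[str, Any]) -> Any:
    time.sleep(0.5)  # simulates a latency spike past the 0.1 s budget
    return "fresh"

print(predict_with_fallback({"u": 1}, broken_model, lambda f: "yesterday", "popular-items"))
# ('yesterday', 'cache')
print(predict_with_fallback({"u": 1}, slow_model, lambda f: None, "popular-items"))
# ('popular-items', 'default')
```

A timed-out call keeps running in its worker thread. A real service should also bound that work, for example with a server-side timeout on the model call, so slow requests cannot exhaust the pool.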
:::danger The MLOps Reliability Hierarchy
- Can you deploy a model? (Table stakes)
- Can you roll back a bad deployment? (Essential)
- Can you detect when a model is degrading? (Differentiator)
- Can you automatically recover from failures? (Senior-level)
- Can you prevent failures before they happen? (Staff-level) :::
4-Week MLOps Study Plan
| Week | Focus | Problems | Daily Load |
|---|---|---|---|
| Week 1 | CI/CD + Serving | #1-16 | 2-4 problems/day |
| Week 2 | Monitoring + Pipelines | #17-31 | 2-3 problems/day |
| Week 3 | Infrastructure + Reliability | #32-45 | 2 problems/day |
| Week 4 | Integration + Mock | Full system designs | 1 deep design + review/day |
Week-by-Week Breakdown
Week 1: CI/CD and Serving
Day 1: #1, #2 (CI/CD pipeline, model validation)
Day 2: #3, #4 (canary deployment, model registry)
Day 3: #5, #6 (feature branches, data validation)
Day 4: #7, #8 (blue-green deployment, reproducibility)
Day 5: #9, #10 (low-latency serving, A/B testing infra)
Day 6: #11, #12 (multi-model serving, latency optimization)
Day 7: #13-16 (auto-scaling, feature serving, GPU management, caching)
Week 2: Monitoring and Pipelines
Day 1: #17, #18 (monitoring dashboard, data drift)
Day 2: #19, #20 (performance degradation, alerting)
Day 3: #21, #22 (prediction logging, root cause analysis)
Day 4: #23, #24 (fairness monitoring, cost monitoring)
Day 5: #25, #26 (training pipeline, automated retraining)
Day 6: #27, #28 (data versioning, pipeline failure handling)
Day 7: #29-31 (feature pipeline, idempotency, multi-environment)
MLOps Tooling Landscape
Know the major tools in each category:
| Category | Key Tools | When to Mention |
|---|---|---|
| Orchestration | Airflow, Kubeflow Pipelines, Prefect, Dagster | Pipeline design questions |
| Model Serving | TensorFlow Serving, Triton, Seldon, BentoML, vLLM | Inference and serving |
| Feature Store | Feast, Tecton, Hopsworks | Feature engineering at scale |
| Experiment Tracking | MLflow, Weights & Biases, Neptune | Reproducibility |
| Model Registry | MLflow Model Registry, Vertex AI Model Registry, SageMaker Model Registry | Model management |
| Data Validation | Great Expectations, TensorFlow Data Validation, Pandera | Data quality |
| Monitoring | Evidently, WhyLabs, Arize, Fiddler | Model monitoring |
| Containers & Orchestration | Docker, Kubernetes, Helm | Deployment |
| IaC | Terraform, Pulumi, CloudFormation | Infrastructure |
| CI/CD | GitHub Actions, GitLab CI, Jenkins, Argo CD | Automation |
:::note Tool Knowledge vs. Concepts
Interviewers care more about concepts than specific tools. Saying "I would use a feature store with online and offline serving" is better than "I would use Feast." But knowing specific tools signals practical experience. :::
Problem Deep Dive: Design a CI/CD Pipeline for ML
This is one of the most frequently asked MLOps interview questions. Here is a framework for structuring your answer:
Pipeline Stages
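One reasonable layout treats the stages as an ordered list of gates: code tests, data validation, training, model validation, registration, then a staged rollout. Any failure stops the run, so later stages never see a bad artifact. Here is a minimal sketch of that control flow, assuming each stage is a function that reads and writes a shared context dict. Every stage below is a hypothetical stand-in for a real test runner, validator, trainer, registry, or deploy tool. In practice, an orchestrator such as Airflow or Kubeflow Pipelines, or a CI system, would run them:

```python
from dataclasses import dataclass, field
from typing import Callable

Stage = Callable[[dict], bool]  # returns True if the stage's gate passes

@dataclass
class PipelineRun:
    context: dict = field(default_factory=dict)         # artifacts and metrics passed between stages
    completed: list[str] = field(default_factory=list)  # stages that passed, in order

def run_pipeline(stages: list[tuple[str, Stage]]) -> PipelineRun:
    run = PipelineRun()
    for name, stage in stages:
        if not stage(run.context):
            run.context["failed_stage"] = name  # halt: nothing downstream runs
            break
        run.completed.append(name)
    return run

# Hypothetical stand-ins for real stage implementations.
def train(ctx: dict) -> bool:
    ctx["candidate_auc"] = 0.83  # pretend evaluation result from the training job
    return True

def validate_model(ctx: dict) -> bool:
    return ctx["candidate_auc"] >= 0.85  # assumed absolute quality floor

STAGES: list[tuple[str, Stage]] = [
    ("unit_and_integration_tests", lambda ctx: True),
    ("data_validation", lambda ctx: True),
    ("train", train),
    ("model_validation", validate_model),
    ("register_model", lambda ctx: True),
    ("canary_deploy", lambda ctx: True),
]

run = run_pipeline(STAGES)
print(run.completed)                    # ['unit_and_integration_tests', 'data_validation', 'train']
print(run.context.get("failed_stage"))  # model_validation
```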
Key Design Decisions
| Decision | Options | Recommendation |
|---|---|---|
| Trigger for retraining | Scheduled vs. drift-based vs. manual | Start with scheduled, add drift-based as you mature |
| Validation threshold | Relative (beat prod by X%) vs. absolute | Both: must beat prod AND meet absolute minimums |
| Deployment strategy | Blue-green vs. canary vs. shadow | Canary for gradual rollout with rollback |
| Rollback trigger | Manual vs. automated | Automated for critical metrics, manual for nuanced cases |
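The validation-threshold and rollback-trigger rows translate into small, testable functions. Here is a minimal sketch; the metric name, thresholds, and function names are hypothetical. The rollback check uses a simple ratio with a minimum-traffic guard. A production system may prefer a formal statistical test:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateConfig:
    metric: str = "auc"                # hypothetical primary metric (higher is better)
    min_absolute: float = 0.80         # floor the candidate must clear on its own
    min_relative_gain: float = 0.01    # must beat production by at least 1% (relative)
    max_p99_latency_ms: float = 100.0  # serving guardrail

def passes_gate(candidate: dict[str, float], production: dict[str, float],
                cfg: GateConfig = GateConfig()) -> tuple[bool, list[str]]:
    """Candidate must beat production AND meet absolute minimums."""
    reasons: list[str] = []
    score, prod_score = candidate[cfg.metric], production[cfg.metric]
    if score < cfg.min_absolute:
        reasons.append(f"{cfg.metric} {score:.3f} below floor {cfg.min_absolute:.3f}")
    if score < prod_score * (1 + cfg.min_relative_gain):
        reasons.append(f"{cfg.metric} {score:.3f} does not beat prod {prod_score:.3f} "
                       f"by {cfg.min_relative_gain:.0%}")
    if candidate["p99_latency_ms"] > cfg.max_p99_latency_ms:
        reasons.append(f"p99 latency {candidate['p99_latency_ms']:.0f} ms over budget")
    return (not reasons, reasons)

def should_auto_rollback(canary_errors: int, canary_requests: int,
                         baseline_error_rate: float, max_ratio: float = 2.0,
                         min_requests: int = 500) -> bool:
    """Automated trigger for a critical metric: canary error rate far above baseline."""
    if canary_requests < min_requests:
        return False  # too little traffic to judge; keep the canary running
    return canary_errors / canary_requests > max_ratio * baseline_error_rate

print(passes_gate({"auc": 0.86, "p99_latency_ms": 80.0}, {"auc": 0.84}))  # (True, [])
print(should_auto_rollback(30, 1000, baseline_error_rate=0.01))           # True
```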
Difficulty Distribution
| Difficulty | Problems | Count |
|---|---|---|
| Easy | #33 | 1 |
| Medium | #1, #2, #3, #4, #5, #6, #7, #10, #13, #16, #17, #18, #19, #20, #21, #24, #25, #26, #27, #28, #31, #34, #35, #36, #37, #38, #40, #42, #43, #44 | 30 |
| Hard | #8, #9, #11, #12, #14, #15, #22, #23, #29, #30, #32, #39, #41, #45 | 14 |
Next Steps
After completing the MLOps problem list:
- MLE Problems to strengthen ML fundamentals
- Data Engineer Problems for deeper data infrastructure skills
- Google-Style Problems since Google heavily tests infrastructure thinking
- Section 15: Role-Specific Prep for the full MLOps preparation path
