
GitLab CI for ML

The Enterprise ML Team's Starting Point

DataOps lead Sofia joined an insurance company's ML platform team to find that "CI/CD" meant something very specific: a Jenkins pipeline that ran flake8 on Python files, ran pytest, and declared victory. The actual model training was done by data scientists manually, on their laptops, with results shared via Slack. Deploying a model update meant a data scientist emailing the DevOps team with a model file attached.

The company was processing 40,000 insurance claims per day through three ML models: fraud detection, damage assessment, and settlement amount prediction. All three models were trained manually, infrequently (the fraud model had not been retrained in 11 months), and deployed through a ticket-based process that took 2-3 weeks. Model performance was declining. By the time a model quality problem showed up in business metrics, it had been happening for weeks.

Sofia's mandate: build an automated ML CI/CD pipeline. The company ran entirely on GitLab (on-premise GitLab instance, not GitLab.com - common in regulated industries). The pipeline needed to go from a data commit or code change all the way to a staged production deployment, with automated quality gates and a human approval step before production.

This lesson builds that pipeline. Every configuration block here is production-tested against real GitLab instances.


Why This Exists

GitLab CI was built with enterprise features that GitHub Actions has only recently started to match: environments (with manual approval gates), DAG pipeline execution (jobs run as soon as dependencies are met, not in sequential stages), and first-class support for scheduled pipelines. For on-premise deployments and regulated industries, GitLab also offers more control over the runner infrastructure.

GitLab's approach to ML CI/CD is through its Auto DevOps philosophy extended with ML-specific patterns. The GitLab model registry (added in 2023) integrates directly with the CI pipeline. But even without it, the combination of .gitlab-ci.yml DAG pipelines, artifacts, environments, and scheduled triggers provides everything needed for a complete ML CI/CD system.

GitLab CI Architecture for ML

The Complete .gitlab-ci.yml

# .gitlab-ci.yml
# ML CI/CD pipeline: validate → train → evaluate → register → deploy

image: python:3.11-slim

# ─────────────────────────────────────────────────────────────────────────────
# Global variables - override with CI/CD variables in GitLab UI
# ─────────────────────────────────────────────────────────────────────────────
variables:
  MLFLOW_TRACKING_URI: $MLFLOW_TRACKING_URI_SECRET
  MODEL_NAME: fraud-detector
  PYTHON_VERSION: "3.11"
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  # Training data location (S3 or internal object store)
  TRAINING_DATA_PATH: s3://ml-data-prod/fraud/latest/train.parquet
  EVAL_DATA_PATH: s3://ml-data-prod/fraud/eval/eval_set_v3.parquet

# Cache pip dependencies across jobs
cache:
  key:
    files:
      - requirements.txt
    prefix: pip-v1
  paths:
    - .cache/pip/

stages:
  - validate
  - train
  - evaluate
  - register
  - deploy-staging
  - deploy-production

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 1: validate
# ─────────────────────────────────────────────────────────────────────────────

data-validation:
  stage: validate
  script:
    - pip install -r requirements.txt -q
    - python scripts/validate_data.py --data-path "$TRAINING_DATA_PATH"
  # Only run if training data or pipeline code changed
  rules:
    - changes:
        - data/**/*
        - src/pipeline/**/*
        - src/features/**/*
        - scripts/validate_data.py
  artifacts:
    paths:
      - validation_report.json
    expire_in: 1 week
    when: always  # Upload even on failure for debugging

code-lint:
  stage: validate
  script:
    - pip install ruff mypy -q
    - ruff check src/ tests/
    - mypy src/ --ignore-missing-imports
  rules:
    - when: always  # Always lint

unit-tests:
  stage: validate
  script:
    - pip install -r requirements-dev.txt -q
    - pytest tests/unit/ -v --tb=short --timeout=60 --junitxml=test-results/unit.xml
  artifacts:
    reports:
      junit: test-results/unit.xml
    paths:
      - test-results/
    expire_in: 1 week
  rules:
    - when: always

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 2: train (GPU runner)
# ─────────────────────────────────────────────────────────────────────────────

model-training:
  stage: train
  # GPU runner - registered with tag 'gpu'
  tags:
    - gpu
    - linux
  image: nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
  timeout: 3 hours
  variables:
    MLFLOW_EXPERIMENT_NAME: "fraud-detector-ci"
    MLFLOW_RUN_NAME: "ci-$CI_COMMIT_SHORT_SHA-$CI_PIPELINE_ID"
  before_script:
    # GPU runner uses a different base image - install Python and deps
    # Note: on Ubuntu 22.04, python3 and pip3 resolve to the distro default (3.10), not 3.11
    - apt-get update -q && apt-get install -y -q python3.11 python3-pip
    - pip3 install -r requirements.txt -q
  script:
    - python3 src/training/train.py
        --data-path "$TRAINING_DATA_PATH"
        --output-dir artifacts/model
        --config config/training.yaml
        --run-name "$MLFLOW_RUN_NAME"
    # Verify model artifact was created
    # (quoted so the ": " inside the message is not parsed as a YAML mapping)
    - 'test -f artifacts/model/model.joblib || (echo "ERROR: model file not created" && exit 1)'
    # Save MLflow run ID for downstream stages
    - cat artifacts/model/mlflow_run_id.txt
  artifacts:
    paths:
      - artifacts/model/
    expire_in: 2 weeks
    name: "model-$CI_COMMIT_SHORT_SHA"
  # Training only runs when training code or data changes
  rules:
    - changes:
        - src/training/**/*
        - src/features/**/*
        - src/models/**/*
        - config/training*.yaml
        - data/**/*
    # Always run on main branch merges regardless of path
    - if: $CI_COMMIT_BRANCH == "main"
    # Allow manual trigger
    - when: manual
      allow_failure: false

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 3: evaluate
# ─────────────────────────────────────────────────────────────────────────────

model-evaluation:
  stage: evaluate
  needs:
    - job: model-training
      artifacts: true
  script:
    - pip install -r requirements.txt -q
    - python src/evaluation/evaluate.py
        --model-path artifacts/model/model.joblib
        --eval-data-path "$EVAL_DATA_PATH"
        --output-path artifacts/evaluation_results.json
    - cat artifacts/evaluation_results.json
  artifacts:
    paths:
      - artifacts/evaluation_results.json
    reports:
      # Metrics report, shown in the merge request widget. GitLab expects
      # OpenMetrics text format here, so a JSON file needs converting first.
      metrics: artifacts/evaluation_results.json
    expire_in: 2 weeks

performance-gate:
  stage: evaluate
  needs:
    - job: model-evaluation
      artifacts: true
    - job: model-training
      artifacts: true
  script:
    - pip install -r requirements.txt -q
    # Fetch baseline metrics from model registry (current production model)
    - python scripts/fetch_baseline_metrics.py
        --model-name "$MODEL_NAME"
        --stage Production
        --output-path artifacts/baseline_metrics.json
    # Run gate check - exits 1 if gate fails
    - python scripts/check_gate.py
        --new-metrics artifacts/evaluation_results.json
        --baseline-metrics artifacts/baseline_metrics.json
        --min-auc 0.90
        --max-regression 0.01
        --subgroup-max-regression 0.03
  artifacts:
    paths:
      - artifacts/baseline_metrics.json
    expire_in: 1 week
    when: always

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 4: register
# ─────────────────────────────────────────────────────────────────────────────

register-model:
  stage: register
  needs:
    - job: performance-gate
    - job: model-training
      artifacts: true
    - job: model-evaluation
      artifacts: true
  # Only register on main branch (not on MR pipelines)
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - pip install mlflow -q
    - RUN_ID=$(cat artifacts/model/mlflow_run_id.txt)
    - python scripts/register_model.py
        --run-id "$RUN_ID"
        --model-name "$MODEL_NAME"
        --stage Staging
        --description "Registered by CI pipeline $CI_PIPELINE_ID, commit $CI_COMMIT_SHORT_SHA"
    # Look up the registered model version and export it for downstream jobs
    - MODEL_VERSION=$(python scripts/get_model_version.py --model-name "$MODEL_NAME" --stage Staging)
    - echo "MODEL_VERSION=$MODEL_VERSION" >> model.env
  artifacts:
    reports:
      dotenv: model.env  # Makes MODEL_VERSION available to downstream jobs
    expire_in: 1 week

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 5: deploy-staging
# ─────────────────────────────────────────────────────────────────────────────

deploy-to-staging:
  stage: deploy-staging
  needs:
    - job: register-model
      artifacts: true
  environment:
    name: staging
    url: https://api-staging.internal.company.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - pip install -r requirements-deploy.txt -q
    # Deploy model container to staging Kubernetes
    - python scripts/deploy_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --environment staging
        --namespace ml-staging
        --k8s-context staging-cluster

staging-smoke-tests:
  stage: deploy-staging
  needs:
    - job: deploy-to-staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # pytest-timeout provides the --timeout option used below
    - pip install httpx pytest pytest-timeout -q
    - pytest tests/smoke/ -v
        --base-url https://api-staging.internal.company.com
        --tb=short
        --timeout=30
        --junitxml=test-results/smoke.xml
  artifacts:
    reports:
      junit: test-results/smoke.xml
    expire_in: 1 week

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 6: deploy-production (manual approval required)
# ─────────────────────────────────────────────────────────────────────────────

deploy-to-production:
  stage: deploy-production
  needs:
    - job: staging-smoke-tests
    - job: register-model
      artifacts: true
  environment:
    name: production
    url: https://api.internal.company.com
  # Manual approval - pipeline pauses here until an authorized user clicks "Play".
  # Who may run the job is controlled by the protected "production" environment
  # settings in GitLab, not by this file.
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  script:
    - pip install -r requirements-deploy.txt -q
    - python scripts/deploy_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --environment production
        --namespace ml-production
        --k8s-context prod-cluster
    # Promote model stage in registry from Staging to Production
    - python scripts/promote_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --to-stage Production
    # Notify on-call team
    - python scripts/notify_slack.py
        --message "Model $MODEL_NAME v$MODEL_VERSION deployed to production by $GITLAB_USER_LOGIN"
        --channel "#ml-deployments"
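
The performance-gate job calls scripts/check_gate.py, which this lesson does not list. The following is a minimal sketch of what such a gate might look like, not the actual script. It assumes, hypothetically, that both metrics files are JSON objects with a top-level "auc" value and an optional "subgroups" mapping from subgroup name to AUC; the function names and the JSON layout are illustrative only. The default thresholds mirror the flags passed in the pipeline above.

```python
import json


def check_gate(new, baseline, min_auc=0.90, max_regression=0.01,
               subgroup_max_regression=0.03):
    """Return a list of gate failures; an empty list means the gate passes.

    Assumed (hypothetical) metrics layout:
        {"auc": 0.93, "subgroups": {"auto": 0.92, "home": 0.91}}
    """
    failures = []

    # Absolute floor: the candidate must clear min_auc regardless of baseline.
    if new["auc"] < min_auc:
        failures.append(f"auc {new['auc']:.4f} is below minimum {min_auc:.4f}")

    # Relative check: the candidate may not fall too far below production.
    drop = baseline["auc"] - new["auc"]
    if drop > max_regression:
        failures.append(f"auc regressed by {drop:.4f} (allowed {max_regression:.4f})")

    # Per-subgroup check: a good overall score must not hide a bad segment.
    new_subgroups = new.get("subgroups", {})
    for name, base_value in baseline.get("subgroups", {}).items():
        if name not in new_subgroups:
            failures.append(f"subgroup '{name}' missing from new metrics")
            continue
        sub_drop = base_value - new_subgroups[name]
        if sub_drop > subgroup_max_regression:
            failures.append(
                f"subgroup '{name}' regressed by {sub_drop:.4f} "
                f"(allowed {subgroup_max_regression:.4f})"
            )
    return failures


def run_gate(new_path, baseline_path):
    """Load both metrics files and return a process exit code (0 = pass)."""
    with open(new_path) as f:
        new = json.load(f)
    with open(baseline_path) as f:
        baseline = json.load(f)
    failures = check_gate(new, baseline)
    for message in failures:
        print("GATE FAILED:", message)
    return 1 if failures else 0
```

A command-line wrapper would parse the flags shown in the job and pass the result of run_gate to sys.exit, so that a failed gate fails the CI job and blocks register-model.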

DAG Pipelines for Parallel Execution

GitLab CI's needs: keyword enables DAG (Directed Acyclic Graph) execution - jobs start as soon as their dependencies complete, without waiting for the entire previous stage to finish:

# Without DAG: each stage waits for all previous stage jobs to complete
# With DAG (needs:): jobs start as soon as specific dependencies are done

model-evaluation:
  stage: evaluate
  needs:
    - job: model-training  # Only waits for model-training
      artifacts: true
  # Does NOT wait for any other stage 2 jobs

performance-gate:
  stage: evaluate
  needs:
    - job: model-evaluation  # Waits only for evaluation, not training
      artifacts: true

This parallelism can significantly reduce total pipeline time. In a pipeline with multiple models, each model's evaluation can start immediately when its training finishes, rather than waiting for all models to finish training.
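
To make the saving concrete, here is a small sketch that compares the two scheduling models on a made-up two-model pipeline. The job names and durations are hypothetical, and the calculation assumes enough runners that no job ever queues.

```python
# Hypothetical job table: name -> (stage index, duration in minutes, needs)
JOBS = {
    "train-a": (0, 60, []),
    "train-b": (0, 10, []),
    "eval-a": (1, 5, ["train-a"]),
    "eval-b": (1, 40, ["train-b"]),
}


def staged_wall_time(jobs):
    """Stage-based execution: each stage lasts as long as its slowest job."""
    slowest = {}
    for stage, duration, _needs in jobs.values():
        slowest[stage] = max(slowest.get(stage, 0), duration)
    return sum(slowest.values())


def dag_wall_time(jobs):
    """DAG execution: a job starts once its own needs finish.

    The pipeline's wall time is the length of the longest dependency chain.
    """
    finish = {}

    def finish_time(name):
        if name not in finish:
            _stage, duration, needs = jobs[name]
            start = max((finish_time(n) for n in needs), default=0)
            finish[name] = start + duration
        return finish[name]

    return max(finish_time(name) for name in jobs)


print(staged_wall_time(JOBS))  # 100 (60 for the train stage + 40 for the eval stage)
print(dag_wall_time(JOBS))     # 65 (train-a then eval-a is the longest chain)
```

In this example eval-b starts at minute 10 instead of minute 60, which is where the saving comes from. When every chain has the same shape the two numbers converge, so the benefit depends on how uneven the job durations are.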

Scheduled Pipelines for Automated Retraining

GitLab has native scheduled pipeline support:

# Configure in GitLab UI: CI/CD > Schedules
# Or via API:

# Scheduled pipeline variables (set in the schedule configuration)
# SCHEDULED_RETRAIN: "true"
# TRAINING_DATA_DATE: "latest" # Or specific date for backfills

# In .gitlab-ci.yml, check for scheduled trigger:
model-training:
  stage: train
  rules:
    # Run on schedule (weekly retrain)
    - if: $CI_PIPELINE_SOURCE == "schedule" && $SCHEDULED_RETRAIN == "true"
    # Run when training code changes
    - changes:
        - src/training/**/*
        - src/features/**/*
    # Manual trigger
    - when: manual
      allow_failure: false

# scripts/create_schedule.py - create schedule via GitLab API
import requests

GITLAB_URL = "https://gitlab.company.com"
PROJECT_ID = "123"
# Placeholder only: load the real token from a secret store or CI/CD variable
TOKEN = "your-personal-access-token"

schedule = {
    "description": "Weekly fraud model retraining",
    "ref": "main",
    "cron": "0 2 * * 1",  # 2 AM every Monday
    "cron_timezone": "UTC",
    "active": True,
}

variables = [
    {"key": "SCHEDULED_RETRAIN", "value": "true"},
    {"key": "TRAINING_DATA_DATE", "value": "latest"},
]

# Create the schedule, then attach its variables one by one
response = requests.post(
    f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipeline_schedules",
    headers={"PRIVATE-TOKEN": TOKEN},
    json=schedule,
    timeout=30,
)
response.raise_for_status()  # Fail loudly instead of hitting a KeyError below
schedule_id = response.json()["id"]

for var in variables:
    var_response = requests.post(
        f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipeline_schedules/{schedule_id}/variables",
        headers={"PRIVATE-TOKEN": TOKEN},
        json=var,
        timeout=30,
    )
    var_response.raise_for_status()

Artifacts and Passing Data Between Jobs

# Pattern for passing structured data between jobs via artifacts

model-training:
  artifacts:
    paths:
      - artifacts/model/                  # Model files
      - artifacts/training_metrics.json   # Training metrics
    expire_in: 2 weeks

model-evaluation:
  needs:
    - job: model-training
      artifacts: true  # Automatically downloads model-training artifacts
  # Now artifacts/model/ and artifacts/training_metrics.json are available

# Dotenv artifacts: pass variables between jobs
register-model:
  script:
    - echo "MODEL_VERSION=1.2.3" >> deploy.env
  artifacts:
    reports:
      dotenv: deploy.env

deploy-to-staging:
  needs:
    - job: register-model
      artifacts: true
  script:
    # $MODEL_VERSION is now available from the dotenv artifact
    - echo "Deploying version $MODEL_VERSION"
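
When the values come from a Python script rather than a shell echo, a small helper can build the dotenv file and reject entries that would not survive the hand-off. This is a sketch under assumptions: render_dotenv is a hypothetical helper name, and the validation rules (keys made of letters, digits, and underscores; single-line values) are a conservative reading of GitLab's dotenv restrictions that should be checked against the documentation for your GitLab version.

```python
import re

# Conservative key pattern: letters, digits, underscores, not starting with a digit
_KEY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_dotenv(variables):
    """Render a mapping as KEY=VALUE lines for an artifacts:reports:dotenv file."""
    lines = []
    for key, value in variables.items():
        value = str(value)
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid dotenv key: {key!r}")
        if "\n" in value or "\r" in value:
            raise ValueError(f"value for {key} must be a single line")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


print(render_dotenv({"MODEL_VERSION": 7, "MODEL_NAME": "fraud-detector"}), end="")
# MODEL_VERSION=7
# MODEL_NAME=fraud-detector
```

A job would write the returned text to the file named under dotenv: (model.env or deploy.env in the examples above), for instance with pathlib.Path("model.env").write_text(...).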

Production Notes

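Protected environments: In GitLab, configure the production environment as protected with "Required approvals" set to 1 or more. This forces a human review before any pipeline can deploy to production. For blackout periods (end of quarter, holidays), define deploy freeze windows. GitLab exposes an active freeze to jobs as the $CI_DEPLOY_FREEZE variable, so the deploy job's rules must check it for the freeze to take effect.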

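GitLab Runner for GPU: Register a GPU runner with specific tags. In the .gitlab-ci.yml, use tags: [gpu, linux] to route GPU jobs. For Kubernetes-based runners, route job pods to GPU nodes (for example with a node selector) and make sure they request the GPU resource, typically nvidia.com/gpu: 1. How this is expressed in the Helm values depends on the runner chart version, so check the chart's documentation.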

CI/CD variables vs secrets: GitLab distinguishes between regular variables (visible in job logs) and masked/protected variables. Always mark credentials (MLFLOW_TRACKING_URI with write access, cloud credentials) as both "Masked" (hidden from logs) and "Protected" (only available on protected branches/tags).

:::tip GitLab Environments for Audit Trail
GitLab Environments provide a deployment history with who deployed what and when. For regulated industries (finance, healthcare), this audit trail is often required for compliance. Configure environments with approval rules so that every production deployment is traceable to a specific person who approved it.
:::

:::warning needs: vs dependencies: Confusion
GitLab has two similar keywords: needs: and dependencies:. needs: controls job execution order (DAG) and, by default, also downloads the artifacts of the jobs it lists. dependencies: only restricts which artifacts are downloaded and does not affect job order. Use needs: with artifacts: true for both ordering and artifacts. dependencies: is still supported, but mixing the two keywords in one pipeline is a common source of confusion, so prefer needs: in new pipelines.
:::

:::danger Cache vs Artifacts
GitLab cache: is for speeding up jobs (pip cache, build cache) - it is not guaranteed to be present and can be invalidated at any time. GitLab artifacts: are for passing data between jobs - they are available to any job that lists the producing job under needs: with artifacts: true, for as long as their expire_in period has not elapsed. Never use cache to pass model files between jobs. Use artifacts.
:::

Interview Q&A

Q: How does GitLab CI DAG execution differ from standard stage-based execution?

In standard stage-based execution, all jobs in stage N must complete before any job in stage N+1 can start. A job in stage 3 therefore waits for the slowest job in stage 2, even if it depends only on a fast one. DAG execution (via needs:) lets jobs start as soon as their specific dependencies complete, regardless of stage. For ML pipelines with multiple models, this means model A's evaluation can start while model B is still training, which can reduce total pipeline wall time considerably.

Q: How do you implement a human approval gate in GitLab CI?

Set when: manual on the deployment job and configure the GitLab Environment as protected with required approvals. The pipeline pauses at the manual job until a user with the required permissions (typically "Maintainer" role) clicks "Run" in the GitLab UI. For compliance, combine with environment-level approval rules that require approval from specific individuals or groups.

Q: What is the difference between a GitLab CI artifact and a cache?

Artifacts are files produced by a job that are uploaded to GitLab and made available to downstream jobs via needs: with artifacts: true. They persist until they expire (expire_in), so downstream jobs in the same pipeline can rely on them. Caches are files saved between pipeline runs to speed up execution (pip installs, build outputs) - they may not be present on every run. Model files must use artifacts, not cache. Pip dependencies should use cache.

Q: How do you run different logic for scheduled pipelines vs push-triggered pipelines?

Check the $CI_PIPELINE_SOURCE variable. Scheduled pipelines set it to schedule. You can use rules: with if: $CI_PIPELINE_SOURCE == "schedule" to add or modify behavior for scheduled runs - for example, running a full retraining on schedule but only smoke tests on push.
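
The same check can be made inside a Python entry point when one script serves both kinds of run. The sketch below is illustrative: select_run_mode and the mode names are hypothetical, while CI_PIPELINE_SOURCE is GitLab's predefined variable and SCHEDULED_RETRAIN is the schedule variable defined earlier in this lesson.

```python
import os


def select_run_mode(env=None):
    """Choose what the pipeline script should do based on how it was triggered."""
    env = os.environ if env is None else env
    source = env.get("CI_PIPELINE_SOURCE", "")

    # Scheduled pipeline with the retrain flag set: do the full retraining run.
    if source == "schedule" and env.get("SCHEDULED_RETRAIN") == "true":
        return "full-retrain"

    # Ordinary pushes only get the quick checks.
    if source == "push":
        return "smoke-tests"

    # Anything else (web, api, trigger, ...) falls back to a default mode.
    return "default"


print(select_run_mode({"CI_PIPELINE_SOURCE": "schedule", "SCHEDULED_RETRAIN": "true"}))
# full-retrain
```

Passing the environment as an argument keeps the function easy to unit test. In most pipelines the rules: form is still preferable, because it prevents the job from being created at all.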

Q: How do you manage GitLab CI for a multi-model ML platform with 10+ models?

Use GitLab's include: keyword to share common pipeline templates. Create a shared .gitlab/ci-templates/ml-training.yml with the standard validate-train-evaluate-register sequence, parameterized via variables. Each model's project includes this template with its own variables. For a monorepo with multiple models, use path-based rules to only trigger training for the model whose code or data changed.
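
For the monorepo case, the path-to-model mapping that rules:changes performs declaratively can also be sketched in Python, for example when a script generates a child pipeline. The models/<model-name>/... layout and the function name below are assumptions for illustration, not a GitLab convention.

```python
from pathlib import PurePosixPath


def models_to_retrain(changed_files, models_root="models"):
    """Map changed file paths to the models that own them.

    Assumes a hypothetical layout of models/<model-name>/<anything>.
    Files outside that layout (or directly under models/) are ignored.
    """
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == models_root:
            affected.add(parts[1])
    return sorted(affected)


changed = [
    "models/fraud/src/train.py",
    "models/damage/data/labels.parquet",
    "models/README.md",
    "docs/index.md",
]
print(models_to_retrain(changed))  # ['damage', 'fraud']
```

A real setup would also need a rule for shared code (such as a common feature library), where a change should retrain every model that depends on it.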

© 2026 EngineersOfAI. All rights reserved.