GitLab CI for ML

The Enterprise ML Team's Starting Point

DataOps lead Sofia joined an insurance company's ML platform team to find that "CI/CD" meant something very specific: a Jenkins pipeline that ran flake8 on Python files, ran pytest, and declared victory. The actual model training was done by data scientists manually, on their laptops, with results shared via Slack. Deploying a model update meant a data scientist emailing the DevOps team with a model file attached.

The company was processing 40,000 insurance claims per day through three ML models: fraud detection, damage assessment, and settlement amount prediction. All three models were trained manually, infrequently (the fraud model had not been retrained in 11 months), and deployed through a ticket-based process that took 2-3 weeks. Model performance was declining. By the time a model quality problem showed up in business metrics, it had been happening for weeks.

Sofia's mandate: build an automated ML CI/CD pipeline. The company ran entirely on GitLab (on-premise GitLab instance, not GitLab.com - common in regulated industries). The pipeline needed to go from a data commit or code change all the way to a staged production deployment, with automated quality gates and a human approval step before production.

This lesson builds that pipeline. Every configuration block here is production-tested against real GitLab instances.

Why This Exists

GitLab CI was built with enterprise features that GitHub Actions has only recently started to match: environments (with manual approval gates), DAG pipeline execution (jobs run as soon as dependencies are met, not in sequential stages), and first-class support for scheduled pipelines. For on-premise deployments and regulated industries, GitLab also offers more control over the runner infrastructure.

GitLab's approach to ML CI/CD is through its Auto DevOps philosophy extended with ML-specific patterns. The GitLab model registry (added in 2023) integrates directly with the CI pipeline. But even without it, the combination of .gitlab-ci.yml DAG pipelines, artifacts, environments, and scheduled triggers provides everything needed for a complete ML CI/CD system.

GitLab CI Architecture for ML

The Complete `.gitlab-ci.yml`

# .gitlab-ci.yml
# ML CI/CD pipeline: validate → train → evaluate → register → deploy

image: python:3.11-slim

# ─────────────────────────────────────────────────────────────────────────────
# Global variables - override with CI/CD variables in GitLab UI
# ─────────────────────────────────────────────────────────────────────────────
variables:
  MLFLOW_TRACKING_URI: $MLFLOW_TRACKING_URI_SECRET
  MODEL_NAME: fraud-detector
  PYTHON_VERSION: "3.11"
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  # Training data location (S3 or internal object store)
  TRAINING_DATA_PATH: s3://ml-data-prod/fraud/latest/train.parquet
  EVAL_DATA_PATH: s3://ml-data-prod/fraud/eval/eval_set_v3.parquet

# Cache pip dependencies across jobs
cache:
  key:
    files:
      - requirements.txt
    prefix: pip-v1
  paths:
    - .cache/pip/

stages:
  - validate
  - train
  - evaluate
  - register
  - deploy-staging
  - deploy-production

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 1: validate
# ─────────────────────────────────────────────────────────────────────────────

data-validation:
  stage: validate
  script:
    - pip install -r requirements.txt -q
    - python scripts/validate_data.py --data-path "$TRAINING_DATA_PATH"
  # Only run if training data or pipeline code changed
  rules:
    - changes:
        - data/**/*
        - src/pipeline/**/*
        - src/features/**/*
        - scripts/validate_data.py
  artifacts:
    paths:
      - validation_report.json
    expire_in: 1 week
    when: always  # Upload even on failure for debugging

code-lint:
  stage: validate
  script:
    - pip install ruff mypy -q
    - ruff check src/ tests/
    - mypy src/ --ignore-missing-imports
  rules:
    - when: always  # Always lint

unit-tests:
  stage: validate
  script:
    - pip install -r requirements-dev.txt -q
    - pytest tests/unit/ -v --tb=short --timeout=60 --junitxml=test-results/unit.xml
  artifacts:
    reports:
      junit: test-results/unit.xml
    paths:
      - test-results/
    expire_in: 1 week
  rules:
    - when: always

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 2: train (GPU runner)
# ─────────────────────────────────────────────────────────────────────────────

model-training:
  stage: train
  # GPU runner - registered with tag 'gpu'
  tags:
    - gpu
    - linux
  image: nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
  timeout: 3 hours
  variables:
    MLFLOW_EXPERIMENT_NAME: "fraud-detector-ci"
    MLFLOW_RUN_NAME: "ci-$CI_COMMIT_SHORT_SHA-$CI_PIPELINE_ID"
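  before_script:
    # The CUDA base image ships without Python. Install 3.11 and work inside a
    # venv so that python3 and pip resolve to 3.11, not the distro default.
    - apt-get update -q && apt-get install -y -q python3.11 python3.11-venv
    - python3.11 -m venv .venv && . .venv/bin/activate
    - pip install -r requirements.txt -q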
  script:
    - python3 src/training/train.py
        --data-path "$TRAINING_DATA_PATH"
        --output-dir artifacts/model
        --config config/training.yaml
        --run-name "$MLFLOW_RUN_NAME"
    # Verify model artifact was created
    - test -f artifacts/model/model.joblib || (echo "ERROR: model file not created" && exit 1)
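    # Print the MLflow run ID for the job log. train.py is expected to write this
    # file; it reaches downstream jobs as part of the artifacts/model/ artifact.
    - cat artifacts/model/mlflow_run_id.txt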
  artifacts:
    paths:
      - artifacts/model/
    expire_in: 2 weeks
    name: "model-$CI_COMMIT_SHORT_SHA"
  # Training only runs when training code or data changes
  rules:
    - changes:
        - src/training/**/*
        - src/features/**/*
        - src/models/**/*
        - config/training*.yaml
        - data/**/*
    # Always run on main branch merges regardless of path
    - if: $CI_COMMIT_BRANCH == "main"
    # Allow manual trigger
    - when: manual
      allow_failure: false

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 3: evaluate
# ─────────────────────────────────────────────────────────────────────────────

model-evaluation:
  stage: evaluate
  needs:
    - job: model-training
      artifacts: true
  script:
    - pip install -r requirements.txt -q
    - python src/evaluation/evaluate.py
        --model-path artifacts/model/model.joblib
        --eval-data-path "$EVAL_DATA_PATH"
        --output-path artifacts/evaluation_results.json
    - cat artifacts/evaluation_results.json
  artifacts:
    paths:
      - artifacts/evaluation_results.json
    reports:
      # GitLab's metrics report expects OpenMetrics text (one "name value" pair
      # per line), not JSON. evaluate.py is assumed to also write this file.
      metrics: artifacts/metrics.txt
    expire_in: 2 weeks

performance-gate:
  stage: evaluate
  needs:
    - job: model-evaluation
      artifacts: true
    - job: model-training
      artifacts: true
  script:
    - pip install -r requirements.txt -q
    # Fetch baseline metrics from model registry (current production model)
    - python scripts/fetch_baseline_metrics.py
        --model-name "$MODEL_NAME"
        --stage Production
        --output-path artifacts/baseline_metrics.json
    # Run gate check - exits 1 if gate fails
    - python scripts/check_gate.py
        --new-metrics artifacts/evaluation_results.json
        --baseline-metrics artifacts/baseline_metrics.json
        --min-auc 0.90
        --max-regression 0.01
        --subgroup-max-regression 0.03
  artifacts:
    paths:
      - artifacts/baseline_metrics.json
    expire_in: 1 week
    when: always

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 4: register
# ─────────────────────────────────────────────────────────────────────────────

register-model:
  stage: register
  needs:
    - job: performance-gate
    - job: model-training
      artifacts: true
    - job: model-evaluation
      artifacts: true
  # Only register on main branch (not on MR pipelines)
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - pip install mlflow -q
    - RUN_ID=$(cat artifacts/model/mlflow_run_id.txt)
    - python scripts/register_model.py
        --run-id "$RUN_ID"
        --model-name "$MODEL_NAME"
        --stage Staging
        --description "Registered by CI pipeline $CI_PIPELINE_ID, commit $CI_COMMIT_SHORT_SHA"
    # Export the registered model version to downstream jobs via a dotenv report
    - MODEL_VERSION=$(python scripts/get_model_version.py --model-name "$MODEL_NAME" --stage Staging)
    - echo "MODEL_VERSION=$MODEL_VERSION" >> model.env
  artifacts:
    reports:
      dotenv: model.env  # Makes MODEL_VERSION available to downstream jobs
    expire_in: 1 week

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 5: deploy-staging
# ─────────────────────────────────────────────────────────────────────────────

deploy-to-staging:
  stage: deploy-staging
  needs:
    - job: register-model
      artifacts: true
  environment:
    name: staging
    url: https://api-staging.internal.company.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - pip install -r requirements-deploy.txt -q
    # Deploy model container to staging Kubernetes
    - python scripts/deploy_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --environment staging
        --namespace ml-staging
        --k8s-context staging-cluster

staging-smoke-tests:
  stage: deploy-staging
  needs:
    - job: deploy-to-staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
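    # --base-url and --timeout come from the pytest-base-url and pytest-timeout
    # plugins (assumed here; drop them if conftest.py defines these options)
    - pip install httpx pytest pytest-base-url pytest-timeout -q
    - pytest tests/smoke/ -v
        --base-url https://api-staging.internal.company.com
        --tb=short
        --timeout=30
        --junitxml=test-results/smoke.xml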
  artifacts:
    reports:
      junit: test-results/smoke.xml
    expire_in: 1 week

# ─────────────────────────────────────────────────────────────────────────────
# STAGE 6: deploy-production (manual approval required)
# ─────────────────────────────────────────────────────────────────────────────

deploy-to-production:
  stage: deploy-production
  needs:
    - job: staging-smoke-tests
    - job: register-model
      artifacts: true
  environment:
    name: production
    url: https://api.internal.company.com
  # Manual gate - the pipeline pauses here until an authorized user clicks "Run".
  # Who may run it is controlled by protecting the "production" environment in
  # the project settings, not by this file.
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  script:
    - pip install -r requirements-deploy.txt -q
    - python scripts/deploy_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --environment production
        --namespace ml-production
        --k8s-context prod-cluster
    # Promote model stage in registry from Staging to Production
    - python scripts/promote_model.py
        --model-name "$MODEL_NAME"
        --model-version "$MODEL_VERSION"
        --to-stage Production
    # Notify on-call team
    - python scripts/notify_slack.py
        --message "Model $MODEL_NAME v$MODEL_VERSION deployed to production by $GITLAB_USER_LOGIN"
        --channel "#ml-deployments"
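The performance-gate job delegates its decision to scripts/check_gate.py, which the lesson does not show. The following is a minimal sketch of what such a gate might look like, not the actual script. It assumes both metrics files are JSON objects with an `auc` key and an optional `subgroups` mapping of subgroup name to AUC; that schema and the names `check_gate` and `run_gate` are assumptions made for illustration.

```python
import json


def check_gate(new, baseline, min_auc=0.90, max_regression=0.01,
               subgroup_max_regression=0.03):
    """Return a list of gate failures; an empty list means the gate passes."""
    failures = []

    # Absolute floor: the candidate must clear a minimum AUC on its own.
    if new["auc"] < min_auc:
        failures.append(f"AUC {new['auc']:.4f} is below the floor {min_auc:.4f}")

    # Relative check: the candidate may not fall too far below production.
    drop = baseline["auc"] - new["auc"]
    if drop > max_regression:
        failures.append(f"AUC dropped {drop:.4f} vs baseline (limit {max_regression:.4f})")

    # Per-subgroup check: catches regressions hidden by the aggregate metric.
    new_groups = new.get("subgroups", {})
    for name, base_auc in baseline.get("subgroups", {}).items():
        if name not in new_groups:
            failures.append(f"subgroup {name!r} is missing from the new metrics")
        elif base_auc - new_groups[name] > subgroup_max_regression:
            failures.append(f"subgroup {name!r} dropped {base_auc - new_groups[name]:.4f}")
    return failures


def run_gate(new_path, baseline_path):
    """Load both metrics files and return a process exit code (0 = pass)."""
    with open(new_path) as f:
        new = json.load(f)
    with open(baseline_path) as f:
        baseline = json.load(f)
    failures = check_gate(new, baseline)
    for message in failures:
        print(f"GATE FAILED: {message}")
    return 1 if failures else 0
```

A command-line wrapper would pass the result of `run_gate` to `sys.exit`, so that a non-zero exit code fails the CI job and blocks every job that lists performance-gate under needs:.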

DAG Pipelines for Parallel Execution

GitLab CI's needs: keyword enables DAG (Directed Acyclic Graph) execution - jobs start as soon as their dependencies complete, without waiting for the entire previous stage to finish:

# Without DAG: each stage waits for all previous stage jobs to complete
# With DAG (needs:): jobs start as soon as specific dependencies are done

model-evaluation:
  stage: evaluate
  needs:
    - job: model-training      # Only waits for model-training
      artifacts: true
  # Does NOT wait for any other stage 2 jobs

performance-gate:
  stage: evaluate
  needs:
    - job: model-evaluation    # Direct dependency; training is covered transitively
      artifacts: true

This parallelism can significantly reduce total pipeline time. In a pipeline with multiple models, each model's evaluation can start immediately when its training finishes, rather than waiting for all models to finish training.
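To make the difference concrete, here is a small sketch that computes finish times for the same four jobs under both scheduling models. The job names and durations (in minutes) are hypothetical, and the sketch assumes enough runners that every ready job starts immediately.

```python
# Hypothetical job durations in minutes: two models, each trained then evaluated.
DURATION = {"train-a": 60, "train-b": 20, "eval-a": 10, "eval-b": 50}
STAGE = {"train-a": "train", "train-b": "train",
         "eval-a": "evaluate", "eval-b": "evaluate"}
STAGE_ORDER = ["train", "evaluate"]
NEEDS = {"train-a": [], "train-b": [],
         "eval-a": ["train-a"], "eval-b": ["train-b"]}


def staged_finish_times(duration, stage, stage_order):
    """Stage-based: a stage starts only when the previous stage is fully done."""
    finish = {}
    stage_start = 0
    for name in stage_order:
        jobs = [job for job in duration if stage[job] == name]
        for job in jobs:
            finish[job] = stage_start + duration[job]
        # The next stage waits for the slowest job of this one.
        stage_start = max(finish[job] for job in jobs)
    return finish


def dag_finish_times(duration, needs):
    """DAG: a job starts as soon as the jobs it lists under needs: are done."""
    finish = {}

    def resolve(job):
        if job not in finish:
            start = max((resolve(dep) for dep in needs[job]), default=0)
            finish[job] = start + duration[job]
        return finish[job]

    for job in duration:
        resolve(job)
    return finish


staged = staged_finish_times(DURATION, STAGE, STAGE_ORDER)
dag = dag_finish_times(DURATION, NEEDS)
print("stage-based total:", max(staged.values()))  # 110
print("DAG total:", max(dag.values()))             # 70
```

In the stage-based run, eval-b cannot start until train-a finishes at minute 60, although it only needs train-b, which finished at minute 20. With needs:, eval-b starts at minute 20 and the whole pipeline ends 40 minutes earlier.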

Scheduled Pipelines for Automated Retraining

GitLab has native scheduled pipeline support:

# Configure in GitLab UI: CI/CD > Schedules
# Or via API:

# Scheduled pipeline variables (set in the schedule configuration)
# SCHEDULED_RETRAIN: "true"
# TRAINING_DATA_DATE: "latest"  # Or specific date for backfills

# In .gitlab-ci.yml, check for scheduled trigger:
model-training:
  stage: train
  rules:
    # Run on schedule (weekly retrain)
    - if: $CI_PIPELINE_SOURCE == "schedule" && $SCHEDULED_RETRAIN == "true"
    # Run when training code changes
    - changes:
        - src/training/**/*
        - src/features/**/*
    # Manual trigger
    - when: manual
      allow_failure: false

# scripts/create_schedule.py - create schedule via GitLab API
import os

import requests

GITLAB_URL = "https://gitlab.company.com"
PROJECT_ID = "123"
# Read the access token from the environment instead of hard-coding it.
# The variable name GITLAB_TOKEN is an assumption; use whatever your setup provides.
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}
SCHEDULES_URL = f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/pipeline_schedules"

schedule = {
    "description": "Weekly fraud model retraining",
    "ref": "main",
    "cron": "0 2 * * 1",  # 2 AM every Monday
    "cron_timezone": "UTC",
    "active": True,
}

variables = [
    {"key": "SCHEDULED_RETRAIN", "value": "true"},
    {"key": "TRAINING_DATA_DATE", "value": "latest"},
]

# Create the schedule and stop on any API error
response = requests.post(SCHEDULES_URL, headers=HEADERS, json=schedule, timeout=30)
response.raise_for_status()
schedule_id = response.json()["id"]

# Attach the variables that the pipeline rules check for
for var in variables:
    var_response = requests.post(
        f"{SCHEDULES_URL}/{schedule_id}/variables",
        headers=HEADERS,
        json=var,
        timeout=30,
    )
    var_response.raise_for_status()
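The rules: block on model-training above is evaluated top to bottom, and the first matching rule decides how the job is added to the pipeline. The sketch below emulates that first-match behavior for the three rules shown. It is a simplified model, not GitLab's implementation: path matching uses plain prefixes instead of globs, only push and schedule sources are modeled, and GitLab's handling of changes: in pipelines without a push event is not covered. The function and dictionary names are hypothetical.

```python
def evaluate_rules(rules, ctx):
    """Return the `when` of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule["matches"](ctx):
            # A matching rule without an explicit `when` runs the job normally.
            return rule.get("when", "on_success")
    return None  # no rule matched: the job is left out of the pipeline


def is_scheduled_retrain(ctx):
    return (ctx["source"] == "schedule"
            and ctx["variables"].get("SCHEDULED_RETRAIN") == "true")


def training_code_changed(ctx):
    # Simplification: prefix match stands in for the glob patterns in the YAML.
    prefixes = ("src/training/", "src/features/")
    return any(path.startswith(prefixes) for path in ctx["changed_files"])


MODEL_TRAINING_RULES = [
    {"matches": is_scheduled_retrain},
    {"matches": training_code_changed},
    {"matches": lambda ctx: True, "when": "manual"},  # catch-all fallback
]

push_docs_only = {"source": "push", "variables": {}, "changed_files": ["README.md"]}
print(evaluate_rules(MODEL_TRAINING_RULES, push_docs_only))  # manual
```

Because the catch-all rule comes last, a push that touches no training code still adds the job, but as a manual one. Moving that rule to the top would make every pipeline manual, since evaluation stops at the first match.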

Artifacts and Passing Data Between Jobs

# Pattern for passing structured data between jobs via artifacts

model-training:
  artifacts:
    paths:
      - artifacts/model/          # Model files
      - artifacts/training_metrics.json  # Training metrics
    expire_in: 2 weeks

model-evaluation:
  needs:
    - job: model-training
      artifacts: true  # Automatically downloads model-training artifacts
  # Now artifacts/model/ and artifacts/training_metrics.json are available

# Dotenv artifacts: pass variables between jobs
register-model:
  script:
    - echo "MODEL_VERSION=1.2.3" >> deploy.env
  artifacts:
    reports:
      dotenv: deploy.env

deploy-to-staging:
  needs:
    - job: register-model
      artifacts: true
  script:
    # $MODEL_VERSION is now available from the dotenv artifact
    - echo "Deploying version $MODEL_VERSION"
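When the version comes out of a Python script rather than a shell echo, the same dotenv file can be written from Python. The helpers below are a hypothetical sketch: `write_dotenv` appends KEY=VALUE lines and rejects names or values that would not fit on one line, and `read_dotenv` is only a local stand-in for checking the file, not GitLab's parser.

```python
import re

# Conservative variable-name rule: letters, digits, underscore; no leading digit.
_KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def write_dotenv(path, values):
    """Append KEY=VALUE lines to a dotenv file, one variable per line."""
    lines = []
    for key, value in values.items():
        value = str(value)
        if not _KEY_PATTERN.fullmatch(key):
            raise ValueError(f"invalid variable name: {key!r}")
        if "\n" in value or "\r" in value:
            raise ValueError(f"value for {key} must fit on a single line")
        lines.append(f"{key}={value}\n")
    # Append mode mirrors the `>>` redirection used in the job script.
    with open(path, "a") as f:
        f.writelines(lines)


def read_dotenv(path):
    """Parse KEY=VALUE lines back into a dict (for local checks only)."""
    result = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                key, _, value = line.partition("=")
                result[key] = value
    return result
```

A registration script could call `write_dotenv("model.env", {"MODEL_VERSION": version})` as its last step, and the job would then publish model.env under artifacts:reports:dotenv exactly as in the YAML above.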

Production Notes

Protected environments: In GitLab, configure the production environment as protected with "Required approvals" set to 1 or more. This forces a human review before any pipeline can deploy to production. Set Deployment freeze windows for blackout periods (end of quarter, holidays).

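GitLab Runner for GPU: Register a GPU runner with specific tags. In the .gitlab-ci.yml, use tags: [gpu, linux] to route GPU jobs. With the Docker executor, expose the GPUs to job containers by setting gpus = "all" under [runners.docker] in the runner's config.toml. For Kubernetes-based runners, make sure job pods are scheduled onto GPU nodes, for example through a node selector in the runner configuration.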

CI/CD variables vs secrets: GitLab distinguishes between regular variables (visible in job logs) and masked/protected variables. Always mark credentials (MLFLOW_TRACKING_URI with write access, cloud credentials) as both "Masked" (hidden from logs) and "Protected" (only available on protected branches/tags).
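One practical consequence of "Protected" is that a job on an unprotected branch does not receive the variable at all, and the script then fails somewhere deep inside a client library. A preflight check at the start of a job makes that failure obvious. The sketch below is one way to do it; the list of required names is an assumption for illustration, and the check reports only variable names, never their values.

```python
import os

# Hypothetical list of credentials this pipeline expects; adjust to your setup.
REQUIRED_VARIABLES = (
    "MLFLOW_TRACKING_URI",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_variables(required, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def preflight_message(missing):
    """Build a log message that names missing variables without printing values."""
    if not missing:
        return "all required CI/CD variables are set"
    return (
        "missing CI/CD variables: " + ", ".join(missing)
        + " (protected variables are only available on protected branches and tags)"
    )
```

A job script could print `preflight_message(missing_variables(REQUIRED_VARIABLES))` and exit non-zero when the list is not empty, before any training or deployment step runs.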

:::tip GitLab Environments for Audit Trail GitLab Environments provide a deployment history with who deployed what and when. For regulated industries (finance, healthcare), this audit trail is often required for compliance. Configure environments with approval rules so that every production deployment is traceable to a specific person who approved it. :::

:::warning needs: vs dependencies: Confusion GitLab has two similar keywords: needs: and dependencies:. needs: controls job execution order (DAG) and can optionally download artifacts. dependencies: only controls artifact download and does not affect job order. Use needs: with artifacts: true for both ordering and artifacts. In DAG pipelines, prefer needs: over dependencies: so that ordering and artifact download are declared in one place; mixing the two causes confusion. :::

:::danger Cache vs Artifacts GitLab cache: is for speeding up jobs (pip cache, build cache) - it is not guaranteed to be present and can be invalidated at any time. GitLab artifacts: are for passing data between jobs - until they expire (expire_in), they are available to any job that lists the producing job under needs: with artifacts: true. Never use cache to pass model files between jobs. Use artifacts. :::

Interview Q&A

Q: How does GitLab CI DAG execution differ from standard stage-based execution?

In standard stage-based execution, all jobs in stage N must complete before any job in stage N+1 can start. This means a fast job in stage N+1 sits idle until the slowest job in stage N finishes, even if it does not depend on that job. DAG execution (via needs:) lets jobs start as soon as their specific dependencies complete, regardless of stage. For ML pipelines with multiple models, this means model A's evaluation can start while model B is still training, which can shorten total pipeline wall time.

Q: How do you implement a human approval gate in GitLab CI?

Set when: manual on the deployment job and configure the GitLab Environment as protected with required approvals. The pipeline pauses at the manual job until a user with the required permissions (typically "Maintainer" role) clicks "Run" in the GitLab UI. For compliance, combine with environment-level approval rules that require approval from specific individuals or groups.

Q: What is the difference between a GitLab CI artifact and a cache?

Artifacts are files produced by a job that are uploaded to GitLab and made available to downstream jobs that list the producing job under needs: with artifacts: true. They stay available until they expire. Caches are files saved between pipeline runs to speed up execution (pip installs, build outputs) - they may not be present on every run. Model files must use artifacts, not cache. Pip dependencies should use cache.

Q: How do you run different logic for scheduled pipelines vs push-triggered pipelines?

Check the $CI_PIPELINE_SOURCE variable. Scheduled pipelines set it to schedule. You can use rules: with if: $CI_PIPELINE_SOURCE == "schedule" to add or modify behavior for scheduled runs - for example, running a full retraining on schedule but only smoke tests on push.

Q: How do you manage GitLab CI for a multi-model ML platform with 10+ models?

Use GitLab's include: keyword to share common pipeline templates. Create a shared .gitlab/ci-templates/ml-training.yml with the standard validate-train-evaluate-register sequence, parameterized via variables. Each model's project includes this template with its own variables. For a monorepo with multiple models, use path-based rules to only trigger training for the model whose code or data changed.
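For the monorepo case, the path-based decision can be sketched as a mapping from directory prefixes to models. The directory layout below is hypothetical; the three model names follow the models in the opening scenario. Changes under a shared prefix are treated as affecting every model.

```python
# Hypothetical monorepo layout: each model owns a code and a data directory.
MODEL_PATHS = {
    "fraud-detector": ("models/fraud/", "data/fraud/"),
    "damage-assessment": ("models/damage/", "data/damage/"),
    "settlement-amount": ("models/settlement/", "data/settlement/"),
}
# Code shared by all models; a change here retrains everything.
SHARED_PATHS = ("src/features/", "src/common/")


def models_to_retrain(changed_files):
    """Return the sorted names of models affected by the changed files."""
    if any(path.startswith(SHARED_PATHS) for path in changed_files):
        return sorted(MODEL_PATHS)
    return sorted(
        model
        for model, prefixes in MODEL_PATHS.items()
        if any(path.startswith(prefixes) for path in changed_files)
    )


print(models_to_retrain(["models/fraud/train.py", "README.md"]))
# ['fraud-detector']
```

In .gitlab-ci.yml the same mapping is expressed declaratively: each model's training job carries a rules: changes: list with its own prefixes plus the shared ones.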
