GitHub Actions for ML

The Runaway CI Bill

Priya was eight weeks into her MLOps role when she got a Slack message from the engineering manager: "Can you explain why our GitHub Actions bill jumped from $400 to $2,100 last month?" She pulled up the Actions tab. Every pull request - including PRs that only changed documentation, README files, or test fixtures - was triggering the full ML CI pipeline: linting, unit tests, dependency install, and a full model training run on a GPU-enabled self-hosted runner.

The GPU runner cost $1.60 per hour. An average training run took 45 minutes. The team had merged 47 PRs that month. Counting a single run per merged PR, that is $1.60 × 0.75 hours × 47 = $56.40 just for GPU time - but most of those training runs were completely unnecessary because the PR changed a comment in a notebook, not anything that affected training.

The fix was path-based triggers. Training should only run when training code or training data changes. Documentation, tests, and configuration changes should run only the cheap stages. After the fix, GPU-triggered runs dropped from 47 to 9 for the month - the 9 PRs that actually touched training code. The bill dropped back to $380.
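The arithmetic above is easy to re-run with your own numbers. A minimal sketch, using only the figures from the story (the function name `gpu_cost` is illustrative, and it assumes one training run per PR):

```python
def gpu_cost(rate_per_hour: float, hours_per_run: float, runs: int) -> float:
    """Return total GPU spend in dollars, rounded to cents."""
    return round(rate_per_hour * hours_per_run * runs, 2)


# $1.60/hour runner, 45-minute (0.75 h) training run
before = gpu_cost(1.60, 0.75, 47)  # every merged PR trained once
after = gpu_cost(1.60, 0.75, 9)    # only PRs that touched training code

print(f"before: ${before:.2f}, after: ${after:.2f}")
```

Swapping in your own hourly rate and run counts gives a quick estimate of what path-based triggers would save.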

This lesson builds the complete ML CI pipeline in GitHub Actions that Priya's team should have had from the start: smart triggers, GPU runner configuration, artifact management, secrets handling, and reusable workflow patterns.


Why This Exists

GitHub Actions launched in 2019 and quickly became the default CI platform for open-source and startup engineering teams. Its appeal: YAML-based workflow definitions checked into the repo, tight GitHub integration (PR status checks, artifact uploads, environment deployments), and a generous free tier for public repos.

For ML teams, GitHub Actions offers a critical feature: path filters on trigger conditions. You can declare "only run this job if files matching training/** or data/** have changed." Combined with job dependencies and artifacts, this lets you build ML CI pipelines where expensive steps (training on GPU) only run when they are warranted.
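To make the idea concrete, here is a rough sketch of the decision a path filter makes, written in plain Python. It uses the two example patterns from the paragraph above. The function name `should_train` is hypothetical, and `fnmatch` is only an approximation: real path-filter implementations use their own glob engines, so edge cases may differ.

```python
from fnmatch import fnmatchcase

# Patterns from the paragraph above. In fnmatch, "*" also matches "/",
# so "training/**" covers files in nested directories.
TRAINING_PATTERNS = ["training/**", "data/**"]


def should_train(changed_files: list[str]) -> bool:
    """Return True if any changed file matches a training pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRAINING_PATTERNS
    )


print(should_train(["README.md", "docs/setup.md"]))      # docs-only change
print(should_train(["README.md", "training/model.py"]))  # touches training code
```

One matching file is enough to trigger training. A change set that matches nothing skips the expensive job.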

The ecosystem has grown around ML use cases: the actions/cache action handles pip/conda caching, marketplace actions exist for MLflow, DVC, and Weights & Biases integration, and self-hosted runners let you attach your own GPU machines.

GitHub Actions Workflow Architecture for ML

The Complete ML CI Workflow

# .github/workflows/ml-ci.yml
name: ML CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# Cancel in-progress runs on the same branch when a new commit is pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  PYTHON_VERSION: "3.11"
  MODEL_REGISTRY_URI: ${{ secrets.MLFLOW_TRACKING_URI }}

jobs:
  # ─────────────────────────────────────────────────────────────────────────
  # JOB 1: Code CI - runs on every PR, cheap, fast
  # ─────────────────────────────────────────────────────────────────────────
  code-ci:
    name: Code CI
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      # Cache pip dependencies - saves 2-4 minutes per run
      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}
          restore-keys: |
            pip-${{ env.PYTHON_VERSION }}-

      - name: Install dependencies
        run: pip install -r requirements-dev.txt

      - name: Lint (ruff)
        run: ruff check src/ tests/

      - name: Type check (mypy)
        run: mypy src/ --ignore-missing-imports

      - name: Unit tests
        run: |
          pytest tests/unit/ \
            -v \
            --tb=short \
            --timeout=60 \
            --cov=src \
            --cov-report=xml \
            --cov-fail-under=70

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: coverage.xml
          fail_ci_if_error: false  # Don't fail CI for coverage reporting issues

      - name: Data validation tests
        run: pytest tests/data/ -v --tb=short --timeout=120

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 2: Detect what changed - determines if training should run
  # ─────────────────────────────────────────────────────────────────────────
  detect-changes:
    name: Detect Changes
    runs-on: ubuntu-latest
    outputs:
      training-changed: ${{ steps.filter.outputs.training }}
      data-changed: ${{ steps.filter.outputs.data }}
      should-train: ${{ steps.decide.outputs.should-train }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history needed for accurate diff

      - name: Path filter
        id: filter
        uses: dorny/paths-filter@v3
        with:
          filters: |
            training:
              - 'src/training/**'
              - 'src/features/**'
              - 'src/models/**'
              - 'config/training*.yaml'
              - 'requirements.txt'
            data:
              - 'data/training/**'
              - 'scripts/prepare_data.py'

      - name: Decide whether to train
        id: decide
        run: |
          if [[ "${{ steps.filter.outputs.training }}" == "true" || \
                "${{ steps.filter.outputs.data }}" == "true" ]]; then
            echo "should-train=true" >> $GITHUB_OUTPUT
            echo "Training will run: training=${{ steps.filter.outputs.training }} data=${{ steps.filter.outputs.data }}"
          else
            echo "should-train=false" >> $GITHUB_OUTPUT
            echo "Skipping training: no training code or data changes detected"
          fi

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 3: Training - conditional on detect-changes, runs on GPU runner
  # ─────────────────────────────────────────────────────────────────────────
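  training:
    name: Train and Evaluate
    needs: [code-ci, detect-changes]
    if: needs.detect-changes.outputs.should-train == 'true'
    # Self-hosted GPU runner - see GPU runner configuration section below
    runs-on: [self-hosted, gpu, linux]
    timeout-minutes: 180  # 3 hour hard limit
    permissions:
      contents: read        # checkout
      id-token: write       # required for OIDC role assumption (configure-aws-credentials)
      pull-requests: write  # required to post the evaluation comment on the PR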

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      # GPU runner has CUDA pre-installed, but pip cache still speeds things up
      - name: Cache pip (GPU runner)
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: gpu-pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}
          restore-keys: |
            gpu-pip-${{ env.PYTHON_VERSION }}-

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Configure AWS credentials (for S3 data access)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_TRAINING_ROLE_ARN }}
          aws-region: us-east-1

      - name: Download training data from S3
        run: |
          aws s3 sync \
            s3://${{ secrets.TRAINING_DATA_BUCKET }}/fraud/latest/ \
            data/training/ \
            --exclude "*.tmp"

      - name: Validate training data
        run: python scripts/validate_data.py data/training/train.parquet

      - name: Train model
        id: train
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MLFLOW_EXPERIMENT_NAME: fraud-detection-ci
          TRAINING_RUN_NAME: "ci-${{ github.sha }}-${{ github.run_number }}"
        run: |
          python src/training/train.py \
            --data-path data/training/train.parquet \
            --output-dir artifacts/model \
            --config config/training.yaml \
            --run-name "$TRAINING_RUN_NAME"

          # Extract run ID from training output for downstream steps
          echo "run-id=$(cat artifacts/model/mlflow_run_id.txt)" >> $GITHUB_OUTPUT

      - name: Evaluate model
        run: |
          python src/evaluation/evaluate.py \
            --model-path artifacts/model/model.joblib \
            --eval-data-path data/eval/eval_set_v3.parquet \
            --output-path artifacts/evaluation_results.json

      - name: Download baseline metrics
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python scripts/fetch_baseline_metrics.py \
            --model-name fraud-detector \
            --stage Production \
            --output-path artifacts/baseline_metrics.json

      - name: Check performance gate
        run: |
          python scripts/check_gate.py \
            --new-metrics artifacts/evaluation_results.json \
            --baseline-metrics artifacts/baseline_metrics.json \
            --min-auc 0.90 \
            --max-regression 0.01

      # Upload model and metrics as GitHub Actions artifacts
      - name: Upload model artifact
        if: success()
        uses: actions/upload-artifact@v4
        with:
          name: trained-model-${{ github.sha }}
          path: |
            artifacts/model/
            artifacts/evaluation_results.json
          retention-days: 30

      - name: Register model in MLflow (on main branch only)
        if: github.ref == 'refs/heads/main' && success()
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python scripts/register_model.py \
            --run-id ${{ steps.train.outputs.run-id }} \
            --model-name fraud-detector \
            --stage Staging

      - name: Post evaluation summary to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('artifacts/evaluation_results.json'));
            const body = [
              '## Model Evaluation Results',
              '',
              `| Metric | Value |`,
              `|--------|-------|`,
              `| ROC-AUC | ${results.roc_auc.toFixed(4)} |`,
              `| Average Precision | ${results.average_precision.toFixed(4)} |`,
              `| F1 @ 0.5 | ${results.f1_at_0_5.toFixed(4)} |`,
              `| Eval Samples | ${results.n_eval_samples.toLocaleString()} |`,
            ].join('\n');

            // await so API errors fail the step instead of being silently dropped
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 4: Integration tests - run after code CI, in parallel with training
  # ─────────────────────────────────────────────────────────────────────────
  integration-tests:
    name: Integration Tests
    needs: [code-ci]
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}

      - name: Install dependencies
        run: pip install -r requirements-dev.txt

      - name: Run integration tests
        run: |
          pytest tests/integration/ \
            -v \
            --tb=short \
            --timeout=300 \
            -x  # Stop on first failure
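The workflow above calls scripts/check_gate.py with --min-auc and --max-regression but does not show the script. The sketch below is a hypothetical version of the logic those flags imply, not the lesson's actual implementation. It assumes both metrics files contain a `roc_auc` key (the PR-comment step reads that key from the evaluation results). The function name and messages are illustrative.

```python
def gate_failures(new_metrics, baseline_metrics, min_auc=0.90, max_regression=0.01):
    """Return the reasons the gate fails; an empty list means it passes."""
    new_auc = new_metrics["roc_auc"]
    baseline_auc = baseline_metrics["roc_auc"]
    failures = []

    # Absolute floor: the model must clear a minimum quality bar.
    if new_auc < min_auc:
        failures.append(f"ROC-AUC {new_auc:.4f} is below the floor {min_auc:.4f}")

    # Relative check: the model must not regress too far from production.
    regression = baseline_auc - new_auc
    if regression > max_regression:
        failures.append(
            f"ROC-AUC regressed by {regression:.4f} (allowed: {max_regression:.4f})"
        )
    return failures


candidate = {"roc_auc": 0.93}
production = {"roc_auc": 0.95}
for reason in gate_failures(candidate, production):
    print(f"GATE FAILED: {reason}")
```

A command-line wrapper would load the two JSON files, call this function, and exit with a non-zero status when the list is non-empty so that the CI step fails.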

Matrix Builds for Multiple Python / Framework Versions

When supporting multiple environments, use matrix builds:

# .github/workflows/compatibility.yml
name: Compatibility Matrix
on:
  pull_request:
    branches: [main]

jobs:
  test-matrix:
    name: Test Python ${{ matrix.python-version }} / sklearn ${{ matrix.sklearn-version }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
        sklearn-version: ["1.3", "1.4", "1.5"]
        exclude:
          # Illustrative exclusion - skip any combination your project does not support
          - python-version: "3.10"
            sklearn-version: "1.5"
      fail-fast: false  # Complete all matrix jobs even if one fails

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install specific sklearn version
        run: |
          pip install -r requirements-dev.txt
          # Pin scikit-learn last so the matrix version wins; quotes stop the shell globbing "*"
          pip install "scikit-learn==${{ matrix.sklearn-version }}.*"

      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short
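To see how many jobs this matrix produces, the expansion can be reproduced in a few lines of Python. This is only a sketch of the cross-product-minus-exclusions rule, using the versions from the workflow above:

```python
from itertools import product

python_versions = ["3.10", "3.11", "3.12"]
sklearn_versions = ["1.3", "1.4", "1.5"]
excluded = {("3.10", "1.5")}  # mirrors the `exclude` entry in the matrix

# Every combination of the two axes, minus the excluded pairs
jobs = [
    combo
    for combo in product(python_versions, sklearn_versions)
    if combo not in excluded
]

for py, sk in jobs:
    print(f"Test Python {py} / sklearn {sk}")
print(f"{len(jobs)} jobs")
```

Each added axis value multiplies the job count, so matrices grow quickly. Exclusions are the main tool for keeping them affordable.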

Self-Hosted GPU Runner Setup

Standard GitHub-hosted runners do not have GPUs (GPU-backed larger runners are a paid option on some plans). For GPU training, this lesson uses self-hosted runners:

# Runner runs on a machine with NVIDIA GPU
# Register it with: ./config.sh --url https://github.com/ORG/REPO --token TOKEN --labels gpu,linux
runs-on: [self-hosted, gpu, linux]

The runner machine needs:

NVIDIA drivers installed
CUDA toolkit matching your training framework version
The GitHub Actions runner agent: actions/runner
Docker with NVIDIA Container Runtime (for isolated training)

# Setup script for self-hosted GPU runner (Ubuntu 22.04)
# Run once on the GPU machine

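# Install NVIDIA drivers + CUDA
# (cuda-toolkit-12-1 comes from NVIDIA's CUDA apt repository, which must be added first)
sudo apt-get update
sudo apt-get install -y nvidia-driver-525 cuda-toolkit-12-1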

# Install Docker
curl -fsSL https://get.docker.com | sh

# Install NVIDIA Container Runtime
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
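sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Point Docker at the NVIDIA runtime and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker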

# Verify GPU access
nvidia-smi

# Register the runner (get token from GitHub repo Settings > Actions > Runners)
mkdir ~/actions-runner && cd ~/actions-runner
curl -o actions-runner-linux-x64-2.317.0.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.317.0/actions-runner-linux-x64-2.317.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.317.0.tar.gz
./config.sh --url https://github.com/YOUR_ORG/YOUR_REPO \
            --token YOUR_REGISTRATION_TOKEN \
            --labels gpu,linux,cuda12
sudo ./svc.sh install && sudo ./svc.sh start
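Once the runner is registered, a cheap preflight check at the start of the training job can catch a machine whose driver has broken before any training time is spent. The sketch below is one possible check, not part of the lesson's pipeline. The function names are hypothetical, and the sample line in the comment is illustrative rather than captured from a real machine.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text: str) -> list[dict]:
    """Parse output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`.

    Each line is expected to look like: "NVIDIA A100-SXM4-40GB, 40960 MiB"
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append({"name": name, "memory_total": memory})
    return gpus


def detect_gpus() -> list[dict]:
    """Return visible GPUs, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = detect_gpus()
    print(gpus or "No GPU visible to this runner")
```

In a workflow, a step could run this script and fail the job when the list is empty, so a misconfigured runner fails fast instead of falling back to CPU training.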

Secrets Management

Never put credentials in the workflow YAML. Use GitHub Secrets:

# In workflow YAML: reference secrets
env:
  MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}

# AWS: prefer OIDC over access keys (no static credentials)
# The job must grant `permissions: id-token: write` for OIDC to work
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-ml-training
    aws-region: us-east-1
    # OIDC: no AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed

# scripts/setup_oidc.py - one-time setup: create AWS role for GitHub OIDC
import boto3
import json

# Trust policy: allows GitHub Actions to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
            },
            "StringLike": {
                # Limit to your specific repo
                "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/YOUR_REPO:*"
            }
        }
    }]
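}

# Create the role referenced by role-to-assume in the workflow.
# Assumes the GitHub OIDC provider already exists in the account
# and that ACCOUNT_ID / YOUR_ORG / YOUR_REPO above have been filled in.
iam = boto3.client("iam")
iam.create_role(
    RoleName="github-actions-ml-training",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Assumed by GitHub Actions via OIDC for ML training",
)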
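The StringLike condition in the trust policy decides which workflow runs may assume the role, so it is worth checking which `sub` claims a pattern admits. The sketch below approximates IAM wildcard matching with `fnmatchcase`. This is an assumption made for illustration: IAM supports `*` and `?`, while fnmatch also interprets `[...]`. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern from the trust policy above
REPO_WIDE = "repo:YOUR_ORG/YOUR_REPO:*"
# A tighter alternative that only admits runs on the main branch
MAIN_ONLY = "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main"


def sub_allowed(sub: str, pattern: str) -> bool:
    """Approximate IAM StringLike matching on the OIDC `sub` claim."""
    return fnmatchcase(sub, pattern)


claims = [
    "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",  # push to main
    "repo:YOUR_ORG/YOUR_REPO:pull_request",         # pull request run
    "repo:SOMEONE_ELSE/FORK:ref:refs/heads/main",   # a different repository
]

for claim in claims:
    print(claim, sub_allowed(claim, REPO_WIDE), sub_allowed(claim, MAIN_ONLY))
```

The repo-wide pattern admits pull request runs as well as pushes to main. Narrowing the pattern to a branch or environment is one way to keep a write-capable role out of reach of PR workflows.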

Reusable Workflows

Extract common patterns into reusable workflows:

# .github/workflows/reusable-training.yml
name: Reusable Training Workflow
on:
  workflow_call:
    inputs:
      model-name:
        required: true
        type: string
      data-path:
        required: true
        type: string
      config-file:
        required: false
        type: string
        default: "config/training.yaml"
    secrets:
      MLFLOW_TRACKING_URI:
        required: true
      AWS_TRAINING_ROLE_ARN:
        required: true

jobs:
  train:
    runs-on: [self-hosted, gpu, linux]
    steps:
      - uses: actions/checkout@v4
      - name: Train ${{ inputs.model-name }}
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python src/training/train.py \
            --model-name ${{ inputs.model-name }} \
            --data-path ${{ inputs.data-path }} \
            --config ${{ inputs.config-file }}

# .github/workflows/fraud-model.yml - calls the reusable workflow
name: Fraud Model CI
on:
  push:
    paths: ['src/training/**', 'data/fraud/**']

jobs:
  train:
    uses: ./.github/workflows/reusable-training.yml
    with:
      model-name: fraud-detector
      data-path: data/fraud/latest/train.parquet
    secrets:
      MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
      AWS_TRAINING_ROLE_ARN: ${{ secrets.AWS_TRAINING_ROLE_ARN }}

Production Notes

Artifact retention: GitHub Actions artifacts are deleted after the retention period (default 90 days, configurable down to 1 day). For model artifacts, use GitHub Actions to upload to S3 or your model registry - do not rely on GitHub artifacts as your model store.
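One way to follow this advice is a small upload script that the workflow runs after the performance gate passes. The sketch below is hypothetical. The `models/<model>/<sha>/...` key layout and both function names are assumptions, and `s3_client` is expected to be a boto3 S3 client (only its `upload_file(Filename, Bucket, Key)` method is used).

```python
from pathlib import Path


def model_s3_key(model_name: str, git_sha: str, filename: str) -> str:
    """Build a key that ties the stored file to the commit that produced it."""
    return f"models/{model_name}/{git_sha}/{filename}"


def upload_model(s3_client, bucket: str, model_name: str, git_sha: str, model_dir: str) -> list[str]:
    """Upload every file under model_dir and return the keys written."""
    keys = []
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file():
            relative = path.relative_to(model_dir).as_posix()
            key = model_s3_key(model_name, git_sha, relative)
            s3_client.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys


# Example call inside CI (requires boto3 and AWS credentials):
#   import os, boto3
#   upload_model(boto3.client("s3"), "my-model-bucket", "fraud-detector",
#                os.environ["GITHUB_SHA"], "artifacts/model")
```

Keying by commit SHA keeps every stored model traceable to the code that trained it, independent of the GitHub artifact retention window.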

Concurrency groups: Set concurrency at the workflow level with cancel-in-progress: true. On a busy PR branch, without this, multiple training runs can queue up, waste GPU time, and create race conditions in model registration.

Timeout values: Always set timeout-minutes on training jobs. A hung training job (e.g., waiting for GPU memory that never freed) will otherwise run until the default 360-minute (6-hour) job timeout, burning compute and blocking the runner.

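Caching strategy: The actions/cache key should include the hash of your dependency files (hashFiles('requirements*.txt') in the workflow above). When dependencies change, the key changes, the old entry no longer matches exactly, and a fresh cache is saved at the end of the job. Use restore-keys to fall back to the most recent cache with a matching prefix - pip then only downloads the packages that changed.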
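The interaction between the exact key and restore-keys can be modeled in a few lines. This is a simplified sketch, not how actions/cache is implemented: the real hashFiles function hashes files differently, and the function names here are hypothetical.

```python
import hashlib


def cache_key(prefix: str, python_version: str, file_contents: list[bytes]) -> str:
    """Mimic a key shaped like `pip-<python version>-<hash of dependency files>`."""
    digest = hashlib.sha256()
    for content in file_contents:
        digest.update(content)
    return f"{prefix}-{python_version}-{digest.hexdigest()}"


def restore(saved_keys: list[str], key: str, restore_prefixes: list[str]):
    """Exact match first; otherwise the newest saved key matching a restore prefix."""
    if key in saved_keys:
        return key
    for prefix in restore_prefixes:
        for saved in reversed(saved_keys):  # list is treated as oldest-to-newest
            if saved.startswith(prefix):
                return saved
    return None


old = cache_key("pip", "3.11", [b"numpy==1.26.4\n"])
new = cache_key("pip", "3.11", [b"numpy==2.0.0\n"])

print(old == new)                                    # a changed requirement changes the key
print(restore([old], new, ["pip-3.11-"]) == old)     # prefix fallback finds the old cache
```

The fallback means a dependency bump starts from a mostly warm pip cache instead of an empty one.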

:::tip Use workflow_dispatch for Manual Training Triggers
Add workflow_dispatch to your trigger list to allow manual workflow runs from the GitHub UI. This is invaluable for running a full retraining manually when you know data has changed outside the normal PR flow.

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:  # Enables manual trigger from GitHub UI
    inputs:
      force-retrain:
        description: 'Force retraining even if no training changes detected'
        type: boolean
        default: false

:::

:::warning Path Filter Gotcha
On pull requests, the dorny/paths-filter action diffs the PR branch against its base branch, which is what you want. The on.push.paths filter instead evaluates only the commits in the push, so a force-push or rebase can change which files appear changed. For production pipelines, use the paths-filter action in a separate job rather than on.push.paths for reliable change detection.
:::

:::danger Storing Credentials as Repository Secrets vs Environment Secrets
Repository secrets are accessible to all workflows in the repo. Anyone who can push a branch can open a PR that modifies a workflow file and exfiltrate them (PRs from forks do not receive secrets under the pull_request trigger, but PRs from branches in the same repo do). For production credentials (MLFLOW_TRACKING_URI with write access, AWS roles), use GitHub Environments with required reviewers, or scope them to protected branches only.
:::

Interview Q&A

Q: How do you prevent training from running on every PR regardless of what changed?

Use path-based triggers with the dorny/paths-filter action in a detect-changes job. The job outputs a boolean (should-train) based on whether training code (src/training/**) or training data paths changed. Downstream training jobs check if: needs.detect-changes.outputs.should-train == 'true' and skip entirely if the flag is false. In the opening example, this cut GPU runs from 47 to 9 for the month, a reduction of roughly 80%.

Q: How do you set up GPU runners in GitHub Actions?

Standard GitHub-hosted runners do not provide GPUs. You register a self-hosted runner on a GPU machine (cloud VM or physical) using the GitHub Actions runner agent, labeling it gpu. Workflows target it with runs-on: [self-hosted, gpu, linux]. The machine needs NVIDIA drivers, the CUDA toolkit, and optionally the NVIDIA Container Runtime for isolated training. Auto-scaling can be achieved with tools like actions-runner-controller on Kubernetes with GPU node pools.

Q: How do you manage secrets securely in GitHub Actions ML pipelines?

Prefer OIDC (OpenID Connect) over static credentials wherever possible. For AWS, create an IAM role with a trust policy that allows GitHub Actions to assume it, then use aws-actions/configure-aws-credentials with role-to-assume - no access keys required. For other secrets (MLflow tracking URI, Slack webhooks), use GitHub Encrypted Secrets and reference them as ${{ secrets.SECRET_NAME }}. Never echo secrets or put them in artifacts.

Q: How do you make evaluation results visible in pull requests?

Use the actions/github-script action to post a comment to the PR via the GitHub API. Read the evaluation JSON from the artifacts, format it as a markdown table, and call github.rest.issues.createComment. Add this step with if: github.event_name == 'pull_request' so it only runs on PRs, not on direct pushes to main.

Q: What is a reusable workflow and when should you use one in ML CI?

A reusable workflow (workflow_call trigger) is a GitHub Actions workflow that can be called from other workflows like a function. In ML, they are useful when you have multiple models (fraud, churn, ranking) that all go through the same training + evaluation + registration sequence. Instead of duplicating 200 lines of YAML per model, you write the logic once in a reusable workflow and each model's workflow calls it with different parameters. This keeps changes (security patches, new evaluation steps) centralized.
