GitHub Actions for ML
The Runaway CI Bill
Priya was eight weeks into her MLOps role when she got a Slack message from the engineering manager: "Can you explain why our GitHub Actions bill jumped to $2,100 last month?" She pulled up the Actions tab. Every pull request - including PRs that only changed documentation, README files, or test fixtures - was triggering the full ML CI pipeline: linting, unit tests, dependency install, and a full model training run on a GPU-enabled self-hosted runner.
The GPU runner cost $1.60 per hour, and each training run took about 0.75 hours. Across the month's 47 PRs, that came to $1.60 × 0.75 hours × 47 = $56.40 just for GPU time - but most of those training runs were completely unnecessary because the PR changed a comment in a notebook, not anything that affected training.
The fix was path-based triggers. Training should only run when training code or training data changes. Documentation, tests, and configuration changes should run only the cheap stages. After the fix, GPU-triggered runs dropped from 47 to 9 for the month - the 9 PRs that actually touched training code. The bill dropped back to $380.
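The GPU arithmetic in the story can be checked in a few lines. This is only a sketch: the rate, run length, and run counts are the figures quoted above, and the helper name `gpu_cost` is made up for illustration.

```python
# Figures quoted in the story; gpu_cost is an illustrative helper name.
GPU_RATE_PER_HOUR = 1.60   # dollars per GPU-hour
HOURS_PER_RUN = 0.75       # length of one training run


def gpu_cost(runs: int) -> float:
    """Return the GPU spend in dollars for a given number of training runs."""
    return round(GPU_RATE_PER_HOUR * HOURS_PER_RUN * runs, 2)


before = gpu_cost(47)  # every PR triggered training
after = gpu_cost(9)    # only PRs that touched training code triggered it
reduction = (47 - 9) / 47

print(f"before: ${before:.2f}  after: ${after:.2f}")
print(f"training runs avoided: {reduction:.0%}")
```

The same helper makes it easy to estimate what a different runner price or run length would do to the bill before changing the pipeline.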
This lesson builds the complete ML CI pipeline in GitHub Actions that Priya's team should have had from the start: smart triggers, GPU runner configuration, artifact management, secrets handling, and reusable workflow patterns.
Why This Exists
GitHub Actions launched in 2019 and quickly became the default CI platform for open-source and startup engineering teams. Its appeal: YAML-based workflow definitions checked into the repo, tight GitHub integration (PR status checks, artifact uploads, environment deployments), and a generous free tier for public repos.
For ML teams, GitHub Actions offers a critical feature: path filters on trigger conditions. You can
declare "only run this job if files matching training/** or data/** have changed." Combined
with job dependencies and artifacts, this lets you build ML CI pipelines where expensive steps
(training on GPU) only run when they are warranted.
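The decision a path filter makes can be sketched in plain Python. This is an approximation under stated assumptions: `fnmatch` is not the glob engine GitHub Actions uses (here `*` also matches `/`), and the function name `should_train` is hypothetical. The two patterns are the ones quoted in the paragraph above.

```python
from fnmatch import fnmatchcase

# Patterns quoted in the text. fnmatch only approximates the glob rules
# that GitHub Actions applies, so treat this as a mental model.
TRAINING_PATTERNS = ["training/**", "data/**"]


def should_train(changed_files: list[str]) -> bool:
    """Return True if any changed file matches a training-related pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in TRAINING_PATTERNS
    )


docs_only = should_train(["README.md", "docs/setup.md"])
touches_training = should_train(["docs/setup.md", "training/train.py"])
print(docs_only, touches_training)
```

A documentation-only change yields `False` and skips the expensive job, while any single file under a training path is enough to trigger it.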
The ecosystem has grown around ML use cases: the actions/cache action handles pip/conda
caching, marketplace actions exist for MLflow, DVC, and Weights & Biases integration, and
self-hosted runners let you attach your own GPU machines to a repository or organization.
GitHub Actions Workflow Architecture for ML
The Complete ML CI Workflow
# .github/workflows/ml-ci.yml
name: ML CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# Cancel in-progress runs on the same branch when a new commit is pushed
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  PYTHON_VERSION: "3.11"
  MODEL_REGISTRY_URI: ${{ secrets.MLFLOW_TRACKING_URI }}

jobs:
  # ─────────────────────────────────────────────────────────────────────────
  # JOB 1: Code CI - runs on every PR, cheap, fast
  # ─────────────────────────────────────────────────────────────────────────
  code-ci:
    name: Code CI
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      # Cache pip dependencies - saves 2-4 minutes per run
      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}
          restore-keys: |
            pip-${{ env.PYTHON_VERSION }}-

      - name: Install dependencies
        run: pip install -r requirements-dev.txt

      - name: Lint (ruff)
        run: ruff check src/ tests/

      - name: Type check (mypy)
        run: mypy src/ --ignore-missing-imports

      - name: Unit tests
        run: |
          pytest tests/unit/ \
            -v \
            --tb=short \
            --timeout=60 \
            --cov=src \
            --cov-report=xml \
            --cov-fail-under=70

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          file: coverage.xml
          fail_ci_if_error: false # Don't fail CI for coverage reporting issues

      - name: Data validation tests
        run: pytest tests/data/ -v --tb=short --timeout=120

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 2: Detect what changed - determines if training should run
  # ─────────────────────────────────────────────────────────────────────────
  detect-changes:
    name: Detect Changes
    runs-on: ubuntu-latest
    outputs:
      training-changed: ${{ steps.filter.outputs.training }}
      data-changed: ${{ steps.filter.outputs.data }}
      should-train: ${{ steps.decide.outputs.should-train }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history needed for accurate diff

      - name: Path filter
        id: filter
        uses: dorny/paths-filter@v3
        with:
          filters: |
            training:
              - 'src/training/**'
              - 'src/features/**'
              - 'src/models/**'
              - 'config/training*.yaml'
              - 'requirements.txt'
            data:
              - 'data/training/**'
              - 'scripts/prepare_data.py'

      - name: Decide whether to train
        id: decide
        run: |
          if [[ "${{ steps.filter.outputs.training }}" == "true" || \
                "${{ steps.filter.outputs.data }}" == "true" ]]; then
            echo "should-train=true" >> "$GITHUB_OUTPUT"
            echo "Training will run: training=${{ steps.filter.outputs.training }} data=${{ steps.filter.outputs.data }}"
          else
            echo "should-train=false" >> "$GITHUB_OUTPUT"
            echo "Skipping training: no training code or data changes detected"
          fi

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 3: Training - conditional on detect-changes, runs on GPU runner
  # ─────────────────────────────────────────────────────────────────────────
  training:
    name: Train and Evaluate
    needs: [code-ci, detect-changes]
    if: needs.detect-changes.outputs.should-train == 'true'
    # Self-hosted GPU runner - see GPU runner configuration section below
    runs-on: [self-hosted, gpu, linux]
    timeout-minutes: 180 # 3 hour hard limit
    # Job-level permissions: anything not listed here is denied
    permissions:
      contents: read        # checkout
      id-token: write       # OIDC token for configure-aws-credentials
      pull-requests: write  # PR comment with evaluation results
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      # GPU runner has CUDA pre-installed, but pip cache still speeds things up
      - name: Cache pip (GPU runner)
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: gpu-pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}
          restore-keys: |
            gpu-pip-${{ env.PYTHON_VERSION }}-

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Configure AWS credentials (for S3 data access)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_TRAINING_ROLE_ARN }}
          aws-region: us-east-1

      - name: Download training data from S3
        run: |
          aws s3 sync \
            s3://${{ secrets.TRAINING_DATA_BUCKET }}/fraud/latest/ \
            data/training/ \
            --exclude "*.tmp"

      - name: Validate training data
        run: python scripts/validate_data.py data/training/train.parquet

      - name: Train model
        id: train
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MLFLOW_EXPERIMENT_NAME: fraud-detection-ci
          TRAINING_RUN_NAME: "ci-${{ github.sha }}-${{ github.run_number }}"
        run: |
          python src/training/train.py \
            --data-path data/training/train.parquet \
            --output-dir artifacts/model \
            --config config/training.yaml \
            --run-name "$TRAINING_RUN_NAME"
          # Extract run ID from training output for downstream steps
          echo "run-id=$(cat artifacts/model/mlflow_run_id.txt)" >> "$GITHUB_OUTPUT"

      - name: Evaluate model
        run: |
          python src/evaluation/evaluate.py \
            --model-path artifacts/model/model.joblib \
            --eval-data-path data/eval/eval_set_v3.parquet \
            --output-path artifacts/evaluation_results.json

      - name: Download baseline metrics
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python scripts/fetch_baseline_metrics.py \
            --model-name fraud-detector \
            --stage Production \
            --output-path artifacts/baseline_metrics.json

      - name: Check performance gate
        run: |
          python scripts/check_gate.py \
            --new-metrics artifacts/evaluation_results.json \
            --baseline-metrics artifacts/baseline_metrics.json \
            --min-auc 0.90 \
            --max-regression 0.01

      # Upload model and metrics as GitHub Actions artifacts
      - name: Upload model artifact
        if: success()
        uses: actions/upload-artifact@v4
        with:
          name: trained-model-${{ github.sha }}
          path: |
            artifacts/model/
            artifacts/evaluation_results.json
          retention-days: 30

      - name: Register model in MLflow (on main branch only)
        if: github.ref == 'refs/heads/main' && success()
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python scripts/register_model.py \
            --run-id ${{ steps.train.outputs.run-id }} \
            --model-name fraud-detector \
            --stage Staging

      - name: Post evaluation summary to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('artifacts/evaluation_results.json'));
            const body = [
              '## Model Evaluation Results',
              '',
              `| Metric | Value |`,
              `|--------|-------|`,
              `| ROC-AUC | ${results.roc_auc.toFixed(4)} |`,
              `| Average Precision | ${results.average_precision.toFixed(4)} |`,
              `| F1 @ 0.5 | ${results.f1_at_0_5.toFixed(4)} |`,
              `| Eval Samples | ${results.n_eval_samples.toLocaleString()} |`,
            ].join('\n');
            // Await the API call so a failed comment fails the step
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

  # ─────────────────────────────────────────────────────────────────────────
  # JOB 4: Integration tests - run after Code CI, in parallel with training
  # ─────────────────────────────────────────────────────────────────────────
  integration-tests:
    name: Integration Tests
    needs: [code-ci]
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ env.PYTHON_VERSION }}-${{ hashFiles('requirements*.txt') }}

      - name: Install dependencies
        run: pip install -r requirements-dev.txt

      - name: Run integration tests
        run: |
          pytest tests/integration/ \
            -v \
            --tb=short \
            --timeout=300 \
            -x # Stop on first failure
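The "Check performance gate" step in the training job calls `scripts/check_gate.py`, which the lesson does not show. A minimal sketch of the comparison it might perform follows. It assumes both metrics files contain a `roc_auc` key (the key the PR comment step reads), and that the `--min-auc` and `--max-regression` flags mean an absolute floor and a maximum allowed drop against the baseline. The function name and return shape are illustrative, not the project's actual script.

```python
def check_gate(
    new: dict, baseline: dict, min_auc: float, max_regression: float
) -> list[str]:
    """Return the reasons the gate fails; an empty list means it passes."""
    failures = []
    new_auc = new["roc_auc"]

    # Absolute floor: the candidate must clear a minimum quality bar.
    if new_auc < min_auc:
        failures.append(f"roc_auc {new_auc:.4f} is below the floor {min_auc:.4f}")

    # Relative check: the candidate may not be much worse than production.
    drop = baseline["roc_auc"] - new_auc
    if drop > max_regression:
        failures.append(
            f"roc_auc dropped {drop:.4f} vs baseline (allowed {max_regression:.4f})"
        )
    return failures


# Hypothetical metrics: above the floor, but 0.015 below the baseline.
reasons = check_gate({"roc_auc": 0.925}, {"roc_auc": 0.940}, 0.90, 0.01)
for reason in reasons:
    print("GATE FAILED:", reason)
```

In a real CI script the caller would load the two JSON files, call this function, and exit with a non-zero status when the list is non-empty so that the workflow step fails.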
Matrix Builds for Multiple Python / Framework Versions
When supporting multiple environments, use matrix builds:
# .github/workflows/compatibility.yml
jobs:
  test-matrix:
    name: Test Python ${{ matrix.python-version }} / sklearn ${{ matrix.sklearn-version }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
        sklearn-version: ["1.3", "1.4", "1.5"]
        exclude:
          # Illustrative exclusion: skip a combination you do not support
          # (check the scikit-learn release notes for actual Python support)
          - python-version: "3.10"
            sklearn-version: "1.5"
      fail-fast: false # Complete all matrix jobs even if one fails
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install specific sklearn version
        run: |
          # Install project dependencies first, then pin scikit-learn so the
          # matrix version wins over whatever the requirements file resolved
          pip install -r requirements-dev.txt
          pip install "scikit-learn==${{ matrix.sklearn-version }}.*"
      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short
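To see how many jobs the matrix above produces, the expansion can be mirrored in Python: take the cross product of both axes, then drop the excluded pairs. This is a sketch of the idea, not GitHub's implementation, and the variable names are illustrative.

```python
from itertools import product

python_versions = ["3.10", "3.11", "3.12"]
sklearn_versions = ["1.3", "1.4", "1.5"]

# Same pair as the `exclude` entry in the workflow above.
exclude = {("3.10", "1.5")}

# Cross product of both axes, minus the excluded combinations.
jobs = [
    combo
    for combo in product(python_versions, sklearn_versions)
    if combo not in exclude
]

for python_version, sklearn_version in jobs:
    print(f"Test Python {python_version} / sklearn {sklearn_version}")
print(f"{len(jobs)} jobs")
```

Every extra axis value multiplies the job count, so it is worth running this kind of count before adding a new version to a matrix.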
Self-Hosted GPU Runner Setup
Standard GitHub-hosted runners do not have GPUs. For GPU training, this lesson uses self-hosted runners:
# Runner runs on a machine with NVIDIA GPU
# Register it with: ./config.sh --url https://github.com/ORG/REPO --token TOKEN --labels gpu,linux
runs-on: [self-hosted, gpu, linux]
The runner machine needs:
- NVIDIA drivers installed
- CUDA toolkit matching your training framework version
- The GitHub Actions runner agent: actions/runner
- Docker with NVIDIA Container Runtime (for isolated training)
# Setup script for self-hosted GPU runner (Ubuntu 22.04)
# Run once on the GPU machine
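# Install NVIDIA drivers + CUDA
# (cuda-toolkit-12-1 comes from NVIDIA's CUDA apt repository, which must be added first)
sudo apt-get update
sudo apt-get install -y nvidia-driver-525 cuda-toolkit-12-1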
# Install Docker
curl -fsSL https://get.docker.com | sh
# Install NVIDIA Container Runtime
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
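sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Point Docker at the NVIDIA runtime, then restart it so the change takes effect
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker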
# Verify GPU access
nvidia-smi
# Register the runner (get token from GitHub repo Settings > Actions > Runners)
mkdir ~/actions-runner && cd ~/actions-runner
curl -o actions-runner-linux-x64-2.317.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.317.0/actions-runner-linux-x64-2.317.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.317.0.tar.gz
./config.sh --url https://github.com/YOUR_ORG/YOUR_REPO \
--token YOUR_REGISTRATION_TOKEN \
--labels gpu,linux,cuda12
sudo ./svc.sh install && sudo ./svc.sh start
Secrets Management
Never put credentials in the workflow YAML. Use GitHub Secrets:
# In workflow YAML: reference secrets
env:
  MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}

# AWS: prefer OIDC over access keys (no static credentials)
# The job must also grant `permissions: id-token: write` to request the OIDC token
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-ml-training
    aws-region: us-east-1
    # OIDC: no AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed
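# scripts/setup_oidc.py - one-time setup: create AWS role for GitHub OIDC
import json

import boto3

# Trust policy: allows GitHub Actions to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
            },
            "StringLike": {
                # Limit to your specific repo
                "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/YOUR_REPO:*"
            }
        }
    }]
}

# Create the role with the trust policy attached. Assumes the GitHub OIDC
# identity provider already exists in the account. Permissions policies
# (e.g. S3 read access) still need to be attached separately.
iam = boto3.client("iam")
iam.create_role(
    RoleName="github-actions-ml-training",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)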
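To get a feel for which workflow runs the `StringLike` condition admits, the wildcard match can be approximated with `fnmatch`. This is only a sketch: IAM evaluates the condition itself, `fnmatch` is a stand-in for its `*` wildcard matching, and the helper name `is_allowed` is hypothetical. The `sub` values below follow the shapes GitHub documents for branch pushes and pull requests.

```python
from fnmatch import fnmatchcase

# Pattern from the trust policy above (placeholders left as-is).
ALLOWED_SUB = "repo:YOUR_ORG/YOUR_REPO:*"


def is_allowed(sub_claim: str, pattern: str = ALLOWED_SUB) -> bool:
    """Approximate IAM's StringLike match on the OIDC `sub` claim."""
    return fnmatchcase(sub_claim, pattern)


main_push = is_allowed("repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main")
pull_request = is_allowed("repo:YOUR_ORG/YOUR_REPO:pull_request")
other_repo = is_allowed("repo:SOMEONE_ELSE/FORK:ref:refs/heads/main")
print(main_push, pull_request, other_repo)

# A tighter pattern admits only runs on the main branch.
main_only = "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main"
print(is_allowed("repo:YOUR_ORG/YOUR_REPO:pull_request", main_only))
```

The trailing `:*` allows every branch, tag, and pull request in the repo. Replacing it with a specific ref is one way to keep PR runs from assuming a role meant for production training.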
Reusable Workflows
Extract common patterns into reusable workflows:
# .github/workflows/reusable-training.yml
name: Reusable Training Workflow

on:
  workflow_call:
    inputs:
      model-name:
        required: true
        type: string
      data-path:
        required: true
        type: string
      config-file:
        required: false
        type: string
        default: "config/training.yaml"
    secrets:
      MLFLOW_TRACKING_URI:
        required: true
      AWS_TRAINING_ROLE_ARN:
        required: true

jobs:
  train:
    runs-on: [self-hosted, gpu, linux]
    steps:
      - uses: actions/checkout@v4
      - name: Train ${{ inputs.model-name }}
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          # Pass inputs through env so they are not interpolated into the shell script
          MODEL_NAME: ${{ inputs.model-name }}
          DATA_PATH: ${{ inputs.data-path }}
          CONFIG_FILE: ${{ inputs.config-file }}
        run: |
          python src/training/train.py \
            --model-name "$MODEL_NAME" \
            --data-path "$DATA_PATH" \
            --config "$CONFIG_FILE"

# .github/workflows/fraud-model.yml - calls the reusable workflow
name: Fraud Model CI

on:
  push:
    paths: ['src/training/**', 'data/fraud/**']

jobs:
  train:
    uses: ./.github/workflows/reusable-training.yml
    with:
      model-name: fraud-detector
      data-path: data/fraud/latest/train.parquet
    secrets:
      MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
      AWS_TRAINING_ROLE_ARN: ${{ secrets.AWS_TRAINING_ROLE_ARN }}
Production Notes
Artifact retention: GitHub Actions artifacts are deleted after the retention period (default 90 days, configurable down to 1 day). For model artifacts, use GitHub Actions to upload to S3 or your model registry - do not rely on GitHub artifacts as your model store.
Concurrency groups: Set concurrency at the workflow level with cancel-in-progress: true.
On a busy PR branch, without this, multiple training runs can queue up, waste GPU time, and
create race conditions in model registration.
Timeout values: Always set timeout-minutes on training jobs. A hung training job (e.g.,
waiting for GPU memory that never freed) will otherwise run until the default 360-minute
(6-hour) job timeout, burning compute and blocking the runner.
Caching strategy: The actions/cache key should include the hash of your dependency file
(hashFiles('requirements.txt')). When dependencies change, the cache is invalidated and
reinstalled. Use restore-keys for partial cache hits - useful when only some packages changed.
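The effect of hashing the dependency file into the cache key can be illustrated in Python. This is a sketch under assumptions: it hashes file contents with SHA-256 in the spirit of `hashFiles`, but does not claim to reproduce that function's exact output, and the names `cache_key` and `restore_prefix` are made up for the example.

```python
import hashlib


def cache_key(python_version: str, requirement_files: dict[str, bytes]) -> str:
    """Build a cache key that changes whenever any requirements file changes."""
    digest = hashlib.sha256()
    # Hash files in a stable order so the key is deterministic.
    for name in sorted(requirement_files):
        digest.update(hashlib.sha256(requirement_files[name]).digest())
    return f"pip-{python_version}-{digest.hexdigest()}"


def restore_prefix(python_version: str) -> str:
    """Prefix used as a fallback when no exact key matches."""
    return f"pip-{python_version}-"


old_key = cache_key("3.11", {"requirements.txt": b"scikit-learn==1.4.2\n"})
new_key = cache_key("3.11", {"requirements.txt": b"scikit-learn==1.5.0\n"})

print(old_key != new_key)                            # dependency bump: exact key misses
print(old_key.startswith(restore_prefix("3.11")))    # but the old cache matches the prefix
```

After a dependency bump the exact key misses, and the prefix fallback restores the most recent older cache so pip only downloads the packages that actually changed.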
:::tip Use workflow_dispatch for Manual Training Triggers
Add workflow_dispatch to your trigger list to allow manual workflow runs from the GitHub UI.
This is invaluable for running a full retraining manually when you know data has changed outside
the normal PR flow.
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch: # Enables manual trigger from GitHub UI
    inputs:
      force-retrain:
        description: 'Force retraining even if no training changes detected'
        type: boolean
        default: false
:::
:::warning Path Filter Gotcha
The dorny/paths-filter action compares the PR branch against its base branch, which is what
you want for PRs. The on.push.paths filter instead compares the commits before and after each
push, so a force-push or rebase can change which files appear changed. A workflow-level paths
filter also skips the whole workflow, including the cheap jobs. For production pipelines, use
the paths-filter action in a separate job rather than on.push.paths for reliable change detection.
:::
:::danger Storing Credentials as Repository Secrets vs Environment Secrets
Repository secrets are accessible to every workflow in the repo. A contributor with write
access can push a branch that modifies a workflow file and exfiltrate them (PRs from forks do
not receive secrets by default). For production credentials (MLFLOW_TRACKING_URI with write
access, AWS roles), use GitHub Environments with required reviewers, or scope them to
protected branches only.
:::
Interview Q&A
Q: How do you prevent training from running on every PR regardless of what changed?
Use path-based triggers with the dorny/paths-filter action in a detect-changes job. The job
outputs a boolean (should-train) based on whether training code (src/training/**) or training
data paths changed. Downstream training jobs check if: needs.detect-changes.outputs.should-train == 'true'
and skip entirely if the flag is false. In Priya's case this cut GPU-triggered runs from 47 to 9 in a month, a reduction of about 81%.
Q: How do you set up GPU runners in GitHub Actions?
Standard GitHub-hosted runners do not provide GPUs. You register a self-hosted runner on a GPU machine
(cloud VM or physical) using the GitHub Actions runner agent, labeling it gpu. Workflows
target it with runs-on: [self-hosted, gpu, linux]. The machine needs NVIDIA drivers, CUDA
toolkit, and optionally NVIDIA Container Runtime for isolated training. Auto-scaling can be
achieved with tools like actions-runner-controller on Kubernetes with GPU node pools.
Q: How do you manage secrets securely in GitHub Actions ML pipelines?
Prefer OIDC (OpenID Connect) over static credentials wherever possible. For AWS, create an IAM
role with a trust policy that allows GitHub Actions to assume it, then use
aws-actions/configure-aws-credentials with role-to-assume - no access keys required. For
other secrets (MLflow tracking URI, Slack webhooks), use GitHub Encrypted Secrets and reference
them as ${{ secrets.SECRET_NAME }}. Never echo secrets or put them in artifacts.
Q: How do you make evaluation results visible in pull requests?
Use the actions/github-script action to post a comment to the PR via the GitHub API. Read the
evaluation JSON from the artifacts, format it as a markdown table, and call
github.rest.issues.createComment. Add this step with if: github.event_name == 'pull_request'
so it only runs on PRs, not on direct pushes to main.
Q: What is a reusable workflow and when should you use one in ML CI?
A reusable workflow (workflow_call trigger) is a GitHub Actions workflow that can be called
from other workflows like a function. In ML, they are useful when you have multiple models
(fraud, churn, ranking) that all go through the same training + evaluation + registration
sequence. Instead of duplicating 200 lines of YAML per model, you write the logic once in a
reusable workflow and each model's workflow calls it with different parameters. This keeps
changes (security patches, new evaluation steps) centralized.
