
Helm for ML Deployments

The Inconsistency Tax

Your ML serving stack has six components: the model server, a feature store proxy, a Redis cache, a preprocessing service, an async result queue, and a Prometheus metrics exporter. Six separate YAML files per environment, three environments (dev, staging, prod). That's 18 YAML files, each diverging slightly as engineers make environment-specific fixes without updating the others.

Last week, a bug fix was applied to the staging model server Deployment but not to the production one. Prod has been running with the broken configuration for 11 days. Nobody noticed because nobody checks all 18 files. The dev environment is missing the Prometheus exporter entirely - it was never added after the monitoring team requested it four months ago.

This is the configuration drift problem. It happens when configuration is spread across many files with no single source of truth, no templating, and no mechanism to ensure consistency across environments. Helm solves this by making your application a parameterized package where environment-specific differences are expressed in values files, and all components are versioned, released, and upgraded together.


What Helm Is

Helm is the package manager for Kubernetes. It defines the concept of a chart (a package of YAML templates), a release (an instance of a chart deployed to a cluster), and values (the parameters that customize a release). You run helm install, helm upgrade, and helm rollback instead of kubectl apply for individual files.

Helm 3 (released in November 2019) removed the server-side component (Tiller) that Helm 2 required. In Helm 3, the client talks directly to the Kubernetes API server, and release metadata is stored by default as Kubernetes Secrets in the release's namespace.
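
To make the storage format concrete, here is a minimal sketch of how a release record could be unpacked. It assumes the layout used by Helm 3's Secret storage driver: the release is JSON, gzip-compressed, base64-encoded by Helm, and then base64-encoded again by Kubernetes in the Secret's `release` field. The function names and the trimmed sample record are hypothetical.

```python
import base64
import gzip
import json


def encode_release(record: dict) -> str:
    """Mimic how a release record ends up in Secret.data.release (assumed layout)."""
    compressed = gzip.compress(json.dumps(record).encode("utf-8"))
    helm_layer = base64.b64encode(compressed)            # Helm's own encoding
    return base64.b64encode(helm_layer).decode("ascii")  # Kubernetes Secret encoding


def decode_release(secret_data: str) -> dict:
    """Reverse the two base64 layers and the gzip compression."""
    helm_layer = base64.b64decode(secret_data)
    compressed = base64.b64decode(helm_layer)
    return json.loads(gzip.decompress(compressed))


# Hypothetical, heavily trimmed release record for illustration only.
sample = {
    "name": "fraud-model",
    "version": 3,
    "config": {"image": {"tag": "v2.2.0-rc2"}},
}
decoded = decode_release(encode_release(sample))
print(decoded["config"])
```

Because user-supplied values travel inside this record, anyone who can read Secrets in the namespace can recover them. This matters for the discussion of sensitive values under Common Mistakes.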

Chart Anatomy

A Helm chart is a directory with a specific structure:

fraud-model/               # chart root - same name as the chart
  Chart.yaml               # chart metadata
  values.yaml              # default values
  values-staging.yaml      # staging overrides (not part of the chart defaults; passed with --values)
  values-prod.yaml         # prod overrides (not part of the chart defaults; passed with --values)
  templates/               # Kubernetes manifest templates
    _helpers.tpl           # shared template helpers (naming, labels)
    deployment.yaml
    service.yaml
    configmap.yaml
    hpa.yaml
    ingress.yaml
    serviceaccount.yaml
    NOTES.txt              # printed after install/upgrade
  charts/                  # dependency charts (subchart packaging)

Chart.yaml

apiVersion: v2
name: fraud-model
description: Fraud detection model serving stack
type: application
version: 1.4.2 # chart version (increment on chart changes)
appVersion: "v2.1.0" # application version (the model version)
maintainers:
  - name: ML Platform Team
dependencies:
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled # only install if values.redis.enabled=true
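
The dependency version "18.x.x" is a semver range, not a fixed version. As a rough sketch of what such an x-range accepts, the hypothetical helper below handles only plain MAJOR.MINOR.PATCH strings with wildcards; Helm's real constraint syntax also covers operators such as ^, ~ and >= as well as pre-release tags, which are deliberately left out here.

```python
def matches_x_range(constraint: str, version: str) -> bool:
    """Return True if `version` satisfies a simple x-range like "18.x.x".

    Simplification: only MAJOR.MINOR.PATCH with x/X/* wildcards is handled;
    operators and pre-release tags are out of scope for this sketch.
    """
    wanted = constraint.split(".")
    actual = version.lstrip("v").split(".")
    if len(wanted) != 3 or len(actual) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH")
    for want, have in zip(wanted, actual):
        if want in ("x", "X", "*"):
            continue  # wildcard: any value is accepted in this position
        if int(want) != int(have):
            return False
    return True


print(matches_x_range("18.x.x", "18.6.1"))
print(matches_x_range("18.x.x", "19.0.0"))
print(matches_x_range("1.4.x", "1.4.2"))
```

A range like this lets helm dependency update pick up new minor and patch releases of the subchart while refusing a new major version.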

values.yaml - The Parameter Contract

# Default values - safe to use in dev, override in staging/prod

replicaCount: 1 # dev: 1 replica, prod: 3

image:
  repository: registry.company.com/fraud-model
  tag: "v2.1.0"
  pullPolicy: IfNotPresent

resources:
  requests:
    cpu: "1"
    memory: "4Gi"
  limits:
    cpu: "2"
    memory: "8Gi"

service:
  type: ClusterIP
  port: 80

model:
  path: "/models/fraud/v2.1.0/model.pt"
  decisionThreshold: "0.75"
  batchSize: "16"

featureServer:
  url: "http://feature-server-svc:8081"

redis:
  enabled: false # disabled in dev, enabled in prod
  host: "redis-svc"
  port: "6379"

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

ingress:
  enabled: false
  host: ""
  tlsSecretName: ""

podAnnotations: {}
nodeSelector: {}
tolerations: []
affinity: {}

Templates - Writing Parameterized Manifests

Templates use Go's text/template syntax with Helm-specific functions:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "fraud-model.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "fraud-model.labels" . | nindent 4 }}
  annotations:
    helm.sh/chart: {{ include "fraud-model.chart" . }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
  # When autoscaling is enabled, omit replicas so the HPA owns the count
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "fraud-model.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        {{- include "fraud-model.selectorLabels" . | nindent 8 }}
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          env:
            - name: MODEL_PATH
              value: {{ .Values.model.path | quote }}
            - name: DECISION_THRESHOLD
              value: {{ .Values.model.decisionThreshold | quote }}
            - name: BATCH_SIZE
              value: {{ .Values.model.batchSize | quote }}
            - name: FEATURE_SERVER_URL
              value: {{ .Values.featureServer.url | quote }}
            {{- if .Values.redis.enabled }}
            - name: REDIS_HOST
              value: {{ .Values.redis.host | quote }}
            - name: REDIS_PORT
              value: {{ .Values.redis.port | quote }}
            {{- end }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 3
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
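
The template leans heavily on nindent to place multi-line output at the right YAML depth. As an illustration of what that function does to a string, here is a small Python re-implementation of the indent and nindent template functions (the Python names simply mirror the template function names; the label text is taken from this chart's selector labels with an assumed release name of "prod").

```python
def indent(spaces: int, text: str) -> str:
    """Prefix every line of `text` with `spaces` spaces."""
    pad = " " * spaces
    return pad + text.replace("\n", "\n" + pad)


def nindent(spaces: int, text: str) -> str:
    """Same as indent, but start with a newline so the block sits under its key."""
    return "\n" + indent(spaces, text)


# What `labels:` followed by `{{- include ... | nindent 4 }}` boils down to.
selector_labels = (
    "app.kubernetes.io/name: fraud-model\n"
    "app.kubernetes.io/instance: prod"
)
rendered = "  labels:" + nindent(4, selector_labels)
print(rendered)
```

The leading `{{-` in the template trims the whitespace before the action, so the newline that nindent adds is the only line break between the key and its block. The number passed to nindent must therefore match the nesting depth of the key above it.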

The _helpers.tpl File

Shared naming functions prevent inconsistency across templates:

# templates/_helpers.tpl

{{/*
Expand the name of the chart.
*/}}
{{- define "fraud-model.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Full name: release-name + chart-name, truncated to 63 chars (K8s DNS limit)
*/}}
{{- define "fraud-model.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}

{{/*
Chart name and version, referenced by the helm.sh/chart label and annotation
*/}}
{{- define "fraud-model.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Standard labels applied to all resources
*/}}
{{- define "fraud-model.labels" -}}
helm.sh/chart: {{ include "fraud-model.chart" . }}
{{ include "fraud-model.selectorLabels" . }}
app.kubernetes.io/version: {{ .Values.image.tag | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels - used in Deployment.spec.selector and Service.spec.selector
*/}}
{{- define "fraud-model.selectorLabels" -}}
app.kubernetes.io/name: {{ include "fraud-model.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
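
To see what the fullname helper produces for different inputs, the logic can be mirrored in a few lines of Python. This is only a sketch of the helper as written above; the function names are hypothetical, and the release names in the examples are made up.

```python
from typing import Optional

K8S_NAME_LIMIT = 63  # limit used by `trunc 63` in the helper


def trim_name(name: str) -> str:
    """Equivalent of `trunc 63 | trimSuffix "-"`."""
    name = name[:K8S_NAME_LIMIT]
    return name[:-1] if name.endswith("-") else name


def fullname(
    release_name: str,
    chart_name: str,
    name_override: Optional[str] = None,
    fullname_override: Optional[str] = None,
) -> str:
    """Mirror of the "fraud-model.fullname" template."""
    if fullname_override:
        return trim_name(fullname_override)
    name = name_override or chart_name  # `default .Chart.Name .Values.nameOverride`
    return trim_name(f"{release_name}-{name}")


print(fullname("prod", "fraud-model"))
print(fullname("prod", "fraud-model", name_override="scorer"))
print(len(fullname("a" * 62, "fraud-model")))
```

The last call shows why trimSuffix follows trunc: truncating at 63 characters can leave a trailing hyphen, which is not valid at the end of a Kubernetes resource name.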

Values Files Per Environment

The values file pattern allows environment-specific overrides without duplicating templates:

# values-staging.yaml
replicaCount: 2

image:
  tag: "v2.2.0-rc1" # test release candidate in staging

resources:
  requests:
    cpu: "2"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "16Gi"

model:
  decisionThreshold: "0.70" # lower threshold for staging to catch more edge cases

redis:
  enabled: true
  host: "redis-staging-svc"

ingress:
  enabled: true
  host: "fraud-model-staging.internal.company.com"
  tlsSecretName: "staging-tls"

# values-prod.yaml
replicaCount: 3

image:
  tag: "v2.1.0" # pinned to stable version

resources:
  requests:
    cpu: "4"
    memory: "16Gi"
  limits:
    cpu: "8"
    memory: "24Gi"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 15
  targetCPUUtilizationPercentage: 60

redis:
  enabled: true
  host: "redis-prod-svc"

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values: ["fraud-model"]
        topologyKey: kubernetes.io/hostname

ingress:
  enabled: true
  host: "fraud-model.api.company.com"
  tlsSecretName: "prod-tls"
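
Override files only need to list the keys that differ because Helm merges maps key by key rather than replacing them wholesale. The sketch below approximates that behavior in Python under two stated simplifications: nested maps are merged recursively, and everything else (scalars and lists) is replaced by the override. Helm's special handling of null values is not modeled, and the function name is hypothetical.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Return a new dict where `override` is layered on top of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)  # merge nested maps
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists are replaced
    return merged


# Trimmed copies of values.yaml and values-staging.yaml from this lesson.
defaults = {
    "replicaCount": 1,
    "image": {"repository": "registry.company.com/fraud-model", "tag": "v2.1.0"},
    "redis": {"enabled": False, "host": "redis-svc", "port": "6379"},
    "tolerations": [],
}
staging = {
    "replicaCount": 2,
    "image": {"tag": "v2.2.0-rc1"},
    "redis": {"enabled": True, "host": "redis-staging-svc"},
}

effective = merge_values(defaults, staging)
print(effective["image"])
print(effective["redis"])
```

The staging file never mentions image.repository or redis.port, yet both survive in the effective values. That is what keeps the per-environment files short enough to review.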

Helm Commands with Values Files

# Install in dev (uses defaults from values.yaml)
helm install fraud-model ./fraud-model \
  --namespace team-fraud-dev \
  --create-namespace

# Install in staging (override with staging values)
helm install fraud-model ./fraud-model \
  --namespace team-fraud-staging \
  --values ./fraud-model/values-staging.yaml

# Install in prod
helm install fraud-model ./fraud-model \
  --namespace team-fraud-prod \
  --values ./fraud-model/values-prod.yaml

# Upgrade staging with new image tag
# (--atomic rolls back automatically on failure; a comment cannot follow
# the trailing backslash, so it lives up here)
helm upgrade fraud-model ./fraud-model \
  --namespace team-fraud-staging \
  --values ./fraud-model/values-staging.yaml \
  --set image.tag=v2.2.0-rc2 \
  --atomic \
  --timeout 5m

# Diff before upgrading (requires helm-diff plugin)
helm diff upgrade fraud-model ./fraud-model \
  --namespace team-fraud-prod \
  --values ./fraud-model/values-prod.yaml

# Roll back to a specific revision (omit the number to go back one revision)
helm rollback fraud-model 2 --namespace team-fraud-prod

# List all releases
helm list -A

# Release history
helm history fraud-model -n team-fraud-prod
# REVISION  STATUS      CHART              DESCRIPTION
# 1         superseded  fraud-model-1.3.0  Install complete
# 2         superseded  fraud-model-1.3.1  Upgrade complete
# 3         deployed    fraud-model-1.4.2  Upgrade complete
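
In CI, these commands are usually assembled by a script rather than typed by hand. The following is one possible sketch: a hypothetical helper that builds the helm upgrade argument list from the same flags used above, so the per-environment differences are reduced to function arguments. It only builds and prints the command; running it (for example with subprocess.run) is left to the caller.

```python
import shlex
from typing import Dict, List, Optional


def helm_upgrade_argv(
    release: str,
    chart: str,
    namespace: str,
    values_files: List[str],
    set_values: Optional[Dict[str, str]] = None,
    timeout: str = "5m",
) -> List[str]:
    """Build a `helm upgrade` command as an argv list (no shell quoting issues)."""
    argv = ["helm", "upgrade", release, chart, "--namespace", namespace]
    for path in values_files:
        argv += ["--values", path]  # later files override earlier ones
    for key, value in (set_values or {}).items():
        argv += ["--set", f"{key}={value}"]
    argv += ["--atomic", "--timeout", timeout]
    return argv


argv = helm_upgrade_argv(
    release="fraud-model",
    chart="./fraud-model",
    namespace="team-fraud-staging",
    values_files=["./fraud-model/values-staging.yaml"],
    set_values={"image.tag": "v2.2.0-rc2"},
)
print(shlex.join(argv))
```

Passing the command as a list avoids the line-continuation and quoting pitfalls of long shell one-liners, and it keeps --atomic from being dropped by accident in one environment.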

Helm Hooks - Pre/Post Model Validation

Helm hooks let you run Jobs at specific points in the release lifecycle. For ML deployments, this is powerful: run a model validation job before traffic is switched to the new model version.

# templates/model-validation-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "fraud-model.fullname" . }}-pre-upgrade-validate
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": pre-upgrade # runs before upgrade begins
    "helm.sh/hook-weight": "-5" # order among multiple pre-upgrade hooks
    # clean up after success, and remove a leftover failed Job before the next run
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 0 # fail fast - do not retry
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: model-validator
          image: registry.company.com/ml-validator:v1.2
          command:
            - python
            - validate_model.py
          env:
            - name: MODEL_PATH
              value: {{ .Values.model.path | quote }}
            - name: VALIDATION_DATASET
              value: "s3://ml-data/validation/fraud/2026-01/"
            - name: MINIMUM_AUC
              value: "0.92" # fail if AUC drops below this
            - name: MAXIMUM_P99_LATENCY_MS
              value: "150"

The validation script:

# validate_model.py
import os
import sys
import time

import numpy as np
import torch
from sklearn.metrics import roc_auc_score


def load_validation_data(uri):
    """Placeholder: this helper is not shown in the lesson.

    Replace it with project-specific code that returns (features, labels)
    arrays for the dataset at `uri`.
    """
    raise NotImplementedError("load_validation_data must be provided")


def validate_model():
    model_path = os.environ["MODEL_PATH"]
    min_auc = float(os.environ.get("MINIMUM_AUC", "0.90"))
    max_latency_ms = float(os.environ.get("MAXIMUM_P99_LATENCY_MS", "200"))

    print(f"Loading model from {model_path}...")
    model = torch.jit.load(model_path)
    model.eval()

    # Load validation dataset
    X_val, y_val = load_validation_data(os.environ["VALIDATION_DATASET"])

    # Evaluate AUC (ravel flattens an (N, 1) output to the 1-D shape sklearn expects)
    with torch.no_grad():
        preds = model(torch.tensor(X_val, dtype=torch.float32)).sigmoid().numpy().ravel()

    auc = roc_auc_score(y_val, preds)
    print(f"Validation AUC: {auc:.4f} (minimum: {min_auc})")

    if auc < min_auc:
        print(f"FAIL: AUC {auc:.4f} is below minimum {min_auc}")
        sys.exit(1)

    # Check latency on a single-row batch (512 input features, as in the original)
    latencies = []
    for _ in range(100):
        start = time.perf_counter()
        with torch.no_grad():
            _ = model(torch.zeros(1, 512, dtype=torch.float32))
        latencies.append((time.perf_counter() - start) * 1000)

    p99 = np.percentile(latencies, 99)
    print(f"P99 latency: {p99:.1f}ms (maximum: {max_latency_ms}ms)")

    if p99 > max_latency_ms:
        print(f"FAIL: P99 latency {p99:.1f}ms exceeds maximum {max_latency_ms}ms")
        sys.exit(1)

    print("All validation checks passed.")
    sys.exit(0)


if __name__ == "__main__":
    validate_model()

If the pre-upgrade hook Job fails, Helm aborts the upgrade before it applies any of the new manifests, so the bad model version never reaches production. With --atomic in helm upgrade, Helm also rolls the release back to the previous revision automatically, so no manual intervention is needed.
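
The only contract between the validation script and Helm is the process exit code: zero lets the upgrade proceed, anything else fails the hook. One way to keep that gate easy to unit-test is to separate the threshold comparison from model loading. The sketch below is a hypothetical refactoring along those lines, not part of the script above; the sample numbers are invented.

```python
from typing import List


def check_thresholds(
    auc: float,
    p99_latency_ms: float,
    min_auc: float,
    max_latency_ms: float,
) -> List[str]:
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if auc < min_auc:
        failures.append(f"AUC {auc:.4f} is below minimum {min_auc}")
    if p99_latency_ms > max_latency_ms:
        failures.append(
            f"P99 latency {p99_latency_ms:.1f}ms exceeds maximum {max_latency_ms}ms"
        )
    return failures


def exit_code(failures: List[str]) -> int:
    """Map the failure list to the exit code the hook Job reports."""
    return 1 if failures else 0


# Invented measurements, checked against the thresholds set in the hook Job.
good = check_thresholds(auc=0.941, p99_latency_ms=87.0, min_auc=0.92, max_latency_ms=150)
bad = check_thresholds(auc=0.903, p99_latency_ms=180.0, min_auc=0.92, max_latency_ms=150)
print(exit_code(good), good)
print(exit_code(bad), bad)
```

Collecting every failure before exiting, instead of stopping at the first one, gives the on-call engineer the full picture in a single Job log.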

Umbrella Charts - Multi-Component ML Stacks

An umbrella chart is a parent chart that declares multiple subcharts as dependencies. It lets you deploy your entire ML stack (model server + feature store + Redis + monitoring) as a single unit with a single helm upgrade.

# ml-serving-stack/Chart.yaml
apiVersion: v2
name: ml-serving-stack
description: Complete ML serving infrastructure for team-fraud
version: 2.3.0
dependencies:
  - name: fraud-model
    version: "1.4.x"
    repository: "https://charts.company.com/ml-charts"
  - name: feature-store-proxy
    version: "0.9.x"
    repository: "https://charts.company.com/ml-charts"
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: global.redis.enabled
  - name: prometheus-adapter
    version: "4.x.x"
    repository: "https://prometheus-community.github.io/helm-charts"
    condition: global.monitoring.enabled

# ml-serving-stack/values.yaml
global:
  environment: dev
  imageRegistry: registry.company.com
  redis:
    enabled: true
  monitoring:
    enabled: true

# Values passed to the fraud-model subchart
fraud-model:
  replicaCount: 1
  image:
    tag: "v2.1.0"

# Values passed to the feature-store-proxy subchart
feature-store-proxy:
  replicaCount: 1
  upstream: "http://feature-store-svc:8082"

# Values passed to the redis subchart (Bitnami)
redis:
  architecture: standalone
  auth:
    enabled: false # dev only - enable auth in prod

# Update subchart dependencies
helm dependency update ./ml-serving-stack

# Install the entire ML stack in one command
helm install team-fraud-stack ./ml-serving-stack \
  --namespace team-fraud \
  --values ./ml-serving-stack/values-prod.yaml

# Upgrade only the model version across the stack
helm upgrade team-fraud-stack ./ml-serving-stack \
  --namespace team-fraud \
  --values ./ml-serving-stack/values-prod.yaml \
  --set fraud-model.image.tag=v2.2.0 \
  --atomic
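
The umbrella values file works because each subchart sees only the block under its own name plus the shared global block. The sketch below illustrates that scoping rule in Python. It is a simplification: in a real render the subchart's own values.yaml defaults sit underneath these values, which is omitted here, and the function name is hypothetical.

```python
import copy


def subchart_view(parent_values: dict, subchart: str) -> dict:
    """Values a subchart would see: its own block plus the shared `global` block."""
    view = copy.deepcopy(parent_values.get(subchart, {}))
    view["global"] = copy.deepcopy(parent_values.get("global", {}))
    return view


# Trimmed copy of ml-serving-stack/values.yaml from this lesson.
umbrella_values = {
    "global": {"environment": "dev", "imageRegistry": "registry.company.com"},
    "fraud-model": {"replicaCount": 1, "image": {"tag": "v2.1.0"}},
    "feature-store-proxy": {"replicaCount": 1, "upstream": "http://feature-store-svc:8082"},
    "redis": {"architecture": "standalone", "auth": {"enabled": False}},
}

view = subchart_view(umbrella_values, "fraud-model")
print(sorted(view))
print(view["image"]["tag"], view["global"]["imageRegistry"])
```

This is also why the upgrade command uses --set fraud-model.image.tag=v2.2.0: from the umbrella chart's point of view, the model's image tag lives under the fraud-model key, while inside the subchart the same value is read as .Values.image.tag.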

Production Notes

Use chart repositories, not local directories. In production CI/CD, package charts and push them to a Helm repository (OCI registry, ChartMuseum, or GitHub Pages). Reference charts by repository URL and version rather than local path. This ensures every release uses the exact same chart version and prevents environment drift.

# Package and push chart to OCI registry
helm package ./fraud-model
helm push fraud-model-1.4.2.tgz oci://registry.company.com/helm-charts

Store values files in version control alongside manifests. The values-prod.yaml file is as important as the chart templates - it defines production configuration. It should go through the same code review process as application code changes.

Use --dry-run before every upgrade. helm upgrade --dry-run renders all templates with the given values and validates the result without applying anything to the cluster. This catches template syntax errors and many invalid configurations before they reach the cluster. It cannot catch problems that only appear at runtime, such as a model file that fails to load.

helm upgrade fraud-model ./fraud-model \
  --namespace team-fraud-prod \
  --values ./fraud-model/values-prod.yaml \
  --dry-run

Common Mistakes

:::danger helm upgrade Without --atomic in Production
Running helm upgrade without --atomic means Helm will not automatically roll back if the upgrade fails (pods crash-loop, readiness probe never passes). The release will be left in a failed state, and you have to manually run helm rollback. Always use --atomic --timeout 10m for production upgrades. The atomic flag rolls back automatically if any pod fails to become ready within the timeout.
:::

:::warning Using --set for Sensitive Values
helm install --set redis.password=mysecret123 stores the password in the Helm release history (a Kubernetes Secret that is encoded, not encrypted). Anyone with kubectl get secret access to the namespace can read it, and the value can also linger in shell history and CI logs. Switching to --set-string or a --values file does not help, because every user-supplied value is stored in the release the same way. For sensitive values, keep them in an external secret manager, sync them into the cluster via ExternalSecrets, and have the chart reference the resulting Secret by name.
:::

:::warning Forgetting helm dependency update After Adding Subchart
After adding a dependency to Chart.yaml, you must run helm dependency update to download the subchart into charts/. Without this step, installing or upgrading the umbrella chart fails because the declared dependency is missing. Commit the updated charts/ directory (or .tgz files) to version control so CI doesn't need to download dependencies at deploy time.
:::

Interview Q&A

Q1: What problem does Helm solve for ML deployments that kubectl apply alone cannot?

kubectl apply applies individual YAML files but has no concept of a versioned, atomic application release. With kubectl alone: you apply 6 files for your ML stack, and there's no mechanism to ensure all 6 are consistent with each other. Rolling back means manually re-applying previous versions of each file. Configuration differences across environments (dev/staging/prod) are managed by maintaining separate copies of every file, which inevitably drift. Helm treats the entire stack as a single versioned release. Helm templates parameterize all environment differences in values files. helm rollback returns the entire stack to a previous coherent state atomically. helm diff shows exactly what will change before you apply it.

Q2: How do Helm hooks enable safer ML model deployments?

Helm hooks run Kubernetes Jobs at specific points in the release lifecycle: pre-install, pre-upgrade, post-upgrade, etc. For ML, the pre-upgrade hook runs a model validation Job that loads the new model version, runs it against a held-out validation dataset, checks AUC/F1/latency against minimum thresholds, and exits with code 0 (pass) or 1 (fail). If the hook Job fails, Helm aborts the upgrade before applying the new manifests, and with --atomic it also rolls the release back to the previous revision. The new model version never reaches production serving pods. This turns model quality validation from a manual step (easily skipped under deadline pressure) into a mandatory gate enforced by the deployment infrastructure.

Q3: Explain the Helm values hierarchy and how it enables multi-environment ML deployments.

Helm merges values in layers, listed here from lowest to highest priority: the chart's values.yaml (defaults), the parent chart's values.yaml when the chart is used as a subchart, values files passed with --values (in order, later files override earlier ones), and --set flags (highest priority, override everything). Built-in objects such as .Release.Name and .Chart.Name are not part of this hierarchy: Helm supplies them, and values cannot override them. For ML multi-environment deployments: values.yaml contains safe development defaults (1 replica, small resources, no autoscaling, Redis disabled). values-staging.yaml overrides to 2 replicas, staging resource allocations, and staging URLs. values-prod.yaml overrides to 3 replicas, production resources, autoscaling enabled, pod anti-affinity, and production endpoints. The templates stay identical across all environments - only values differ. This eliminates configuration drift because there is one template for all environments.
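
As an illustration of the top layer, the sketch below shows how a dotted --set path can be expanded into the nested structure that values files use. It is deliberately simplified: Helm's real --set parser also handles commas, list indexes, escaping, and type coercion, none of which are modeled here, and every value stays a string. The function name is hypothetical.

```python
def apply_set(values: dict, assignment: str) -> None:
    """Apply a simple `a.b.c=value` assignment to `values` in place."""
    path, _, raw = assignment.partition("=")
    keys = path.split(".")
    node = values
    for key in keys[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            child = {}  # create intermediate maps as needed
            node[key] = child
        node = child
    node[keys[-1]] = raw  # the last key wins over anything set by lower layers


# Values as they might look after values.yaml and a --values file were merged.
values = {"image": {"repository": "registry.company.com/fraud-model", "tag": "v2.1.0"}}

apply_set(values, "image.tag=v2.2.0-rc2")
apply_set(values, "redis.enabled=true")
print(values)
```

Only the addressed leaf changes: image.repository is untouched. This makes --set convenient for one-off overrides such as an image tag in CI, while longer-lived differences belong in a reviewed values file.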

Q4: What is an umbrella chart and when would an ML team use one?

An umbrella chart is a parent Helm chart that declares other charts as dependencies and optionally provides additional templates that tie them together. An ML team would use one when they have a multi-component serving stack that always needs to be deployed and upgraded together: for example, a model server, a feature store proxy, Redis for caching, and a Prometheus adapter for custom metrics. Without an umbrella chart, each component is a separate Helm release - you can have version mismatches where the model server is upgraded but the feature store proxy is not, causing compatibility issues. With an umbrella chart, helm upgrade updates all components as a single release that succeeds or rolls back as a unit, and you can define global values (shared image registry URL, environment name) that cascade to all subcharts.

Q5: What does helm upgrade --atomic do, and why is it important for ML serving deployments?

--atomic makes the upgrade behave transactionally: if the upgrade fails (any pod fails to become ready within the timeout, a hook job fails, or any resource fails to apply), Helm automatically triggers a rollback to the last successful release. It also implies --wait, so Helm waits for resources to become ready before marking the release as successful. For ML serving, this is critical because a failed upgrade that does not roll back leaves the cluster in a degraded state: some pods may be running the new version, others the old, readiness probes may be failing, and the Service may be routing to a mix of versions. Manual recovery under pressure at 2am is error-prone. With --atomic, failure is always clean - the cluster returns to the last known good state automatically, and the on-call engineer can investigate the upgrade failure from a stable baseline.

© 2026 EngineersOfAI. All rights reserved.