
MLflow Model Registry in Production

The Approval Gate Problem

Your ML team has grown to 15 engineers. Everyone is enthusiastic, everyone is shipping experiments, and now everyone has the ability to push a model to production. Last week, a junior data scientist accidentally promoted an under-trained model to Production on a Friday afternoon before leaving for the weekend. Your users got degraded recommendations for 36 hours before someone noticed.

The solution is not to distrust your engineers. The solution is to build an approval workflow that makes accidental production promotion impossible. Every model that wants to reach production must pass automated evaluation gates, get reviewed by at least one senior engineer, and be explicitly promoted through a controlled process - not by whoever happens to have the credentials.

MLflow Model Registry is the tool most teams reach for to implement this. It is not just a file store. Used properly, it is the backbone of your model governance workflow.



Why MLflow Registry?

MLflow started as an experiment tracking tool but grew into a full ML lifecycle platform. The Model Registry, added in MLflow 1.4 (2019), gave teams:

  • Named model versions with structured lifecycle stages
  • Programmatic and UI-based stage transitions
  • Webhook notifications on stage changes (availability depends on your MLflow distribution)
  • Annotations, tags, and descriptions attached to models
  • Integration with the rest of the MLflow ecosystem (tracking, artifacts)

The alternatives (W&B Registry, SageMaker Registry, Vertex AI Registry) offer similar concepts, but MLflow is open source, self-hostable, and framework-agnostic, which makes it a common choice for teams that need flexibility.


Architecture Overview

The setup in this article has three parts: an MLflow server that exposes the registry API and UI, a PostgreSQL database that stores registry metadata, and an S3 bucket that stores model artifacts. Training jobs, CI pipelines, and serving systems all reach the registry through the MLflow server.


Setting Up the Registry

Backend Configuration

MLflow supports multiple backends for storing registry metadata. In production, always use a server-grade database backend such as PostgreSQL - not the default local file store or a local SQLite file.

# docker-compose.yml snippet
services:
  mlflow:
    # Note: the stock image may not bundle the PostgreSQL and S3 client
    # libraries (psycopg2, boto3); build a derived image if they are missing.
    image: ghcr.io/mlflow/mlflow:v2.10.0
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:${MLFLOW_DB_PASSWORD}@postgres/mlflow
      --default-artifact-root s3://${MLFLOW_ARTIFACT_BUCKET}/artifacts
      --host 0.0.0.0
      --port 5000
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    ports:
      - "5000:5000"
    depends_on:
      - postgres

  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: mlflow
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: ${MLFLOW_DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

# Named volumes must be declared at the top level
volumes:
  postgres_data:

Set the tracking URI in your training code:

import mlflow
import os

mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))

Registering Models from Runs

The cleanest pattern is to register directly when logging:

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score, average_precision_score
import os

def train_and_register(X_train, y_train, X_val, y_val, config: dict):
    """Train a model and register it in MLflow."""

    with mlflow.start_run(run_name=f"fraud-detector-{config['n_estimators']}trees") as run:
        # Log parameters
        mlflow.log_params(config)

        # Log environment info
        mlflow.set_tags({
            "team": "fraud-detection",
            "use_case": "transaction-fraud",
            "data_version": "v2024-01-15",
            "feature_pipeline": "fp-v3.2.1",
            "git_commit": os.environ.get("GIT_COMMIT_SHA", "unknown"),
        })

        # Train
        model = RandomForestClassifier(**config, random_state=42)
        model.fit(X_train, y_train)

        # Evaluate
        val_probs = model.predict_proba(X_val)[:, 1]
        metrics = {
            "val_auc": roc_auc_score(y_val, val_probs),
            "val_f1": f1_score(y_val, (val_probs > 0.5).astype(int)),
            "val_ap": average_precision_score(y_val, val_probs),
            "val_size": len(y_val),
        }
        mlflow.log_metrics(metrics)

        # Log and register in one call
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="fraud-detector",  # This registers it
            input_example=X_val[:5],  # Documents input schema
        )

        print(f"Run ID: {run.info.run_id}")
        print(f"Metrics: {metrics}")
        return run.info.run_id

After a Run (Programmatic Registration)

Sometimes you want to evaluate first, then register only if it passes:

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

def register_if_good(run_id: str, model_name: str, min_auc: float = 0.80) -> str | None:
    """Register a model only if it meets performance requirements."""
    run = client.get_run(run_id)
    # A missing metric defaults to 0, so it fails the threshold check
    auc = run.data.metrics.get("val_auc", 0)

    if auc < min_auc:
        print(f"Model rejected: AUC {auc:.3f} < threshold {min_auc}")
        return None

    model_uri = f"runs:/{run_id}/model"
    mv = mlflow.register_model(model_uri, model_name)

    # Add description
    client.update_model_version(
        name=model_name,
        version=mv.version,
        description=f"Trained on data v2024-01-15. Val AUC: {auc:.4f}. "
                    f"Run: {run_id}. Approved for staging review."
    )

    print(f"Registered: {model_name} version {mv.version}")
    return mv.version
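
A single AUC threshold is rarely the only gate a team wants. As a minimal sketch, assuming you keep thresholds in a plain dictionary, the check can be generalized to several metrics and return a reason for each failure. The names `MIN_THRESHOLDS` and `failed_gates` and the threshold values are hypothetical; the function works on any metrics mapping such as `run.data.metrics`.

```python
from typing import Mapping

# Hypothetical thresholds; tune these to your own use case.
MIN_THRESHOLDS = {"val_auc": 0.80, "val_ap": 0.50}


def failed_gates(metrics: Mapping[str, float],
                 thresholds: Mapping[str, float]) -> list[str]:
    """Return a human-readable reason for every gate the run fails."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric fails the gate instead of silently passing.
            failures.append(f"{name}: metric not logged")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return failures


good = failed_gates({"val_auc": 0.91, "val_ap": 0.62}, MIN_THRESHOLDS)
bad = failed_gates({"val_auc": 0.75}, MIN_THRESHOLDS)
print(good)
print(bad)
```

An empty list means the run may be registered. A non-empty list can go straight into the rejection message or a model version description.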

Model Stages

Stage Transition API

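Stages are the classic governance primitive in MLflow Registry. They have been deprecated since MLflow 2.9 in favor of aliases, but many existing pipelines still rely on them: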

from mlflow.tracking import MlflowClient
from datetime import datetime, timezone

client = MlflowClient()

def promote_to_staging(model_name: str, version: str, reviewer: str) -> None:
    """Promote a model from None to Staging with reviewer annotation."""
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Staging",
        archive_existing_versions=False,
    )

    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="staging_reviewer",
        value=reviewer,
    )

    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="staging_timestamp",
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        value=datetime.now(timezone.utc).isoformat(),
    )

    print(f"Promoted {model_name} v{version} to Staging")


def promote_to_production(model_name: str, version: str, approver: str) -> None:
    """Promote a model to Production, archiving the previous production version."""
    # Promote new version - archive_existing_versions=True handles the archival
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        archive_existing_versions=True,  # Auto-archive current production
    )

    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="production_approver",
        value=approver,
    )
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="production_timestamp",
        value=datetime.now(timezone.utc).isoformat(),
    )

    print(f"Promoted {model_name} v{version} to Production")


def emergency_rollback(model_name: str, approver: str) -> None:
    """Roll back to the highest-numbered archived version."""
    # get_latest_versions returns the highest-numbered version per stage
    archived = client.get_latest_versions(model_name, stages=["Archived"])
    if not archived:
        raise ValueError(f"No archived versions of {model_name} to roll back to")

    # Sort by version number and take the highest. Note: this is not
    # necessarily the version that most recently served Production.
    latest_archived = sorted(archived, key=lambda v: int(v.version))[-1]

    print(f"Rolling back to {model_name} v{latest_archived.version}")
    promote_to_production(model_name, latest_archived.version, f"ROLLBACK:{approver}")
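
Picking the rollback target by version number alone can go wrong: after one rollback, the bad version is itself archived and carries the highest number. One possible refinement, sketched below, chooses the archived version that most recently served Production, using the `production_timestamp` tag written at promotion time. The `rolled_back` tag and the `pick_rollback_target` helper are assumptions introduced for illustration, and the sketch assumes all timestamps share one ISO 8601 format.

```python
from datetime import datetime
from types import SimpleNamespace


def pick_rollback_target(archived_versions):
    """Pick the archived version that most recently served Production.

    Assumes each candidate carries a `production_timestamp` tag (ISO 8601)
    and that bad versions are marked with a hypothetical `rolled_back` tag
    so they are never picked again.
    """
    candidates = [
        v for v in archived_versions
        if "production_timestamp" in v.tags
        and v.tags.get("rolled_back") != "true"
    ]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda v: datetime.fromisoformat(v.tags["production_timestamp"]),
    )


# Stand-ins for model version objects (only .version and .tags are used).
archived = [
    SimpleNamespace(version="7", tags={"production_timestamp": "2024-01-10T09:00:00"}),
    SimpleNamespace(version="8", tags={"production_timestamp": "2024-02-01T09:00:00"}),
    SimpleNamespace(version="9", tags={"production_timestamp": "2024-03-01T09:00:00",
                                       "rolled_back": "true"}),
    SimpleNamespace(version="10", tags={}),  # archived without reaching Production
]
target = pick_rollback_target(archived)
print(target.version)
```

In a real registry the input would be every version whose `current_stage` is `"Archived"`, and the rollback routine would set the `rolled_back` tag on the version it replaces.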

Model Aliases (MLflow 2.x)

MLflow 2.x introduced aliases as a more flexible alternative to stages. An alias is a named pointer to a specific version:

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Set aliases
client.set_registered_model_alias(
    name="fraud-detector",
    alias="champion",
    version="23",
)

client.set_registered_model_alias(
    name="fraud-detector",
    alias="challenger",
    version="24",
)

client.set_registered_model_alias(
    name="fraud-detector",
    alias="shadow",
    version="25",
)

# Load model by alias
model = mlflow.pyfunc.load_model("models:/fraud-detector@champion")

# Promote challenger to champion (canary succeeded)
client.set_registered_model_alias(
    name="fraud-detector",
    alias="champion",
    version="24",  # Former challenger now becomes champion
)

# Clean up old alias
client.delete_registered_model_alias(
    name="fraud-detector",
    alias="challenger",
)
tip

Aliases are more flexible than stages because you can define as many named aliases as you need (champion, challenger, shadow, canary), and each alias always points to exactly one version. Stages give you only four fixed states, and nothing stops two versions from sitting in Production at once unless you archive the old one. For complex deployment patterns, prefer aliases over stages.
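
To show how aliases support a canary rollout, here is a minimal sketch of a deterministic traffic splitter that sends a fixed share of requests to the challenger. The function names `pick_alias` and `model_uri` and the request ID format are hypothetical; only the `models:/name@alias` URI form comes from the example above.

```python
import hashlib


def pick_alias(request_id: str, challenger_share: float) -> str:
    """Deterministically route a request to 'champion' or 'challenger'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return "challenger" if bucket < challenger_share * 2**64 else "champion"


def model_uri(model_name: str, alias: str) -> str:
    """Build the alias-based registry URI for a model."""
    return f"models:/{model_name}@{alias}"


uri = model_uri("fraud-detector", pick_alias("txn-1001", challenger_share=0.10))
print(uri)
```

Hashing the request ID keeps routing stable: the same transaction always hits the same model, which makes the comparison between champion and challenger easier to debug.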


Searching and Querying the Registry

The registry is a database - use it like one:

from mlflow.tracking import MlflowClient

def get_registry_report(model_name: str) -> dict:
    """Generate a full report on a model's registry history."""
    client = MlflowClient()

    # Get all versions
    all_versions = client.search_model_versions(f"name='{model_name}'")

    report = {
        "model": model_name,
        "total_versions": len(all_versions),
        "by_stage": {},
        "versions": [],
    }

    for v in sorted(all_versions, key=lambda x: int(x.version)):
        run = client.get_run(v.run_id)
        entry = {
            "version": v.version,
            "stage": v.current_stage,
            "created": v.creation_timestamp,
            "run_id": v.run_id,
            "metrics": run.data.metrics,
            "tags": dict(v.tags),
            "description": v.description,
        }
        report["versions"].append(entry)

        stage = v.current_stage
        report["by_stage"][stage] = report["by_stage"].get(stage, 0) + 1

    return report


def find_models_by_data_version(data_version: str) -> list:
    """Find all models trained on a specific data version - critical for data lineage."""
    client = MlflowClient()
    # Search results are paginated; large registries need more than one page
    all_models = client.search_registered_models()
    affected = []

    for model in all_models:
        versions = client.search_model_versions(f"name='{model.name}'")
        for v in versions:
            # data_version is a run tag, so look it up on the source run
            run = client.get_run(v.run_id)
            if run.data.tags.get("data_version") == data_version:
                affected.append({
                    "model": model.name,
                    "version": v.version,
                    "stage": v.current_stage,
                })

    return affected
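
Registry searches return results one page at a time, so a lineage query over a large registry has to follow page tokens. The helper below is a generic sketch of that loop. `iter_all_pages` and the fake backend are hypothetical; the `fetch_page` callable is an assumption standing in for whatever search call you wrap.

```python
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

Page = Tuple[Sequence[Any], Optional[str]]


def iter_all_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item from a token-paginated search.

    `fetch_page(token)` must return (items, next_token); next_token is
    falsy on the last page.
    """
    token = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


# Fake three-page backend standing in for the registry.
_PAGES = {None: (["a", "b"], "t1"), "t1": (["c"], "t2"), "t2": (["d"], None)}
all_items = list(iter_all_pages(lambda token: _PAGES[token]))
print(all_items)
```

With MLflow, `fetch_page` would typically call a search method with its `page_token` argument and return the page together with the page's `token` attribute. Check the signature in your installed version before relying on it.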

Webhooks for Stage Transitions

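Webhooks are how the registry integrates with the rest of your workflow. When a model transitions stages, the registry POSTs to an HTTP endpoint you configure. Webhook support and its client API vary by MLflow distribution and version (the event names below follow Databricks-managed MLflow), so confirm what your deployment exposes before relying on this code.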

from mlflow.tracking import MlflowClient
import os

client = MlflowClient()

# Assumes your client exposes create_registry_webhook; availability and
# naming vary by MLflow distribution.

# Webhook: trigger CI validation when model reaches Staging
staging_webhook = client.create_registry_webhook(
    events=["MODEL_VERSION_TRANSITIONED_TO_STAGING"],
    http_url_spec={
        "url": "https://your-ci.example.com/api/webhooks/mlflow-staging",
        "enable_ssl_verification": True,
        "authorization": f"Bearer {os.environ['CI_WEBHOOK_TOKEN']}",
    },
    description="Trigger CI validation when model reaches Staging",
)

# Webhook: trigger deployment when model reaches Production
prod_webhook = client.create_registry_webhook(
    events=["MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"],
    http_url_spec={
        "url": "https://your-ci.example.com/api/webhooks/mlflow-production",
        "enable_ssl_verification": True,
        "authorization": f"Bearer {os.environ['CI_WEBHOOK_TOKEN']}",
    },
    description="Trigger deployment when model reaches Production",
)

Webhook Handler (FastAPI)

from fastapi import FastAPI, Request, HTTPException, Header
import hmac
import os

import requests

app = FastAPI()

@app.post("/api/webhooks/mlflow-staging")
async def handle_staging_transition(
    request: Request,
    authorization: str = Header(None),
):
    """Handle MLflow webhook when a model transitions to Staging."""
    expected = f"Bearer {os.environ['CI_WEBHOOK_TOKEN']}"
    # Constant-time comparison avoids leaking the token through timing
    if authorization is None or not hmac.compare_digest(
        authorization.encode(), expected.encode()
    ):
        raise HTTPException(status_code=401, detail="Unauthorized")

    payload = await request.json()
    # Payload keys depend on the webhook sender; adjust them to match yours
    model_name = payload["model_name"]
    version = payload["model_version"]["version"]

    print(f"Staging transition: {model_name} v{version}")
    trigger_staging_validation(model_name, version)

    return {"status": "accepted"}


def trigger_staging_validation(model_name: str, version: str) -> None:
    """Trigger a GitLab CI pipeline to validate the staged model."""
    response = requests.post(
        f"{os.environ['GITLAB_URL']}/api/v4/projects/{os.environ['GITLAB_PROJECT_ID']}/trigger/pipeline",
        data={
            "token": os.environ["GITLAB_TOKEN"],
            "ref": "main",
            "variables[MODEL_NAME]": model_name,
            "variables[MODEL_VERSION]": version,
            "variables[PIPELINE_TYPE]": "staging-validation",
        },
        timeout=10,  # Do not hang the webhook handler on a slow CI server
    )
    # Surface a failed trigger instead of silently dropping the event
    response.raise_for_status()
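
If your MLflow deployment does not expose registry webhooks, one fallback is a small polling job that snapshots which version each stage or alias points to and compares it with the previous snapshot. The sketch below shows only the comparison step. `detect_transitions` and the snapshot shape are assumptions made for illustration, not part of MLflow.

```python
def detect_transitions(previous: dict, current: dict) -> list:
    """Compare two {stage_or_alias: version} snapshots and list the changes."""
    events = []
    for key in sorted(set(previous) | set(current)):
        old, new = previous.get(key), current.get(key)
        if old != new:
            events.append({"target": key, "from_version": old, "to_version": new})
    return events


before = {"Production": "23", "Staging": "24"}
after = {"Production": "24", "Staging": "25"}
events = detect_transitions(before, after)
for event in events:
    print(event)
```

Each event can then be handed to the same code path a webhook handler would call, for example a function that triggers the staging validation pipeline.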

Access Control and Namespace Strategies

Access Control Patterns

Open-source MLflow does not ship full role-based access control (recent releases include only a basic username/password auth app). Common approaches:

| Approach | Complexity | Best For |
| --- | --- | --- |
| Proxy with JWT auth | Medium | Self-hosted small teams |
| mlflow-oidc-proxy | Medium | Enterprise on-prem |
| Databricks MLflow | Low (managed) | Teams needing native RBAC |
| AWS SageMaker Registry | Low (managed) | AWS-native teams |

Namespace Strategy for Large Teams

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Pattern: {domain}-{team}-{use-case}

client.create_registered_model(
    name="payments-fraud-transaction-scorer",
    tags={"domain": "payments", "team": "fraud", "use_case": "transaction_scoring"},
    description="Scores transactions for fraud probability.",
)

client.create_registered_model(
    name="rec-personalization-item-ranker",
    tags={"domain": "recommendations", "team": "personalization"},
    description="Ranks items for personalized recommendation feeds.",
)

# Query all models for a team
team_models = client.search_registered_models(
    filter_string="tags.team = 'fraud'"
)
for m in team_models:
    print(f"{m.name}")
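
A naming convention only helps if it is enforced. As a sketch, a small validator can reject names that do not follow the `{domain}-{team}-{use-case}` pattern before a model is created. `parse_model_name` is a hypothetical helper, and the rule that the first two hyphen-separated segments are the domain and team is an assumption drawn from the examples above.

```python
import re

# Lowercase segments separated by hyphens; at least domain, team, and use case.
_NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+){2,}")


def parse_model_name(name: str) -> dict:
    """Split a registered-model name into domain, team, and use case."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"{name!r} does not follow {{domain}}-{{team}}-{{use-case}}")
    # Everything after the second hyphen belongs to the use case.
    domain, team, use_case = name.split("-", 2)
    return {"domain": domain, "team": team, "use_case": use_case}


parsed = parse_model_name("payments-fraud-transaction-scorer")
print(parsed)
```

The parsed fields can also populate the `domain` and `team` tags, so the name and the tags cannot drift apart.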

Loading Models in Serving

import mlflow.pyfunc
import pandas as pd
import threading
from mlflow.tracking import MlflowClient

class RegistryAwareModelCache:
    """Thread-safe model cache that loads from the MLflow Registry."""

    def __init__(self, model_name: str, stage: str = "Production"):
        self.model_name = model_name
        self.stage = stage
        self._model = None
        self._version = None
        self._lock = threading.Lock()

    def _load_current(self):
        client = MlflowClient()
        versions = client.get_latest_versions(self.model_name, stages=[self.stage])
        if not versions:
            raise RuntimeError(f"No {self.stage} model for {self.model_name}")
        version = versions[0]
        # Load by explicit version so the recorded version always matches the
        # loaded artifact, even if the stage moves between the two calls
        model_uri = f"models:/{self.model_name}/{version.version}"
        model = mlflow.pyfunc.load_model(model_uri)
        return model, version.version

    def get_model(self):
        # Double-checked locking: only the first caller pays for the load
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model, self._version = self._load_current()
        return self._model

    def reload(self):
        """Force reload - call when registry signals a new production version."""
        with self._lock:
            self._model, self._version = self._load_current()
        print(f"Reloaded {self.model_name} v{self._version}")


# Usage in serving
fraud_model = RegistryAwareModelCache("fraud-detector", stage="Production")

def score_transaction(features: dict) -> float:
    model = fraud_model.get_model()
    df = pd.DataFrame([features])
    return float(model.predict(df)[0])

Common Mistakes

danger

Transitioning to Production without archiving the previous version. If you have two models in Production stage simultaneously, your deployment system does not know which one to serve. Always use archive_existing_versions=True when promoting to Production. With aliases, the "champion" alias should point to exactly one version at all times.

danger

Using the MLflow Tracking Server as the only state for production model information. If the MLflow server goes down, your deployment system should not fail. Cache the currently-deployed model version locally (e.g., in a Kubernetes ConfigMap) and only consult the registry on startup or during explicit reload operations.

warning

Not adding descriptions and tags at registration time. Tags added later are helpful, but critical metadata - data version, git SHA, evaluation metrics - must be set during the training run. You cannot add these retroactively from the run context.

warning

Allowing direct S3 access to bypass the registry. If your serving system can load a model from S3 directly, engineers will do it under pressure. The serving system should only load models via the registry URI format (models:/name/stage), never from raw object storage paths. Enforce this through IAM policies if needed.
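
Alongside IAM policies, the serving code itself can refuse anything that is not a registry URI. The check below is a small sketch. `is_registry_uri` is a hypothetical helper, and the accepted forms are the stage or version form (`models:/name/stage`) and the alias form (`models:/name@alias`) used elsewhere in this article.

```python
import re

# Accepts models:/<name>/<version-or-stage> and models:/<name>@<alias>.
_REGISTRY_URI = re.compile(r"models:/[^/@]+(/[^/@]+|@[^/@]+)")


def is_registry_uri(uri: str) -> bool:
    """True only for registry URIs, never for raw storage paths."""
    return _REGISTRY_URI.fullmatch(uri) is not None


for candidate in [
    "models:/fraud-detector/Production",
    "models:/fraud-detector@champion",
    "s3://ml-artifacts/artifacts/1/abc123/model",
]:
    print(candidate, is_registry_uri(candidate))
```

Calling this check before every model load turns a convention into a hard failure that shows up in code review and in logs.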


Interview Q&A

Q: How do you implement an approval workflow with MLflow Registry?

A: The pattern is: (1) CI/CD promotes to Staging automatically when a model passes evaluation gates; (2) a webhook fires and creates a review task in Slack or Jira; (3) the reviewer approves, which calls the MLflow API to transition to Production and triggers the deployment pipeline; (4) reject sends the model back to None with a comment. Enforcement: only a CI/CD service account with restricted credentials can call the Production stage transition API. Individual engineers cannot self-promote their own models.
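
The enforcement rules in this answer can be sketched as a small policy check that the CI/CD service account runs before it calls the Production transition. `PromotionRequest`, `SENIOR_REVIEWERS`, and `approval_errors` are hypothetical names, and the reviewer list would normally come from your identity system rather than a hard-coded set.

```python
from dataclasses import dataclass


@dataclass
class PromotionRequest:
    model_name: str
    version: str
    author: str
    approver: str
    gates_passed: bool


# Hypothetical reviewer list for illustration only.
SENIOR_REVIEWERS = {"alice", "bob"}


def approval_errors(req: PromotionRequest) -> list[str]:
    """Return every reason the request may not be promoted to Production."""
    errors = []
    if not req.gates_passed:
        errors.append("automated evaluation gates have not passed")
    if req.approver not in SENIOR_REVIEWERS:
        errors.append(f"{req.approver} is not an authorized reviewer")
    if req.approver == req.author:
        errors.append("authors cannot approve their own models")
    return errors


ok = approval_errors(PromotionRequest("fraud-detector", "24", "carol", "alice", True))
blocked = approval_errors(PromotionRequest("fraud-detector", "25", "alice", "alice", False))
print(ok)
print(blocked)
```

An empty list lets the promotion proceed. Anything else is posted back to the review task as the rejection comment.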


Q: What is the difference between MLflow model stages and model aliases?

A: Stages (None, Staging, Production, Archived) are a fixed four-state lifecycle - each version is in exactly one stage at a time, and several versions can share a stage unless you archive the old ones on promotion. Aliases are flexible named pointers - each alias points to exactly one version, and a model can have multiple aliases simultaneously like "champion" pointing to v23, "challenger" pointing to v24, and "shadow" pointing to v25. Stages are good for simple linear workflows. Aliases are better for complex deployment patterns like canary or A/B testing where you need concurrent named references. Stages have been deprecated since MLflow 2.9 in favor of aliases.


Q: How would you handle MLflow Registry high availability?

A: Three layers: (1) PostgreSQL backend with standby replica and automatic failover - AWS RDS Multi-AZ or equivalent; (2) MLflow server behind a load balancer with multiple stateless instances - MLflow is stateless when using external backends; (3) deployment systems cache the current model version locally and do not depend on the registry for real-time predictions. The registry being down should mean "we cannot deploy new models," not "predictions are failing."


Q: How does the MLflow Registry integrate with a GitLab CI/CD pipeline?

A: The registry is the handoff point between training and deployment. Training job: trains, evaluates, registers, transitions to Staging if metrics pass. A webhook fires, triggering the staging validation pipeline. Staging pipeline: runs integration tests and shadow tests against the staged model. If all pass, transitions to Production. The production transition webhook fires the deployment pipeline, which reads the Production model URI from the registry and updates the serving fleet. Every stage is driven by registry state - the pipeline never directly references model file paths.


Q: A data scientist says "MLflow tracking is enough, I don't need the registry." How do you respond?

A: Tracking is excellent for experiments. But it lacks governance for production: no lifecycle stages, no formal declaration of what is in production, no stage-based access control, no webhooks for deployment triggers, and rollback means "find the right run ID in the UI" - which is error-prone at 2am under pressure. Tracking tells you what happened during training. The registry tells you what should be running in production. Both are needed for a mature ML workflow.


Summary

MLflow Model Registry provides the governance layer that turns a collection of trained models into a managed, auditable, deployable asset. The key primitives are: named models with versioned history, lifecycle stages with controlled transitions, rich metadata through tags and descriptions, and webhooks that integrate the registry into your broader CI/CD workflow. Used properly, the registry makes both good deployments faster and bad deployments impossible.

© 2026 EngineersOfAI. All rights reserved.