
Azure ML for MLOps

The Bank That Could Not Ship Models

The risk and compliance team at a mid-sized European bank had a problem that was immediately recognizable to anyone who has worked in enterprise ML: the models were good, but they could not get them to production. The data science team used Jupyter notebooks. The IT team used VMs and on-premises servers managed by a change management board. The risk team required sign-off on any model change before it touched a customer. Legal required documentation that proved the model decision process was auditable. The timeline from "model trained" to "model in production" was approximately six months.

There were five scikit-learn models in various stages of this process simultaneously. A credit scoring model that had been waiting for production approval for four months. A fraud detection model stuck in UAT because the UAT environment had a different version of scikit-learn than development. A customer segmentation model whose training notebook had been lost when a data scientist left the company - nobody knew how to retrain it. A loan default prediction model deployed to production that nobody wanted to touch because nobody was sure what would happen.

The bank's cloud team had been on Azure for two years. Their data was in Azure Data Lake Storage, their data pipelines ran in Azure Data Factory, their analysts used Azure Synapse. When the ML team finally got budget to fix the workflow, Azure ML was the natural choice - same IAM, same subscriptions, same compliance boundary, same AD integration.

This lesson documents what the migration looked like: from ad-hoc notebooks to a component-based Azure ML pipeline with approval workflows, MLflow tracking, managed endpoints, and a Responsible AI dashboard that satisfied the compliance team. Every pattern here is drawn from real enterprise Azure ML deployments.



Why This Exists - Enterprise ML Needs Enterprise Infrastructure

Academic ML infrastructure - a laptop, a shared GPU server, and a Google Drive folder of notebooks - works until four things happen. First, you have multiple models in production simultaneously. Second, you have regulatory requirements around model documentation and auditability. Third, you have organizational separation between data scientists and IT operations. Fourth, you have a requirement to reproduce any model prediction made in the last three years.

Azure ML exists to solve these problems for organizations already invested in the Microsoft ecosystem. It integrates with Azure Active Directory for identity, Azure Key Vault for secrets, Azure Monitor for logging, Azure DevOps for CI/CD, and Microsoft Purview for data governance. For an enterprise that already manages its identity through AD and its infrastructure through Azure, Azure ML fits naturally into existing operational processes.

:::note Historical Context Azure ML launched in 2014 as a drag-and-drop Studio (classic). It was completely rebuilt from 2019-2020 as Azure Machine Learning v2 - a code-first platform. The Python SDK v2 and CLI v2 were released in 2022, replacing the v1 SDK with a significantly cleaner API. The old Studio experience became "Azure ML Designer" (still available but not the primary interface). MLflow integration was added in 2021 and has become the primary experiment tracking mechanism. :::


Azure ML Architecture

Azure ML is organized around a workspace - the top-level resource that groups all ML assets.

Key Concepts

Datastores are registered connections to Azure storage (Blob, ADLS Gen2, Azure SQL). They store credentials once so individual pipeline steps do not need to handle authentication.

Data Assets (called Datasets in SDK v1) are versioned references to data - a path in a datastore at a specific version. They enable data lineage without copying data.

Environments are Docker images with the runtime dependencies. Azure ML caches built environments in a Container Registry attached to the workspace.

Components are reusable pipeline steps defined by YAML + code. They have typed inputs and outputs, enabling pipeline composition like function composition.
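
To make the "typed inputs and outputs" idea concrete, here is a minimal sketch in plain Python. It does not use the Azure ML SDK: `ComponentSpec` and `can_connect` are hypothetical names invented for illustration, and the port names are borrowed from the churn components defined later in this lesson. It only shows why typed ports let a pipeline reject a bad connection before anything runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical stand-in for a component's typed interface."""
    name: str
    inputs: dict = field(default_factory=dict)   # port name -> type string
    outputs: dict = field(default_factory=dict)  # port name -> type string


def can_connect(upstream: ComponentSpec, out_port: str,
                downstream: ComponentSpec, in_port: str) -> bool:
    """True when the upstream output type matches the downstream input type."""
    out_type = upstream.outputs.get(out_port)
    return out_type is not None and out_type == downstream.inputs.get(in_port)


preprocess = ComponentSpec(
    name="preprocess_churn_data",
    inputs={"raw_data": "uri_folder", "test_fraction": "number"},
    outputs={"processed_data": "uri_folder"},
)
train = ComponentSpec(
    name="train_churn_model",
    inputs={"training_data": "uri_folder", "max_depth": "integer"},
    outputs={"mlflow_model": "mlflow_model"},
)

# A folder output can feed a folder input, but not an integer hyperparameter
print(can_connect(preprocess, "processed_data", train, "training_data"))
print(can_connect(preprocess, "processed_data", train, "max_depth"))
```

The same check is what makes composition safe: a step can only consume what an earlier step declares it produces.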


SDK v2 vs CLI v2

Azure ML v2 has two primary interfaces - the Python SDK and the Azure ML CLI. They are equivalent in capability; the choice is a matter of workflow.

| Dimension | Python SDK v2 | CLI v2 |
| --- | --- | --- |
| Best for | Programmatic pipelines, notebooks | CI/CD, YAML-first workflows |
| Primary file format | Python scripts | YAML job/component specs |
| Integration | Python code | Shell scripts, Azure DevOps |
| Learning curve | Familiar to ML engineers | Familiar to DevOps engineers |

Most teams use both: CLI v2 for deployment YAML files and CI/CD jobs, SDK v2 for pipeline authoring and programmatic model management.

# Install the Azure ML SDK v2
# pip install azure-ai-ml azure-identity mlflow azureml-mlflow

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="12345678-1234-1234-1234-123456789012",
    resource_group_name="ml-platform-rg",
    workspace_name="ml-platform-workspace",
)

print(f"Connected to workspace: {ml_client.workspace_name}")
print(f"Location: {ml_client.workspaces.get(ml_client.workspace_name).location}")

Compute Resources

Compute Clusters

Compute clusters are autoscaling pools of VMs for training. They scale to zero when idle and scale up (within minutes) when a job is submitted. This means you pay only for active training time.
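
The billing effect of scale-to-zero is simple arithmetic. The sketch below compares an always-on cluster with one that only runs while jobs are active. The hourly rate and the 40 hours of monthly training time are made-up numbers for illustration, not Azure prices, and the calculation ignores the short idle window before scale-down.

```python
def monthly_compute_cost(hourly_rate: float, nodes: int, hours: float) -> float:
    """Cost of running `nodes` VMs for `hours` at a per-node hourly rate."""
    return hourly_rate * nodes * hours


HOURS_IN_MONTH = 730   # average number of hours in a month
HOURLY_RATE = 0.25     # hypothetical per-node price, NOT a real Azure rate
NODES = 4
TRAINING_HOURS = 40    # hypothetical active training time per month

always_on = monthly_compute_cost(HOURLY_RATE, NODES, HOURS_IN_MONTH)
scale_to_zero = monthly_compute_cost(HOURLY_RATE, NODES, TRAINING_HOURS)

print(f"Always-on cluster:     {always_on:8.2f}")
print(f"Scale-to-zero cluster: {scale_to_zero:8.2f}")
print(f"Difference:            {always_on - scale_to_zero:8.2f}")
```

With these assumed numbers the always-on cluster costs roughly eighteen times more, which is why `min_instances=0` is the usual default for training clusters.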

from azure.ai.ml.entities import AmlCompute
from azure.core.exceptions import ResourceNotFoundError

def create_compute_cluster(ml_client: MLClient, cluster_name: str):
    """Return the named cluster, creating it if it does not exist yet."""
    try:
        cluster = ml_client.compute.get(cluster_name)
        print(f"Cluster '{cluster_name}' already exists")
    except ResourceNotFoundError:
        # Only a missing cluster triggers creation; other errors propagate
        cluster = AmlCompute(
            name=cluster_name,
            type="amlcompute",
            size="Standard_DS3_v2",  # 4 vCPU, 14 GB RAM
            min_instances=0,  # Scale to zero when idle
            max_instances=10,
            idle_time_before_scale_down=120,  # seconds
            tier="Dedicated",  # Use "LowPriority" for spot
        )
        cluster = ml_client.compute.begin_create_or_update(cluster).result()
        print(f"Cluster '{cluster_name}' created")

    return cluster

# GPU cluster for training
def create_gpu_cluster(ml_client: MLClient, cluster_name: str):
    cluster = AmlCompute(
        name=cluster_name,
        type="amlcompute",
        size="Standard_NC6s_v3",  # 1x V100 GPU
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=300,
        tier="LowPriority",  # Spot - cheaper, can be preempted
    )
    # Return the created resource rather than the local definition
    return ml_client.compute.begin_create_or_update(cluster).result()

Compute Instances

Compute instances are dedicated VMs for development - essentially a managed notebook server. Unlike clusters, they do not scale to zero automatically (you must stop them manually or configure scheduled shutdown).

from azure.ai.ml.entities import ComputeInstance

instance = ComputeInstance(
    name="dev-instance-ds",
    size="Standard_DS3_v2",
    # Shut down after 60 idle minutes to avoid paying for overnight idle time
    idle_time_before_shutdown_minutes=60,
)
ml_client.compute.begin_create_or_update(instance).result()

Data Assets and Datastores

from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore, Data
from azure.ai.ml.constants import AssetTypes

# Register a datastore (done once)
def register_blob_datastore(ml_client: MLClient, account_key: str):
    # The caller fetches account_key from Key Vault at run time;
    # it is never hard-coded or committed to the repository
    datastore = AzureBlobDatastore(
        name="training_data_store",
        description="Azure Blob Storage for ML training data",
        account_name="mlplatformstorageacct",
        container_name="ml-data",
        credentials=AccountKeyConfiguration(account_key=account_key),
    )
    ml_client.datastores.create_or_update(datastore)
    print("Datastore registered")


# Create versioned data assets
def create_data_asset(ml_client: MLClient, name: str, path: str, version: str):
    data_asset = Data(
        name=name,
        version=version,
        description=f"Training data for {name}",
        path=path,  # e.g., "azureml://datastores/training_data_store/paths/churn/v3/"
        type=AssetTypes.URI_FOLDER,
    )
    ml_client.data.create_or_update(data_asset)
    print(f"Data asset '{name}' version '{version}' registered")

# List data asset versions
for asset in ml_client.data.list(name="churn_training_data"):
    print(f"Version: {asset.version}, Path: {asset.path}")

Environments

Environments pin the exact runtime dependencies for reproducibility.

from azure.ai.ml.entities import Environment

def create_training_environment(ml_client: MLClient):
    env = Environment(
        name="churn-training-env",
        description="Environment for churn model training",
        conda_file="environments/conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
        version="3",
    )
    ml_client.environments.create_or_update(env)
    return env

# environments/conda.yaml
name: churn-training
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pip:
      - scikit-learn==1.3.0
      - xgboost==1.7.0
      - mlflow==2.8.0
      - azureml-mlflow==1.53.0
      - pandas==2.0.3
      - pyarrow==13.0.0
      - azure-ai-ml==1.11.0

Component-Based Pipelines

Components are the fundamental building block of Azure ML v2 pipelines. A component has a YAML spec and a Python script. The YAML defines inputs, outputs, and the execution environment. The script does the work.
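
A component's `command` refers to its inputs and outputs through `${{inputs.name}}` and `${{outputs.name}}` placeholders, which are replaced with concrete values (for example, mounted folder paths) when the step runs. The sketch below imitates that substitution with a regular expression. It is an illustration of the idea only, not Azure ML's implementation: `render_command` and the `/mnt/...` paths are hypothetical.

```python
import re

# Matches ${{inputs.some_name}} or ${{outputs.some_name}}
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template: str, inputs: dict, outputs: dict) -> str:
    """Replace each placeholder with the value bound to that input or output."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        if name not in values[kind]:
            raise KeyError(f"No value bound for {kind}.{name}")
        return str(values[kind][name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "python preprocess.py "
    "--raw_data ${{inputs.raw_data}} "
    "--processed_data ${{outputs.processed_data}} "
    "--test_fraction ${{inputs.test_fraction}}"
)
command = render_command(
    template,
    inputs={"raw_data": "/mnt/in/raw", "test_fraction": 0.2},  # hypothetical paths
    outputs={"processed_data": "/mnt/out/processed"},
)
print(command)
```

Because the script only ever sees resolved paths on its command line, the same `preprocess.py` can be run locally with ordinary folder arguments.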

Component YAML Specification

# components/preprocess/component.yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: preprocess_churn_data
display_name: Preprocess Churn Training Data
description: Clean and prepare raw churn data for model training

inputs:
  raw_data:
    type: uri_folder
    description: Raw input data folder
  test_fraction:
    type: number
    default: 0.2
    description: Fraction of data for test set

outputs:
  processed_data:
    type: uri_folder
    description: Processed data ready for training

code: ./src

command: >-
  python preprocess.py
  --raw_data ${{inputs.raw_data}}
  --processed_data ${{outputs.processed_data}}
  --test_fraction ${{inputs.test_fraction}}

environment: azureml:churn-training-env:3

# components/train/component.yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: train_churn_model
display_name: Train Churn XGBoost Model

inputs:
  training_data:
    type: uri_folder
  max_depth:
    type: integer
    default: 6
  n_estimators:
    type: integer
    default: 200
  learning_rate:
    type: number
    default: 0.05
  min_auc_threshold:
    type: number
    default: 0.75

outputs:
  model_output:
    type: uri_folder
  mlflow_model:
    type: mlflow_model

code: ./src

command: >-
  python train.py
  --training_data ${{inputs.training_data}}
  --model_output ${{outputs.model_output}}
  --mlflow_model ${{outputs.mlflow_model}}
  --max_depth ${{inputs.max_depth}}
  --n_estimators ${{inputs.n_estimators}}
  --learning_rate ${{inputs.learning_rate}}
  --min_auc_threshold ${{inputs.min_auc_threshold}}

environment: azureml:churn-training-env:3

Component Training Script

# components/train/src/train.py
import argparse
import glob
import json
import os

import joblib
import mlflow
import mlflow.xgboost  # the xgboost flavor must be imported explicitly
import pandas as pd
import xgboost as xgb
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--training_data", type=str, required=True)
    parser.add_argument("--model_output", type=str, required=True)
    parser.add_argument("--mlflow_model", type=str, required=True)
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--n_estimators", type=int, default=200)
    parser.add_argument("--learning_rate", type=float, default=0.05)
    parser.add_argument("--min_auc_threshold", type=float, default=0.75)
    return parser.parse_args()


def load_data(data_path: str):
    """Load all parquet files from the data folder."""
    files = glob.glob(os.path.join(data_path, "*.parquet"))
    if not files:
        raise FileNotFoundError(f"No parquet files found in {data_path}")
    dfs = [pd.read_parquet(f) for f in files]
    df = pd.concat(dfs, ignore_index=True)
    print(f"Loaded {len(df):,} rows from {len(files)} files")
    return df


def train_model(args):
    # MLflow auto-logging for XGBoost
    mlflow.xgboost.autolog(log_models=False)  # We'll log the model manually

    with mlflow.start_run() as run:
        # Log hyperparameters explicitly
        mlflow.log_params({
            "max_depth": args.max_depth,
            "n_estimators": args.n_estimators,
            "learning_rate": args.learning_rate,
        })

        df = load_data(args.training_data)

        feature_cols = [c for c in df.columns
                        if c not in ["customer_id", "churned", "split"]]
        X = df[feature_cols]
        y = df["churned"]

        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )

        # early_stopping_rounds is set on the estimator (supported since
        # XGBoost 1.6); passing it to fit() is deprecated, as is
        # use_label_encoder, so neither is used here
        model = xgb.XGBClassifier(
            max_depth=args.max_depth,
            n_estimators=args.n_estimators,
            learning_rate=args.learning_rate,
            eval_metric="logloss",
            early_stopping_rounds=20,
            random_state=42,
        )
        model.fit(
            X_train, y_train,
            eval_set=[(X_val, y_val)],
            verbose=100,
        )

        # Compute all metrics
        val_probs = model.predict_proba(X_val)[:, 1]
        val_preds = (val_probs >= 0.5).astype(int)

        val_auc = roc_auc_score(y_val, val_probs)
        val_ap = average_precision_score(y_val, val_probs)
        val_f1 = f1_score(y_val, val_preds)

        mlflow.log_metrics({
            "val_auc": val_auc,
            "val_average_precision": val_ap,
            "val_f1": val_f1,
            "best_iteration": model.best_iteration,
            "n_features": len(feature_cols),
        })

        print(f"Validation AUC: {val_auc:.4f}")
        print(f"Validation AP: {val_ap:.4f}")
        print(f"Validation F1: {val_f1:.4f}")

        # Quality gate: a failed step stops the downstream register step
        if val_auc < args.min_auc_threshold:
            raise ValueError(
                f"Model AUC {val_auc:.4f} < threshold {args.min_auc_threshold}. "
                f"Aborting pipeline."
            )

        # Save model in MLflow format (enables Azure ML managed endpoints)
        os.makedirs(args.mlflow_model, exist_ok=True)
        mlflow.xgboost.save_model(model, args.mlflow_model)

        # Also save in joblib format for custom serving
        os.makedirs(args.model_output, exist_ok=True)
        joblib.dump(model, os.path.join(args.model_output, "model.joblib"))

        # Save feature names for the serving container
        with open(os.path.join(args.model_output, "feature_names.json"), "w") as f:
            json.dump(feature_cols, f)

        print(f"MLflow run ID: {run.info.run_id}")
        return val_auc


if __name__ == "__main__":
    args = parse_args()
    train_model(args)

Pipeline Definition

# pipeline.py
from azure.ai.ml import MLClient, Input, Output, load_component
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import JobResourceConfiguration
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="12345678-1234-1234-1234-123456789012",
    resource_group_name="ml-platform-rg",
    workspace_name="ml-platform-workspace",
)

# Load component specs
preprocess = load_component(source="components/preprocess/component.yaml")
train = load_component(source="components/train/component.yaml")
register = load_component(source="components/register/component.yaml")

@pipeline(
    name="churn_training_pipeline",
    description="End-to-end churn model training with quality gate",
    experiment_name="churn-xgboost-experiments",
    tags={"team": "ml-platform", "project": "churn"},
)
def churn_pipeline(
    raw_data: Input(type=AssetTypes.URI_FOLDER),
    max_depth: int = 6,
    n_estimators: int = 200,
    learning_rate: float = 0.05,
    min_auc_threshold: float = 0.75,
):
    # Step 1: Preprocess
    preprocess_step = preprocess(
        raw_data=raw_data,
        test_fraction=0.2,
    )
    preprocess_step.compute = "cpu-cluster-ds3"

    # Step 2: Train
    train_step = train(
        training_data=preprocess_step.outputs.processed_data,
        max_depth=max_depth,
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        min_auc_threshold=min_auc_threshold,
    )
    train_step.compute = "cpu-cluster-ds3"
    # Assign a resource configuration explicitly; the component spec
    # does not define one, so there may be nothing to mutate in place
    train_step.resources = JobResourceConfiguration(instance_count=1)

    # Step 3: Register model
    register_step = register(
        mlflow_model=train_step.outputs.mlflow_model,
        model_name="churn-xgboost",
        model_version="latest",
    )
    register_step.compute = "cpu-cluster-ds3"

    return {"mlflow_model": train_step.outputs.mlflow_model}


def submit_pipeline(raw_data_version: str = "latest"):
    # Get the versioned data asset; "latest" is a label, not a version number
    if raw_data_version == "latest":
        data_asset = ml_client.data.get("churn_training_data", label="latest")
    else:
        data_asset = ml_client.data.get("churn_training_data", version=raw_data_version)

    pipeline_job = churn_pipeline(
        raw_data=Input(path=data_asset.id, type=AssetTypes.URI_FOLDER),
        max_depth=6,
        n_estimators=300,
        learning_rate=0.04,
        min_auc_threshold=0.78,
    )

    pipeline_job.settings.default_compute = "cpu-cluster-ds3"
    pipeline_job.settings.default_datastore = "training_data_store"

    submitted_job = ml_client.jobs.create_or_update(pipeline_job)
    print(f"Pipeline submitted: {submitted_job.name}")
    print(f"Studio URL: {submitted_job.studio_url}")

    # Wait for completion
    ml_client.jobs.stream(submitted_job.name)
    return submitted_job

MLflow Integration

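Azure ML's MLflow integration is first-class. Every Azure ML workspace exposes an MLflow tracking server URI. You can use the standard mlflow API directly in your scripts; the only Azure-specific requirement is that the azureml-mlflow plugin package is installed so MLflow can resolve the workspace URI.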

import mlflow
import mlflow.xgboost
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

def configure_mlflow_for_azure(
    subscription_id: str,
    resource_group: str,
    workspace_name: str,
):
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )

    # Get the MLflow tracking URI for this workspace
    tracking_uri = ml_client.workspaces.get(workspace_name).mlflow_tracking_uri
    mlflow.set_tracking_uri(tracking_uri)
    print(f"MLflow tracking URI: {tracking_uri}")


def run_mlflow_experiment(experiment_name: str):
    """Run a standard MLflow experiment against Azure ML backend."""
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run(run_name="xgboost-baseline"):
        # Log parameters
        mlflow.log_param("model_type", "xgboost")
        mlflow.log_param("max_depth", 6)

        # Simulate training...
        val_auc = 0.83

        # Log metrics
        mlflow.log_metric("val_auc", val_auc)

        # Log model with MLflow schema (enables Azure ML model registration)
        import xgboost as xgb
        import numpy as np
        model = xgb.XGBClassifier()
        model.fit(np.random.rand(100, 5), np.random.randint(0, 2, 100))

        mlflow.xgboost.log_model(
            model,
            artifact_path="model",
            registered_model_name="churn-xgboost",  # Auto-register on log
        )

    print("Run logged to Azure ML MLflow server")


def query_experiment_results(experiment_name: str):
    """Query past runs using the MLflow client."""
    experiment = mlflow.get_experiment_by_name(experiment_name)

    runs = mlflow.search_runs(
        experiment_ids=[experiment.experiment_id],
        filter_string="metrics.val_auc > 0.75",
        order_by=["metrics.val_auc DESC"],
        max_results=10,
    )

    print(f"Top 10 runs by val_auc:\n{runs[['run_id', 'metrics.val_auc', 'params.max_depth']].to_string()}")
    return runs

Model Registry and Approval Workflows

The Azure ML Model Registry stores model versions. For enterprise workflows, you can add tags that represent approval state and enforce promotion gates.
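
Because approval state lives in free-form tags, the promotion rules have to be enforced by your own code. One way to keep those rules in a single place is a small transition table, sketched below in plain Python with no SDK calls. The stage names follow the tags used in this lesson; `ALLOWED_TRANSITIONS` and `promote` are hypothetical helpers, and a real implementation would write the returned tags back to the registry.

```python
# Which stage a model may move to from its current stage
ALLOWED_TRANSITIONS = {
    "dev": {"staging"},
    "staging": {"production"},
    "production": set(),
}


def promote(tags: dict, target_stage: str, approver: str) -> dict:
    """Return updated tags, or raise if the promotion skips a gate."""
    current = tags.get("stage", "dev")
    if target_stage not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Cannot move model from '{current}' to '{target_stage}'")
    if not approver:
        raise ValueError("An approver is required for every promotion")
    # Build a new dict so the caller's tags are not mutated on failure
    return {**tags, "stage": target_stage, "approved_by": approver}


tags = {"stage": "dev", "val_auc": "0.83"}
tags = promote(tags, "staging", approver="ml-engineer@example.com")
print(tags["stage"])

try:
    promote({"stage": "dev"}, "production", approver="someone@example.com")
except ValueError as err:
    print(f"Rejected: {err}")
```

Keeping the table separate from the registry calls also makes the gate logic easy to unit test without a workspace.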

import pandas as pd  # used for approval timestamps
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

def register_model_from_job(ml_client: MLClient, job_name: str, model_name: str):
    """Register a model from a completed pipeline job."""
    # Fails fast if the job does not exist
    job = ml_client.jobs.get(job_name)

    # Get the MLflow model output URI from the job
    model_path = f"azureml://jobs/{job.name}/outputs/mlflow_model"

    model = Model(
        path=model_path,
        type=AssetTypes.MLFLOW_MODEL,
        name=model_name,
        description="Churn prediction XGBoost model",
        tags={
            "val_auc": "0.83",
            "stage": "dev",
            "approved_by": "",
            "trained_on_data_version": "v3",
        },
    )

    registered = ml_client.models.create_or_update(model)
    print(f"Model registered: {registered.name} v{registered.version}")
    return registered


def promote_model_to_staging(
    ml_client: MLClient,
    model_name: str,
    model_version: str,
    approver: str,
):
    """Tag model as staging-approved after human review."""
    model = ml_client.models.get(model_name, version=model_version)

    # Update tags to reflect approval
    model.tags.update({
        "stage": "staging",
        "approved_by": approver,
        "approved_at": pd.Timestamp.now().isoformat(),
    })

    ml_client.models.create_or_update(model)
    print(f"Model {model_name}:{model_version} promoted to staging by {approver}")


def promote_model_to_production(
    ml_client: MLClient,
    model_name: str,
    model_version: str,
    approver: str,
    risk_sign_off: str,
):
    """Final promotion gate - requires both engineering and risk approval."""
    model = ml_client.models.get(model_name, version=model_version)

    if model.tags.get("stage") != "staging":
        raise ValueError("Model must be in staging before promoting to production")

    model.tags.update({
        "stage": "production",
        "prod_approved_by": approver,
        "risk_approved_by": risk_sign_off,
        "prod_approved_at": pd.Timestamp.now().isoformat(),
    })

    ml_client.models.create_or_update(model)
    print(f"Model {model_name}:{model_version} promoted to production")

Managed Online Endpoints

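Managed Online Endpoints are Azure ML's managed real-time inference layer: you choose the instance type and count, and Azure ML provisions and operates the underlying infrastructure. You define an endpoint and deploy model versions to it with traffic splits.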
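
A traffic split is just a mapping from deployment name to percentage, so a rollout plan can be computed and checked before any endpoint is touched. The sketch below is plain Python with hypothetical helper names (`validate_traffic`, `canary_ramp`) and an assumed ramp of 10, 25, 50, then 100 percent. It treats a split as valid when the percentages are non-negative and sum to 100.

```python
def validate_traffic(traffic: dict) -> None:
    """Raise if the split has negative shares or does not sum to 100."""
    if any(pct < 0 for pct in traffic.values()):
        raise ValueError(f"Negative traffic share in {traffic}")
    if sum(traffic.values()) != 100:
        raise ValueError(f"Traffic must sum to 100, got {sum(traffic.values())}")


def canary_ramp(old_deployment: str, new_deployment: str,
                steps=(10, 25, 50, 100)) -> list:
    """Return one traffic mapping per rollout step, shifting load to the new deployment."""
    plan = []
    for pct in steps:
        split = {old_deployment: 100 - pct, new_deployment: pct}
        validate_traffic(split)
        plan.append(split)
    return plan


for split in canary_ramp("blue", "green"):
    print(split)
```

Each mapping in the plan has the same shape as the `traffic` attribute on an endpoint, so a rollout script can apply the steps one at a time with a health check in between.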

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    OnlineRequestSettings,
    ProbeSettings,
)

def create_online_endpoint(ml_client: MLClient, endpoint_name: str):
    endpoint = ManagedOnlineEndpoint(
        name=endpoint_name,
        description="Churn prediction real-time endpoint",
        auth_mode="key",  # or "aml_token" for Azure ML token-based auth
        tags={"team": "ml-platform"},
    )
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    print(f"Endpoint created: {endpoint_name}")
    return endpoint


def deploy_model_to_endpoint(
    ml_client: MLClient,
    endpoint_name: str,
    deployment_name: str,
    model_name: str,
    model_version: str,
    instance_type: str = "Standard_DS3_v2",
    instance_count: int = 1,
):
    model = ml_client.models.get(model_name, version=model_version)

    deployment = ManagedOnlineDeployment(
        name=deployment_name,
        endpoint_name=endpoint_name,
        model=model,
        instance_type=instance_type,
        instance_count=instance_count,
        request_settings=OnlineRequestSettings(
            request_timeout_ms=60000,
            max_concurrent_requests_per_instance=10,
            max_queue_wait_ms=2000,
        ),
        liveness_probe=ProbeSettings(
            failure_threshold=3,
            success_threshold=1,
            period=30,
            initial_delay=10,
        ),
        readiness_probe=ProbeSettings(
            failure_threshold=3,
            success_threshold=1,
            period=10,
            initial_delay=10,
        ),
    )

    ml_client.online_deployments.begin_create_or_update(deployment).result()
    print(f"Deployment '{deployment_name}' created on endpoint '{endpoint_name}'")


def canary_traffic_split(
    ml_client: MLClient,
    endpoint_name: str,
    new_deployment: str,
    old_deployment: str,
    canary_pct: int = 10,
):
    """Route canary_pct percent of traffic (default 10) to the new deployment."""
    endpoint = ml_client.online_endpoints.get(endpoint_name)
    endpoint.traffic = {
        old_deployment: 100 - canary_pct,
        new_deployment: canary_pct,
    }
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    print(f"Traffic split: {old_deployment}={100-canary_pct}%, {new_deployment}={canary_pct}%")


def invoke_endpoint(ml_client: MLClient, endpoint_name: str, request_data: dict):
    import json
    import os
    import tempfile

    # invoke() reads the payload from a file, so write the request to a temp file
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(request_data, f)

    try:
        response = ml_client.online_endpoints.invoke(
            endpoint_name=endpoint_name,
            request_file=f.name,
        )
    finally:
        os.remove(f.name)
    return json.loads(response)

Batch Endpoints

Batch endpoints handle large-scale offline scoring. They run on compute clusters and are optimized for throughput over latency.
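
Throughput for a batch deployment comes down to how many mini-batches exist and how many can run at once. The sketch below works through that arithmetic for a hypothetical input of 10,000 files, using the same parameter names that a batch deployment exposes. It assumes every worker processes one mini-batch at a time and that mini-batches take similar time, which is a simplification.

```python
import math


def batch_plan(total_files: int, mini_batch_size: int,
               instance_count: int, max_concurrency_per_instance: int) -> dict:
    """Estimate mini-batch count, parallel workers, and processing waves."""
    mini_batches = math.ceil(total_files / mini_batch_size)
    workers = instance_count * max_concurrency_per_instance
    waves = math.ceil(mini_batches / workers)
    return {"mini_batches": mini_batches, "workers": workers, "waves": waves}


plan = batch_plan(
    total_files=10_000,  # hypothetical input size
    mini_batch_size=100,
    instance_count=4,
    max_concurrency_per_instance=2,
)
print(plan)
```

Doubling `instance_count` here would roughly halve the number of waves, which is the main lever when a scoring job must finish inside a fixed window.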

from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    BatchRetrySettings,
)
from azure.ai.ml.constants import BatchDeploymentOutputAction

def create_batch_endpoint(ml_client: MLClient, endpoint_name: str):
    endpoint = BatchEndpoint(
        name=endpoint_name,
        description="Churn prediction batch scoring endpoint",
    )
    ml_client.batch_endpoints.begin_create_or_update(endpoint).result()


def deploy_batch_model(
    ml_client: MLClient,
    endpoint_name: str,
    model_name: str,
    model_version: str,
    compute_name: str,
):
    model = ml_client.models.get(model_name, version=model_version)

    deployment = BatchDeployment(
        name="churn-batch-default",
        description="Batch scoring for monthly churn reports",
        endpoint_name=endpoint_name,
        model=model,
        compute=compute_name,
        instance_count=4,  # Parallelism across nodes
        max_concurrency_per_instance=2,
        mini_batch_size=100,  # Files per mini-batch for file-based inputs
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(
            max_retries=3,
            timeout=300,  # seconds per mini-batch
        ),
        logging_level="info",
    )

    ml_client.batch_deployments.begin_create_or_update(deployment).result()

Responsible AI Dashboard

Azure ML includes a Responsible AI (RAI) dashboard that generates fairness analysis, error analysis, counterfactual analysis, and causal analysis. For regulated industries, this is a compliance artifact.

from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

def build_rai_pipeline(ml_client: MLClient, model_id: str, test_data_id: str):
    """Build an Azure ML pipeline that generates the RAI dashboard."""

    # Built-in RAI components live in the shared "azureml" registry, so they
    # are fetched through a registry-scoped client (load_component only
    # reads local YAML files)
    registry_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=ml_client.subscription_id,
        resource_group_name=ml_client.resource_group_name,
        registry_name="azureml",
    )
    rai_version = "0.0.16"
    rai_constructor = registry_client.components.get(
        name="microsoft_azureml_rai_tabular_insight_constructor",
        version=rai_version,
    )
    rai_error_analysis = registry_client.components.get(
        name="microsoft_azureml_rai_tabular_erroranalysis",
        version=rai_version,
    )
    rai_explanations = registry_client.components.get(
        name="microsoft_azureml_rai_tabular_explanation",
        version=rai_version,
    )
    rai_gather = registry_client.components.get(
        name="microsoft_azureml_rai_tabular_insight_gather",
        version=rai_version,
    )

    @pipeline(name="rai_dashboard_pipeline")
    def rai_pipeline(
        target_column_name: str,
        train_data: Input(type=AssetTypes.URI_FOLDER),
        test_data: Input(type=AssetTypes.URI_FOLDER),
    ):
        # Step 1: RAI constructor - connects model and data
        rai_constructor_step = rai_constructor(
            title="Churn Model RAI Dashboard",
            task_type="classification",
            model_info=model_id,
            model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=model_id),
            train_dataset=train_data,
            test_dataset=test_data,
            target_column_name=target_column_name,
        )

        # Step 2: Error analysis - find where the model fails
        error_step = rai_error_analysis(
            rai_insights_dashboard=rai_constructor_step.outputs.rai_insights_dashboard,
        )

        # Step 3: Explanations - feature importance
        explanation_step = rai_explanations(
            rai_insights_dashboard=rai_constructor_step.outputs.rai_insights_dashboard,
        )

        # Step 4: Gather into dashboard
        rai_gather_step = rai_gather(
            constructor=rai_constructor_step.outputs.rai_insights_dashboard,
            insight_1=error_step.outputs.error_analysis,
            insight_2=explanation_step.outputs.explanation,
        )

        return {"rai_dashboard": rai_gather_step.outputs.dashboard}

    return rai_pipeline

CLI v2 Workflow

The Azure ML CLI v2 is the preferred interface for CI/CD pipelines. Jobs are defined in YAML and submitted with a single command.

# jobs/train-job.yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

display_name: churn-training-pipeline
experiment_name: churn-xgboost-experiments

settings:
  default_compute: cpu-cluster-ds3
  default_datastore: training_data_store

inputs:
  raw_data:
    type: uri_folder
    path: azureml:churn_training_data:3
  max_depth: 6
  n_estimators: 300
  learning_rate: 0.04
  min_auc_threshold: 0.78

jobs:
  preprocess_step:
    type: command
    component: file:./components/preprocess/component.yaml
    inputs:
      raw_data: ${{parent.inputs.raw_data}}
      test_fraction: 0.2

  train_step:
    type: command
    component: file:./components/train/component.yaml
    inputs:
      training_data: ${{parent.jobs.preprocess_step.outputs.processed_data}}
      max_depth: ${{parent.inputs.max_depth}}
      n_estimators: ${{parent.inputs.n_estimators}}
      learning_rate: ${{parent.inputs.learning_rate}}
      min_auc_threshold: ${{parent.inputs.min_auc_threshold}}

# Submit from CI/CD pipeline
az ml job create --file jobs/train-job.yaml \
  --subscription 12345678-1234-1234-1234-123456789012 \
  --resource-group ml-platform-rg \
  --workspace-name ml-platform-workspace

# Monitor job
az ml job show --name <job-name> --query status

# Stream logs
az ml job stream --name <job-name>

Production Engineering Notes

Workspace Isolation Strategy

In enterprise environments, separate workspaces by environment:

ml-platform-dev-workspace # Data scientists experiment here
ml-platform-staging-workspace # Pre-production validation
ml-platform-prod-workspace # Production models only

Use Azure DevOps or GitHub Actions to promote models between workspaces via a CI/CD pipeline with manual approval gates.
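
A promotion script needs to know which workspace comes next for a given environment. The sketch below encodes the three-workspace layout above as data and derives the promotion target from it. `WORKSPACES`, `PROMOTION_ORDER`, and `next_workspace` are hypothetical names, and the approval gate itself would still live in the CI/CD system.

```python
# Environment -> workspace name, following the naming convention above
WORKSPACES = {
    "dev": "ml-platform-dev-workspace",
    "staging": "ml-platform-staging-workspace",
    "prod": "ml-platform-prod-workspace",
}
PROMOTION_ORDER = ["dev", "staging", "prod"]


def next_workspace(current_env: str):
    """Return the workspace a model is promoted into, or None from prod."""
    if current_env not in PROMOTION_ORDER:
        raise ValueError(f"Unknown environment: {current_env}")
    position = PROMOTION_ORDER.index(current_env)
    if position == len(PROMOTION_ORDER) - 1:
        return None  # prod is the final stage
    return WORKSPACES[PROMOTION_ORDER[position + 1]]


print(next_workspace("dev"))
print(next_workspace("staging"))
print(next_workspace("prod"))
```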

Managed Identity for Secure Access

Never store credentials in code or environment variables. Configure a Managed Identity on the workspace and grant it access to Key Vault, Storage, and other resources through RBAC.

# In training scripts, use DefaultAzureCredential
# On Azure ML compute, this resolves to the managed identity assigned to that
# compute, which must be granted access to the Key Vault through RBAC
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

def get_database_password(keyvault_url: str) -> str:
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=keyvault_url, credential=credential)
    return client.get_secret("database-password").value

Monitoring Endpoints

def check_endpoint_health(ml_client: MLClient, endpoint_name: str):
    endpoint = ml_client.online_endpoints.get(endpoint_name)
    deployments = ml_client.online_deployments.list(endpoint_name=endpoint_name)

    for deployment in deployments:
        print(f"Deployment: {deployment.name}")
        print(f"  Instance type: {deployment.instance_type}")
        print(f"  Instance count: {deployment.instance_count}")
        print(f"  Traffic: {endpoint.traffic.get(deployment.name, 0)}%")
        print(f"  Provisioning state: {deployment.provisioning_state}")

    # Get endpoint keys for client authentication
    keys = ml_client.online_endpoints.get_keys(endpoint_name)
    print(f"Primary key (first 8 chars): {keys.primary_key[:8]}...")

Common Mistakes

:::danger Using v1 SDK Examples with v2 SDK Azure ML has two major SDK versions (v1 and v2) with very different APIs. Most StackOverflow answers and older tutorials use v1 (azureml-sdk, azureml.core.*). The v2 SDK (azure-ai-ml) has a completely different import structure and API design. Mixing them in the same project causes confusing import errors. Commit to v2 exclusively for new projects.

# v1 - OLD, do not use for new projects
from azureml.core import Workspace, Experiment, Run

# v2 - current
from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.entities import Model, Environment

:::

:::danger Storing Training Data in Git A common mistake with Azure ML component pipelines is committing training data into the components/src/ directory alongside the training script. Data ends up in Git history, bloating the repository and creating compliance issues. Always reference data via Data Assets registered in the Azure ML workspace. Components receive data paths as inputs, never as embedded files. :::

:::warning Compute Cluster Not Scaling Down Compute clusters have an idle_time_before_scale_down setting. If this is set too high (or not set), clusters idle with instances running and billing continues. Set idle_time_before_scale_down=120 for training clusters. For GPU clusters that take a long time to start, balance scale-down time against cold-start latency - 300 seconds is a reasonable compromise. :::

:::warning Environment Build Time Azure ML builds Docker images from your Environment spec the first time they are used. A fresh build with pip dependencies takes 5-15 minutes. If every pipeline run triggers a rebuild (because you changed conda.yaml), you add significant latency. Version your environments explicitly and only change the version number when dependencies actually change. Increment version for dependency changes; do not use :latest. :::
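
One way to apply the "only bump the version when dependencies change" rule in CI is to fingerprint the conda file and compare it with the fingerprint of the last registered version. The sketch below uses only the standard library. `dependency_fingerprint` and `needs_new_version` are hypothetical helpers, and the normalization (ignoring comments, blank lines, and trailing whitespace) is an assumption about which edits should not trigger a rebuild.

```python
import hashlib


def dependency_fingerprint(conda_yaml_text: str) -> str:
    """Hash the conda file, ignoring comments, blank lines, and trailing spaces."""
    kept = []
    for line in conda_yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())  # keep indentation, it is meaningful in YAML
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()[:12]


def needs_new_version(registered_text: str, current_text: str) -> bool:
    """True when the dependency content differs from the registered version."""
    return dependency_fingerprint(registered_text) != dependency_fingerprint(current_text)


registered = "dependencies:\n  - python=3.10\n  - pip:\n      - xgboost==1.7.0\n"
comment_only = "# reviewed\n" + registered
bumped = registered.replace("xgboost==1.7.0", "xgboost==1.7.6")

print(needs_new_version(registered, comment_only))
print(needs_new_version(registered, bumped))
```

Storing the fingerprint as a tag on the registered environment would give the CI job something to compare against on the next run.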


Interview Q&A

Q1: What is the difference between Azure ML Components and traditional Python functions in a pipeline?

Answer: A Component is a self-contained, reusable pipeline step with a defined interface (typed inputs and outputs), its own execution environment (Docker image), and its own resource requirements (compute type, instance count). A Python function is just code. The critical difference is isolation: components run in separate containers with independent environments, which means a preprocessing step using pandas 2.0 and a training step using TensorFlow 2.12 can coexist in the same pipeline without dependency conflicts. Components are also versioned and shareable across pipelines and teams - you register them once in the workspace and reference them by name and version. This is the Azure ML answer to the "reproducibility across runs" problem.

Q2: How does Azure ML's MLflow integration work, and what can you do with MLflow that you cannot do natively?

Answer: Azure ML exposes an MLflow-compatible tracking server. When you call mlflow.set_tracking_uri(workspace_mlflow_uri), all standard MLflow calls (log_param, log_metric, log_model) write to Azure ML Experiments. This means you can use standard MLflow code without Azure-specific imports, then view results in Azure ML Studio. The key advantage over native Azure ML tracking is portability: the same MLflow code works on Databricks, AWS SageMaker, a local MLflow server, or Azure ML - just change the tracking URI. MLflow also provides richer model flavors (scikit-learn, PyTorch, TensorFlow, XGBoost, etc.) with built-in serialization and input/output schemas via mlflow.models.signature. Azure ML can deploy MLflow models directly to Managed Online Endpoints without writing custom scoring scripts.

Q3: Explain the Azure ML approval workflow for moving models to production in a regulated environment.

Answer: Azure ML does not have a built-in human-approval workflow (unlike SageMaker's Model Registry approval states). The enterprise pattern is to implement approval gates using model tags and CI/CD pipeline controls. The workflow: (1) An automated pipeline trains and registers the model with a stage=dev tag. (2) A CI/CD pipeline (Azure DevOps or GitHub Actions) runs automated validation tests. (3) The pipeline creates an Azure DevOps approval gate - a human (ML engineer or risk officer) reviews the RAI dashboard, model card, and metrics in Azure ML Studio and approves or rejects. (4) On approval, a script updates the model's stage tag to staging and deploys to the staging endpoint. (5) A second approval gate (typically owned by risk/compliance) controls the staging-to-production promotion. This pattern requires no third-party tooling and integrates naturally with Azure DevOps pipelines.
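
The tag-based promotion logic in steps (4) and (5) can be modelled as a small state machine. This is a sketch under assumptions, not an Azure ML API: `PROMOTIONS`, `promote`, the role names, and the `approved_for_..._by` tag are hypothetical, and in a real setup the approval would come from the CI/CD gate while the resulting tags would be written back to the registered model.

```python
# Allowed promotions and the (hypothetical) role that must approve each one.
PROMOTIONS = {
    ("dev", "staging"): "ml_engineer",
    ("staging", "production"): "risk_officer",
}


def promote(tags, target_stage, approver_role, approver_name):
    """Return a new tag dict with the stage updated, or raise if not allowed."""
    current = tags.get("stage")
    required_role = PROMOTIONS.get((current, target_stage))
    if required_role is None:
        raise ValueError(f"promotion {current!r} -> {target_stage!r} is not allowed")
    if approver_role != required_role:
        raise PermissionError(f"{target_stage!r} requires approval by {required_role}")
    updated = dict(tags)  # copy so the caller's tags are never mutated
    updated["stage"] = target_stage
    updated[f"approved_for_{target_stage}_by"] = approver_name
    return updated
```

Recording the approver in a tag keeps an audit trail next to the model itself, and rejecting any transition not listed in `PROMOTIONS` prevents a model from skipping staging.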

Q4: What is the difference between Managed Online Endpoints and a custom AKS deployment for model serving?

Answer: Managed Online Endpoints abstract away infrastructure management - you specify the instance type and instance count, and Azure handles scaling, health checking, load balancing, and TLS termination. Custom AKS deployments give you full control: you can configure the ingress, run sidecar containers, use custom autoscaling policies, co-locate the model server with other microservices, and choose exactly which Kubernetes version to run. Managed endpoints are better for most ML teams because the operational overhead is minimal and they integrate natively with Azure ML model management. AKS is better when you need to integrate the serving layer with existing Kubernetes infrastructure, need advanced networking (VNet injection, custom ingress controllers), or are serving models as part of a multi-container application.

Q5: How do you handle the "works on my compute instance, fails on compute cluster" problem in Azure ML?

Answer: This problem almost always has one of three causes: (1) dependency version mismatch - the compute instance has packages installed globally that are not in the Environment spec; (2) data path differences - the compute instance reads local files while the cluster reads from mounted datastores; (3) environment variable differences - code relies on environment variables set in the notebook that are not passed to the cluster job. The fix: always develop with the Environment pinned and never install packages globally on the compute instance. Test your training script locally against the registered datastore path before submitting to the cluster. Validate the job spec before submitting; recent versions of the ml CLI extension provide az ml job validate for pipeline jobs. Add explicit logging of the Python version, key package versions, and environment variables at the start of every training script - this makes debugging much faster when things differ between environments.
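
The logging advice at the end of this answer can be implemented with the standard library alone. The following is a minimal sketch: the package and variable names in `KEY_PACKAGES` and `KEY_ENV_VARS` are placeholders, and `collect_runtime_info` and `log_runtime_info` are hypothetical helper names, not part of any Azure ML SDK.

```python
import os
import platform
import sys
from importlib import metadata

# Placeholder allow-lists; name the packages and variables your script relies on.
KEY_PACKAGES = ("scikit-learn", "pandas", "numpy", "mlflow")
KEY_ENV_VARS = ("DATA_PATH", "CUDA_VISIBLE_DEVICES")


def collect_runtime_info(packages=KEY_PACKAGES, env_vars=KEY_ENV_VARS):
    """Gather interpreter, package, and environment-variable details."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "env": {name: os.environ.get(name, "<unset>") for name in env_vars},
    }


def log_runtime_info(info=None):
    """Print one line per fact so job logs are easy to diff across runs."""
    info = info or collect_runtime_info()
    print(f"python={info['python']} platform={info['platform']}")
    for name, version in info["packages"].items():
        print(f"package {name}=={version}")
    for name, value in info["env"].items():
        print(f"env {name}={value}")
```

Calling `log_runtime_info()` as the first statement of the training script puts these lines at the top of both the compute instance output and the cluster job log. The sketch logs only an explicit allow-list of variables rather than all of `os.environ`, so secrets are not written to logs by accident.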

Q6: When would you use Batch Endpoints instead of Online Endpoints?

Answer: Use Batch Endpoints when (1) the use case is scheduled offline scoring - e.g., score all customers at midnight for tomorrow's marketing campaign; (2) latency requirements are loose - minutes to hours rather than milliseconds; (3) the input data is large - thousands to millions of records; (4) you want to minimize cost - batch endpoints run on compute clusters that scale to zero, whereas online endpoints require always-on instances. Online Endpoints are for real-time use cases where a user or system is waiting for a prediction synchronously. The cost difference is substantial: an online endpoint running 24/7 on Standard_DS3_v2 costs on the order of $150/month per instance (exact pricing varies by region); a batch endpoint on the same instance type costs nothing when idle and a few cents per batch run.
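
The cost comparison comes down to billed hours. The sketch below works through the arithmetic under loud assumptions: the hourly rate used in the usage note is an illustrative placeholder rather than a quoted Azure price, the function names are hypothetical, and node start-up and idle time before scale-down are ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def online_monthly_cost(hourly_rate, instances=1):
    """Always-on endpoint: every instance is billed for every hour."""
    return hourly_rate * HOURS_PER_MONTH * instances


def batch_monthly_cost(hourly_rate, runs_per_month, minutes_per_run, nodes=1):
    """Scale-to-zero cluster: billed only while nodes are up for a run."""
    billed_hours = runs_per_month * minutes_per_run / 60
    return hourly_rate * billed_hours * nodes
```

With an assumed rate of $0.20 per hour, `online_monthly_cost(0.20)` comes to about $146 for one instance, while `batch_monthly_cost(0.20, runs_per_month=30, minutes_per_run=10)` comes to about $1 for the month (5 billed hours), or a few cents per nightly run.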

© 2026 EngineersOfAI. All rights reserved.