AWS SageMaker for MLOps

Master the complete AWS SageMaker ecosystem for end-to-end ML workflows: training jobs, pipelines, model registry, feature store, and production inference at scale.

The Eight-Week Platform Migration

It is week three of eight. Your team has been handed a mandate from the VP of Engineering: migrate the company's entire ML infrastructure from on-premises GPU servers to AWS by end of quarter. The on-prem setup is a collection of bare-metal servers, a shared NFS mount for data, and a GitLab runner that SSH-es into whichever GPU is free. It works, in the way that a hand-cranked car works - eventually, if nothing breaks.

You have trained models that power the product's recommendation engine, a fraud detection classifier, and an internal document classifier used by operations. Each one was built by a different engineer, trained in a slightly different way, deployed in a slightly different way, and monitored not at all. Three ML engineers, three slightly different workflows, zero reproducibility. The VP is right that this needs to change. The question is what to change it to.

Your AWS account has been provisioned. The credits are real. SageMaker is open in a browser tab. And you are staring at the service console wondering whether to start with SageMaker Studio, SageMaker Training, or SageMaker Pipelines. The documentation is vast. The tutorials use toy datasets. None of them answer the question you actually have: how do I take the three models we have in production today and rebuild the entire workflow so it is repeatable, auditable, and doesn't require SSH access to a server named ml-gpu-04?

This lesson answers that question systematically. We cover the SageMaker ecosystem from the perspective of an MLOps engineer, not a data scientist - what each service does, how the pieces connect, and how to build a production ML platform in eight weeks.

Why This Exists - The Problem with DIY ML Infrastructure

Before SageMaker, running ML at scale on AWS required assembling a large number of moving parts yourself. You needed EC2 instances for training, with instance-specific AMIs and CUDA drivers. You needed S3 buckets organized in a coherent way. You needed a job scheduler (maybe ECS, maybe Kubernetes, maybe cron). You needed a way to pass hyperparameters, capture logs, and save model artifacts. You needed a serving layer - maybe a Flask app in a container on ECS. You needed a way to version models and know which artifact corresponds to which code.

None of these are impossible to build. All of them have been built, by every ML team, approximately the same way, with approximately the same mistakes. SageMaker exists because AWS recognized that every ML team was solving the same infrastructure problems, and decided to productize the solutions.

The value proposition is not that SageMaker's solutions are always better than what you'd build - in some cases they are not. The value proposition is that they are already built, already integrated, already documented, and already maintained by a team at AWS whose full-time job is that infrastructure.

:::note Historical Context SageMaker launched in November 2017 at AWS re:Invent, and the original launch focused on training and inference. SageMaker Pipelines and the Feature Store both launched at re:Invent 2020. The platform has expanded considerably since then, and that history explains why some components feel more mature than others. :::

The SageMaker Ecosystem

SageMaker is not a single product. It is a platform of about 30 interconnected services. For MLOps purposes, the ones that matter are:

SageMaker Training Jobs

A SageMaker Training Job is a managed compute job. You specify:

A Docker container with your training code
An instance type (e.g., ml.p3.2xlarge for single GPU)
An S3 path for input data
An S3 path for output artifacts
Hyperparameters as a dict

SageMaker provisions the instance, copies the data from S3, runs your container, copies the output artifact back to S3, and terminates the instance. You pay only for the time the instance runs.
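The five inputs above map directly onto the low-level CreateTrainingJob API. The sketch below assembles such a request as a plain dict without calling AWS, so you can see what the Python SDK builds for you later; the job name, image URI, bucket paths, and the 30 GB volume size are placeholder assumptions.

```python
def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    instance_type: str,
    train_s3_uri: str,
    output_s3_uri: str,
    hyperparameters: dict,
    max_runtime_seconds: int = 3600,
) -> dict:
    """Assemble a CreateTrainingJob request for a single-instance job."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # Docker container with your training code
            "TrainingInputMode": "File",  # copy S3 data to local disk before the script starts
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",   # mounted at /opt/ml/input/data/train
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # model.tar.gz is uploaded here
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
        # The API accepts hyperparameters only as a string-to-string map
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


request = build_training_job_request(
    job_name="fraud-model-example",
    image_uri="<your-ecr-image-uri>",
    role_arn="arn:aws:iam::123456789:role/SageMakerExecutionRole",
    instance_type="ml.m5.xlarge",
    train_s3_uri="s3://my-bucket/data/train/",
    output_s3_uri="s3://my-bucket/models/",
    hyperparameters={"n-estimators": 200, "max-depth": 5},
)
# boto3.client("sagemaker").create_training_job(**request) would submit it.
```

Note the string conversion of hyperparameters: passing raw integers to the low-level API is a common source of validation errors.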

Training Script Pattern

# train.py - runs inside the SageMaker container
import argparse
import os
import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def train(args):
    # SageMaker passes each input channel's local path via SM_CHANNEL_* environment variables (parsed below)
    train_path = os.path.join(args.train, "train.csv")
    val_path = os.path.join(args.validation, "validation.csv")

    train_df = pd.read_csv(train_path)
    val_df = pd.read_csv(val_path)

    X_train = train_df.drop("label", axis=1)
    y_train = train_df["label"]
    X_val = val_df.drop("label", axis=1)
    y_val = val_df["label"]

    model = GradientBoostingClassifier(
        n_estimators=args.n_estimators,
        max_depth=args.max_depth,
        learning_rate=args.learning_rate,
    )
    model.fit(X_train, y_train)

    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"Validation AUC: {val_auc:.4f}")

    # SageMaker expects model artifact in /opt/ml/model
    model_dir = args.model_dir
    os.makedirs(model_dir, exist_ok=True)
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker hyperparameters
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=3)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # SageMaker channel paths (set by SageMaker from input config)
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--validation", type=str, default=os.environ.get("SM_CHANNEL_VALIDATION"))
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    args = parser.parse_args()
    train(args)

Launching a Training Job via Python SDK

import time

import sagemaker
from sagemaker.sklearn import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789:role/SageMakerExecutionRole"

estimator = SKLearn(
    entry_point="train.py",
    source_dir="./src",
    framework_version="1.2-1",
    py_version="py3",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    hyperparameters={
        "n-estimators": 200,
        "max-depth": 5,
        "learning-rate": 0.05,
    },
    # Spot instance training - up to 90% cost reduction
    use_spot_instances=True,
    max_wait=7200,   # seconds - includes wait time for spot
    max_run=3600,    # max training time
    checkpoint_s3_uri="s3://my-bucket/checkpoints/fraud-model/",
)

estimator.fit(
    inputs={
        "train": "s3://my-bucket/data/train/",
        "validation": "s3://my-bucket/data/validation/",
    },
    job_name=f"fraud-model-{int(time.time())}",
)

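:::tip Spot Instance Training Using use_spot_instances=True on SageMaker Training can reduce compute costs by 60–90% compared with on-demand instances. When spot capacity is reclaimed, SageMaker restarts the job automatically, but it only resumes where it left off if your training script writes checkpoints to the local checkpoint directory (/opt/ml/checkpoints by default), which SageMaker syncs to checkpoint_s3_uri, and loads the latest checkpoint on startup. Without checkpoints, an interrupted job starts over from scratch. For training jobs longer than about 30 minutes, spot instances are almost always worth using. :::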

SageMaker Pipelines

SageMaker Pipelines is a workflow orchestrator built specifically for ML. It lets you define a DAG of pipeline steps - data processing, training, evaluation, model registration - that run in sequence or parallel and are tracked end-to-end.
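You do not list an execution order in SageMaker Pipelines: it follows from data dependencies (a step that consumes another step's output properties runs after it), plus any explicit depends_on declarations. The toy sketch below uses Python's standard graphlib, not anything from the SageMaker SDK, to show how such dependencies turn into execution waves; the ComputeDataStats step is hypothetical and exists only to show two steps becoming runnable at once.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes (a toy model of the
# fraud pipeline, not SageMaker's internal representation).
step_dependencies = {
    "PreprocessData": set(),
    "TrainFraudModel": {"PreprocessData"},
    "ComputeDataStats": {"PreprocessData"},  # hypothetical side branch
    "EvaluateModel": {"TrainFraudModel", "PreprocessData"},
    "CheckAUCThreshold": {"EvaluateModel"},
}

sorter = TopologicalSorter(step_dependencies)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose inputs are all available
    waves.append(ready)                 # steps in the same wave can run in parallel
    sorter.done(*ready)

print(waves)
```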

Why Pipelines Instead of Just Scripts?

Running python train.py manually works once. The second time, you don't remember exactly what you ran. The third time, a new engineer joins and runs something slightly different. By month six, nobody knows which model artifact was trained on which data with which code version. Pipelines solve this: each run is an immutable, auditable record.

Pipeline Definition

import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep, TransformStep
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.parameters import ParameterString, ParameterFloat
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.sklearn import SKLearn
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet

session = sagemaker.Session()
role = "arn:aws:iam::123456789:role/SageMakerExecutionRole"

# Pipeline parameters - can be overridden at execution time
input_data_uri = ParameterString(
    name="InputDataUri",
    default_value="s3://my-bucket/data/raw/",
)
auc_threshold = ParameterFloat(
    name="AUCThreshold",
    default_value=0.80,
)

# Step 1: Data Processing
sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
)

processing_step = ProcessingStep(
    name="PreprocessData",
    processor=sklearn_processor,
    inputs=[
        sagemaker.processing.ProcessingInput(
            source=input_data_uri,
            destination="/opt/ml/processing/input",
        )
    ],
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/output/train",
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="validation",
            source="/opt/ml/processing/output/validation",
        ),
    ],
    code="preprocessing.py",
)

# Step 2: Training
estimator = SKLearn(
    entry_point="train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
    hyperparameters={"n-estimators": 200, "max-depth": 5},
    use_spot_instances=True,
    max_wait=7200,
    max_run=3600,
)

training_step = TrainingStep(
    name="TrainFraudModel",
    estimator=estimator,
    inputs={
        "train": sagemaker.inputs.TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig
                .Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": sagemaker.inputs.TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig
                .Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)

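# Step 3: Evaluation - evaluate.py (hypothetical script, not shown) loads the
# trained model, scores the validation set, and writes metrics.json shaped like
# {"classification_metrics": {"auc": {"value": <float>}}}
from sagemaker.workflow.properties import PropertyFile

evaluation_report = PropertyFile(
    name="evaluation",           # name that JsonGet refers to
    output_name="evaluation",    # must match the ProcessingOutput below
    path="metrics.json",         # file path inside that output
)

evaluation_step = ProcessingStep(
    name="EvaluateModel",
    processor=sklearn_processor,
    inputs=[
        sagemaker.processing.ProcessingInput(
            source=training_step.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        sagemaker.processing.ProcessingInput(
            source=processing_step.properties.ProcessingOutputConfig
                .Outputs["validation"].S3Output.S3Uri,
            destination="/opt/ml/processing/validation",
        ),
    ],
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
        ),
    ],
    code="evaluate.py",
    property_files=[evaluation_report],
)

# Step 4: Conditional registration - only register if AUC >= threshold
condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=evaluation_step.name,
        property_file=evaluation_report,
        json_path="classification_metrics.auc.value",
    ),
    right=auc_threshold,
)

condition_step = ConditionStep(
    name="CheckAUCThreshold",
    conditions=[condition],
    if_steps=[register_step],   # built in the Model Registry section; define it before this point
    else_steps=[],              # do nothing if below threshold
)

pipeline = Pipeline(
    name="FraudDetectionPipeline",
    parameters=[input_data_uri, auc_threshold],
    steps=[processing_step, training_step, evaluation_step, condition_step],
    sagemaker_session=session,
)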

# Create or update pipeline definition in AWS
pipeline.upsert(role_arn=role)

# Execute pipeline
execution = pipeline.start(
    parameters={
        "InputDataUri": "s3://my-bucket/data/raw/2024-01/",
        "AUCThreshold": 0.82,
    }
)
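# The default waiter gives up after roughly 30 minutes; allow longer for spot training
execution.wait(delay=60, max_attempts=180)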

SageMaker Model Registry

The Model Registry is a centralized catalog of model versions. Each version has:

A reference to the model artifact in S3
The container image used for inference
Evaluation metrics
An approval status: PendingManualApproval, Approved, or Rejected

Your deployment automation should deploy only Approved versions to production endpoints. Enforced that way, the approval status gives you a mandatory gate between training and deployment.

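from sagemaker import image_uris
from sagemaker.model import Model
from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.functions import Join
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.pipeline_context import PipelineSession

# With a PipelineSession, model.register() returns step arguments for the
# pipeline instead of creating a model package immediately
pipeline_session = PipelineSession()

# Look up the AWS-managed scikit-learn inference image instead of hardcoding it
inference_image_uri = image_uris.retrieve(
    framework="sklearn",
    region=pipeline_session.boto_region_name,
    version="1.2-1",
    py_version="py3",
    instance_type="ml.m5.large",
    image_scope="inference",
)

# Register model to registry
model = Model(
    image_uri=inference_image_uri,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=pipeline_session,
)

register_step = ModelStep(
    name="RegisterFraudModel",
    step_args=model.register(
        content_types=["text/csv"],
        response_types=["application/json"],
        inference_instances=["ml.m5.large", "ml.m5.xlarge"],
        transform_instances=["ml.m5.xlarge"],
        model_package_group_name="FraudDetectionModels",
        approval_status="PendingManualApproval",
        model_metrics=ModelMetrics(
            model_statistics=MetricsSource(
                # metrics.json written by the evaluation step; assumes its
                # ProcessingOutput is named "evaluation"
                s3_uri=Join(
                    on="/",
                    values=[
                        evaluation_step.properties.ProcessingOutputConfig
                            .Outputs["evaluation"].S3Output.S3Uri,
                        "metrics.json",
                    ],
                ),
                content_type="application/json",
            )
        ),
    ),
)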

Approving a Model

import boto3

sm_client = boto3.client("sagemaker")

# List model versions pending approval
response = sm_client.list_model_packages(
    ModelPackageGroupName="FraudDetectionModels",
    ModelApprovalStatus="PendingManualApproval",
    SortBy="CreationTime",
    SortOrder="Descending",
)

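summaries = response["ModelPackageSummaryList"]
if not summaries:
    raise RuntimeError("No model versions are pending approval")
latest_package_arn = summaries[0]["ModelPackageArn"]

# Approve the model (could also be done via console or CI/CD)
sm_client.update_model_package(
    ModelPackageArn=latest_package_arn,
    ModelApprovalStatus="Approved",
    ApprovalDescription="Metrics reviewed against the AUC threshold",
)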
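On the deployment side, the automation only needs the newest Approved version in the group. A minimal sketch, assuming deployments are driven by a script holding a boto3 SageMaker client; the helper name is hypothetical.

```python
def latest_approved_package_arn(sm_client, group_name: str) -> str:
    """Return the ARN of the newest Approved version in a model package group."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    if not summaries:
        # Refuse to deploy anything rather than falling back to an unapproved version
        raise LookupError(f"No approved model versions in {group_name}")
    return summaries[0]["ModelPackageArn"]


# Usage:
# arn = latest_approved_package_arn(boto3.client("sagemaker"), "FraudDetectionModels")
```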

SageMaker Endpoints - Real-Time Inference

A SageMaker Endpoint is a managed REST API for real-time ML inference. You specify a model, a container image, and an instance type. SageMaker handles load balancing across instances and health checks, and integrates with Application Auto Scaling for autoscaling.

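import json

import boto3
from sagemaker.model import ModelPackage

# Deploy the approved registry version (latest_package_arn from the
# approval step above) to a real-time endpoint
approved_model = ModelPackage(
    role=role,
    model_package_arn=latest_package_arn,
    sagemaker_session=session,
)
approved_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="fraud-detection-prod",
    # Auto-scaling via Application Auto Scaling (configure separately)
)

# Invoke endpoint
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="fraud-detection-prod",
    ContentType="text/csv",
    Accept="application/json",
    Body="0.5,1.2,3.4,0.0,1.0",  # feature vector as CSV
)

# The default scikit-learn serving container calls model.predict(), which
# returns class labels; override predict_fn in an inference script if you
# need predict_proba() output
result = json.loads(response["Body"].read())
print(f"Prediction: {result}")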

Endpoint Auto-scaling

import boto3

# Configure autoscaling for the endpoint variant
client = boto3.client("application-autoscaling")

client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/fraud-detection-prod/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=10,
)

# Scale based on invocations per instance
client.put_scaling_policy(
    PolicyName="fraud-endpoint-scaling",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/fraud-detection-prod/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
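Target tracking tries to keep invocations per instance near TargetValue, so the steady-state fleet size is roughly total traffic divided by the target, clamped to the registered capacity range. The back-of-the-envelope sketch below uses that approximation; it is not the exact Application Auto Scaling algorithm, which also accounts for cooldowns, metric delay, and its own rounding.

```python
import math


def approx_desired_instances(
    invocations_per_minute: float,
    target_per_instance: float = 1000.0,
    min_capacity: int = 1,
    max_capacity: int = 10,
) -> int:
    """Rough steady-state instance count under target tracking on
    SageMakerVariantInvocationsPerInstance (approximation only)."""
    needed = math.ceil(invocations_per_minute / target_per_instance)
    # Clamp to the range registered with register_scalable_target
    return max(min_capacity, min(max_capacity, needed))


for load in (300, 4500, 25000):
    print(f"{load} invocations/min -> {approx_desired_instances(load)} instance(s)")
```

At 25,000 invocations per minute the policy would want about 25 instances but stops at MaxCapacity=10; if that load is realistic, raise the cap or move to larger instances.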

SageMaker Feature Store

The Feature Store addresses the training-serving skew problem: features computed differently at training time vs serving time. The Feature Store has two stores:

Online Store: Managed low-latency key-value store. Used at inference time to retrieve the latest feature values for a given entity.
Offline Store: S3-backed store (Parquet files by default) that keeps the full history of records. Used at training time to retrieve historical feature values.

Records ingested into a feature group with both stores enabled are written to the online store and replicated to the offline store after a short delay, so training and serving read the same feature values.

import time

import pandas as pd
from sagemaker.feature_store.feature_group import FeatureGroup

# Hypothetical example rows; in practice they come from your feature pipeline.
# load_feature_definitions() infers the schema from the dtypes:
# int64 -> Integral, float64 -> Fractional, string -> String
features_df = pd.DataFrame(
    {
        "user_id": [42, 43],
        "transaction_count_7d": [5.0, 12.0],
        "avg_transaction_amount_30d": [37.5, 120.25],
        "distinct_merchants_14d": [3, 8],
        "event_time": ["2024-01-15T14:30:22Z", "2024-01-15T14:30:22Z"],
    }
)
# Cast text columns to pandas' "string" dtype so they are inferred as String
# (some SDK versions reject the generic object dtype)
features_df["event_time"] = features_df["event_time"].astype("string")

feature_group = FeatureGroup(
    name="user-transaction-features",
    sagemaker_session=session,
)
feature_group.load_feature_definitions(data_frame=features_df)

# Create feature group with both online and offline stores enabled
feature_group.create(
    s3_uri="s3://my-bucket/feature-store/",
    record_identifier_name="user_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# create() returns before the feature group is usable; wait for it
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)

# Ingest features
feature_group.ingest(
    data_frame=features_df,
    max_workers=4,
    wait=True,
)

# At inference time: retrieve latest features from the online store.
# The identifier is passed as a string even though user_id is Integral.
record = feature_group.get_record(
    record_identifier_value_as_string="42",
)
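get_record() returns a list of {"FeatureName": ..., "ValueAsString": ...} entries, so every value comes back as a string and has to be converted before it reaches the model. A minimal sketch of that conversion; the helper names and the feature order are hypothetical, and the order must match whatever the model was trained on.

```python
def record_to_features(record):
    """Turn get_record() output into a {feature_name: value_as_string} dict.

    An empty or missing result is treated as "no features".
    """
    if not record:
        return {}
    return {item["FeatureName"]: item["ValueAsString"] for item in record}


def to_model_input(features, feature_order):
    """Cast string values to float in the column order used at training time."""
    return [float(features[name]) for name in feature_order]


sample_record = [
    {"FeatureName": "user_id", "ValueAsString": "42"},
    {"FeatureName": "transaction_count_7d", "ValueAsString": "5.0"},
    {"FeatureName": "avg_transaction_amount_30d", "ValueAsString": "37.5"},
    {"FeatureName": "distinct_merchants_14d", "ValueAsString": "3"},
    {"FeatureName": "event_time", "ValueAsString": "2024-01-15T14:30:22Z"},
]
features = record_to_features(sample_record)
vector = to_model_input(
    features,
    ["transaction_count_7d", "avg_transaction_amount_30d", "distinct_merchants_14d"],
)
print(vector)  # [5.0, 37.5, 3.0]
```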

S3 as ML Data Lake

S3 is the foundation of any AWS ML platform. Establish a clear naming convention from day one:

s3://company-ml-data/
├── raw/
│   ├── transactions/year=2024/month=01/day=15/
│   └── user-events/year=2024/month=01/day=15/
├── processed/
│   ├── features/v1/train/
│   └── features/v1/validation/
├── models/
│   └── fraud-detection/
│       ├── model-20240115-143022/model.tar.gz
│       └── model-20240116-092345/model.tar.gz
└── experiments/
    └── fraud-detection/
        └── run-20240115-143022/
            ├── metrics.json
            └── evaluation_report.html
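A convention only helps if every job builds paths the same way. One option, sketched below with hypothetical helper names, is a small shared module that generates paths from the layout above so nobody hand-types prefixes. The artifact helper assumes you copy or organize artifacts under this layout; SageMaker itself writes training output to <output_path>/<job-name>/output/model.tar.gz.

```python
from datetime import date, datetime


def raw_partition_prefix(dataset: str, day: date, bucket: str = "company-ml-data") -> str:
    """Hive-style partition prefix, e.g. raw/transactions/year=2024/month=01/day=15/."""
    return (
        f"s3://{bucket}/raw/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def model_artifact_uri(model_name: str, trained_at: datetime, bucket: str = "company-ml-data") -> str:
    """Timestamped artifact path, e.g. models/fraud-detection/model-20240115-143022/model.tar.gz."""
    return f"s3://{bucket}/models/{model_name}/model-{trained_at:%Y%m%d-%H%M%S}/model.tar.gz"


print(raw_partition_prefix("transactions", date(2024, 1, 15)))
# s3://company-ml-data/raw/transactions/year=2024/month=01/day=15/
print(model_artifact_uri("fraud-detection", datetime(2024, 1, 15, 14, 30, 22)))
# s3://company-ml-data/models/fraud-detection/model-20240115-143022/model.tar.gz
```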

Use S3 lifecycle policies to automatically transition old model artifacts to S3 Standard-IA and then to S3 Glacier:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="company-ml-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "ArchiveOldModelArtifacts",
                "Filter": {"Prefix": "models/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)

ECR for ML Container Images

Every SageMaker training job and endpoint uses a Docker container. Your custom containers live in Amazon ECR (Elastic Container Registry).

# Dockerfile for custom training container
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker

# Install additional dependencies
COPY requirements.txt /opt/ml/code/requirements.txt
RUN pip install -r /opt/ml/code/requirements.txt

# Copy training code
COPY src/ /opt/ml/code/

ENV SAGEMAKER_SUBMIT_DIRECTORY=/opt/ml/code
ENV SAGEMAKER_PROGRAM=train.py

WORKDIR /opt/ml/code

# Build and push to ECR
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGION=us-east-1
ECR_URI=$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com

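# Log in to your own registry (for the push) and to the AWS Deep Learning
# Containers registry that hosts the base image in the FROM line (for the pull)
aws ecr get-login-password --region $REGION | \
  docker login --username AWS --password-stdin $ECR_URI
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com

# Create repository (once)
aws ecr create-repository --repository-name fraud-model-training --region $REGION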

# Build and push
docker build -t fraud-model-training:v1.2 .
docker tag fraud-model-training:v1.2 $ECR_URI/fraud-model-training:v1.2
docker push $ECR_URI/fraud-model-training:v1.2

SageMaker Pipelines vs Apache Airflow on AWS

Both tools can orchestrate ML workflows. When do you use which?

| Dimension | SageMaker Pipelines | Airflow on MWAA |
|---|---|---|
| ML integration | Native (training, registry, monitoring) | Via SageMaker operators |
| Non-ML tasks | Limited | Excellent (dbt, Redshift, Lambda, etc.) |
| Experiment tracking | Built-in | Requires separate integration |
| Cost | No charge for orchestration; you pay for the jobs each step runs | Managed Airflow has a fixed environment cost (~$300+/month) |
| Learning curve | SageMaker SDK | Airflow DAGs + Python |
| Best for | ML-only pipelines | Mixed data + ML workflows |

Recommendation: Use SageMaker Pipelines when your workflow is primarily ML steps (process → train → evaluate → register → deploy). Use Airflow when your ML pipeline is one component in a larger data workflow.

Production Engineering Notes

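IAM Roles - Minimum Required Permissions

The policy below scopes S3 access to the ML bucket. Treat it as a starting point rather than a complete list: the exact actions depend on which SageMaker features the role uses (pipelines, for example, also call sagemaker:CreateProcessingJob and sagemaker:CreateModelPackage), and any role that launches jobs needs iam:PassRole for the execution role it hands to SageMaker.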

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::company-ml-data",
        "arn:aws:s3:::company-ml-data/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateEndpoint",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
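    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789:role/SageMakerExecutionRole",
      "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
      }
    }
  ]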
}

Endpoint Blue-Green Deployment

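SageMaker natively supports blue-green deployment for endpoint updates: it provisions a new (green) fleet from the new endpoint configuration, shifts traffic to it gradually from the old (blue) fleet, and can roll back automatically when a CloudWatch alarm fires:

import boto3

sm_client = boto3.client("sagemaker")

# Update endpoint with an already-created endpoint config, shifting traffic
# to the new fleet in 10% steps every 5 minutes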
sm_client.update_endpoint(
    EndpointName="fraud-detection-prod",
    EndpointConfigName="fraud-detection-config-v2",
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "LinearStepSize": {
                    "Type": "CAPACITY_PERCENT",
                    "Value": 10,
                },
                "WaitIntervalInSeconds": 300,  # wait 5 min between shifts
            },
            "TerminationWaitInSeconds": 300,
            "MaximumExecutionTimeoutInSeconds": 3600,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [
                {"AlarmName": "fraud-endpoint-error-rate-high"},
            ]
        },
    },
)

Common Mistakes

:::danger Data Leakage via S3 Bucket Permissions A common mistake is giving the SageMaker execution role access to all S3 buckets ("Resource": "*"). If that role is compromised or misconfigured, it can read any S3 bucket in the account. Always scope S3 permissions to the specific bucket ARNs used for ML. :::

:::danger Not Using Spot Instances for Training Training jobs are often the largest single cost driver in SageMaker. Running long training jobs on on-demand instances forgoes savings of 60–90% of the training compute bill. The only reason not to use spot is if your training script cannot be made interruptible with checkpointing - and that is almost always fixable. :::

:::warning Endpoint Staying Alive When Not Needed SageMaker real-time endpoints are billed per instance-hour whether or not they receive traffic. Development endpoints left running overnight or over weekends add up fast. Delete development endpoints when they are idle, use scheduled scaling to shrink production fleets during off-hours, or use Serverless Inference for low-traffic endpoints. :::
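Scheduled scaling is configured through Application Auto Scaling scheduled actions on the same scalable target used for target tracking. A minimal sketch that only builds the request arguments, assuming the endpoint variant is already registered as a scalable target; the action names, UTC times, and capacity values are placeholder assumptions, and this sketch keeps a minimum of one instance.

```python
def off_hours_schedule(endpoint_name: str, variant: str = "AllTraffic",
                       night_max: int = 1, day_max: int = 10) -> list:
    """Build two scheduled actions: cap the fleet in the evening and restore
    the normal range in the morning (weekdays, UTC)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    return [
        {
            **common,
            "ScheduledActionName": f"{endpoint_name}-evening-scale-in",
            "Schedule": "cron(0 20 ? * MON-FRI *)",  # 20:00 UTC; Friday's cap lasts the weekend
            "ScalableTargetAction": {"MinCapacity": 1, "MaxCapacity": night_max},
        },
        {
            **common,
            "ScheduledActionName": f"{endpoint_name}-morning-scale-out",
            "Schedule": "cron(0 7 ? * MON-FRI *)",   # 07:00 UTC on weekdays
            "ScalableTargetAction": {"MinCapacity": 1, "MaxCapacity": day_max},
        },
    ]


# Usage:
# client = boto3.client("application-autoscaling")
# for action in off_hours_schedule("fraud-detection-prod"):
#     client.put_scheduled_action(**action)
```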

:::warning Training-Serving Skew Without Feature Store Computing features differently in the training pipeline vs the inference Lambda is one of the most common causes of unexplained model performance degradation in production. If you have features that require complex logic to compute, write them to the Feature Store so training and serving read values produced by the same computation. :::
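With or without the Feature Store, the underlying fix is a single feature function that both the training pipeline and the serving path import. A minimal sketch, assuming a hypothetical in-memory list of (timestamp, amount, merchant_id) transactions; the feature names mirror the feature group above, and the as_of cutoff keeps training rows from seeing events after the prediction time.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# (timestamp, amount, merchant_id) - hypothetical transaction record shape
Transaction = Tuple[datetime, float, str]


def user_features(transactions: Iterable[Transaction], as_of: datetime) -> dict:
    """Single source of truth for user features; only uses events before as_of."""
    txns = [t for t in transactions if t[0] < as_of]
    last_7d = [t for t in txns if t[0] >= as_of - timedelta(days=7)]
    last_14d = [t for t in txns if t[0] >= as_of - timedelta(days=14)]
    last_30d = [t for t in txns if t[0] >= as_of - timedelta(days=30)]
    return {
        "transaction_count_7d": float(len(last_7d)),
        "avg_transaction_amount_30d": (
            sum(t[1] for t in last_30d) / len(last_30d) if last_30d else 0.0
        ),
        "distinct_merchants_14d": len({t[2] for t in last_14d}),
    }


now = datetime(2024, 1, 15, 12, 0)
history = [
    (now - timedelta(days=1), 20.0, "m1"),
    (now - timedelta(days=3), 40.0, "m2"),
    (now - timedelta(days=10), 60.0, "m1"),
    (now - timedelta(days=40), 500.0, "m3"),  # outside every window
]
print(user_features(history, as_of=now))
# {'transaction_count_7d': 2.0, 'avg_transaction_amount_30d': 40.0, 'distinct_merchants_14d': 2}
```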

Interview Q&A

Q1: What is the difference between SageMaker Training Jobs and SageMaker Processing Jobs?

Training Jobs are designed specifically for ML model training. They have first-class support for distributed training (with Horovod and SageMaker's distributed training libraries), metric capture from training logs via regex metric definitions (published to CloudWatch), model artifact packaging, and spot instance interruption handling. Processing Jobs are general-purpose containers for data processing. They are typically used for feature engineering, data validation, and evaluation scripts. Key difference: Training Jobs expect your script to write a model artifact to /opt/ml/model, which SageMaker packages as model.tar.gz; Processing Jobs expect your script to write data to the local paths declared as ProcessingOutput sources (conventionally under /opt/ml/processing/), which SageMaker uploads to S3.
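Metric capture only works if each regex in metric_definitions matches what the script prints. The train.py example prints "Validation AUC: ..."; the sketch below pairs a matching definition (the metric name validation:auc is our choice) with a local helper, hypothetical and not part of the SDK, for testing the regex before launching a job. The same list would be passed to the estimator as metric_definitions=metric_definitions.

```python
import re

# Metric definitions in the shape the SageMaker Python SDK estimator accepts
metric_definitions = [
    {"Name": "validation:auc", "Regex": r"Validation AUC: ([0-9\.]+)"},
]


def extract_metrics(log_lines, definitions):
    """Local check: apply each regex to log lines and collect captured values."""
    found = {}
    for definition in definitions:
        pattern = re.compile(definition["Regex"])
        for line in log_lines:
            match = pattern.search(line)
            if match:
                found.setdefault(definition["Name"], []).append(float(match.group(1)))
    return found


print(extract_metrics(["Validation AUC: 0.9123"], metric_definitions))
# {'validation:auc': [0.9123]}
```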

Q2: How does SageMaker handle spot instance interruptions during training?

When a spot instance is interrupted, SageMaker waits for spot capacity to become available again and restarts the job automatically on a new instance, within the max_wait limit. Resuming rather than starting over requires checkpointing: your script periodically writes its state to a local checkpoint directory (/opt/ml/checkpoints by default, configurable with checkpoint_local_path), which SageMaker syncs to checkpoint_s3_uri. When the job restarts, SageMaker copies the checkpoints back into that directory and your script loads the latest one. For PyTorch, this means calling torch.save() on the model's state_dict (ideally together with the optimizer state and epoch number) at the end of each epoch.
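The save-and-resume loop is framework-independent. A minimal sketch using pickle as a stand-in for torch.save(); the state dict, the one-line "training" update, and the checkpoint-NNNN.pkl naming are placeholder assumptions, and only the /opt/ml/checkpoints default comes from SageMaker.

```python
import os
import pickle


def save_checkpoint(state: dict, checkpoint_dir: str, epoch: int) -> None:
    """Write one epoch's state; write-then-rename avoids a half-written file
    if the instance is reclaimed mid-write."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"checkpoint-{epoch:04d}.pkl")
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, path)


def load_latest_checkpoint(checkpoint_dir: str):
    """Return (next_epoch, state), or (0, None) when starting fresh."""
    if not os.path.isdir(checkpoint_dir):
        return 0, None
    files = sorted(
        f for f in os.listdir(checkpoint_dir)
        if f.startswith("checkpoint-") and f.endswith(".pkl")
    )
    if not files:
        return 0, None
    with open(os.path.join(checkpoint_dir, files[-1]), "rb") as f:
        saved = pickle.load(f)
    return saved["epoch"] + 1, saved["state"]


def train(num_epochs: int, checkpoint_dir: str = "/opt/ml/checkpoints") -> dict:
    start_epoch, state = load_latest_checkpoint(checkpoint_dir)
    state = state or {"weights": 0.0}        # placeholder model state
    for epoch in range(start_epoch, num_epochs):
        state["weights"] += 1.0              # stand-in for one epoch of training
        save_checkpoint(state, checkpoint_dir, epoch)
    return state
```

If the job is interrupted after epoch 2 and restarted, load_latest_checkpoint() returns epoch 3 and the loop runs only the remaining epochs.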

Q3: What is the SageMaker Model Registry approval workflow, and why does it matter?

The Model Registry stores model versions with an approval_status field: PendingManualApproval, Approved, or Rejected. You configure your deployment automation (CDK, Terraform, or an EventBridge rule) to deploy only Approved models. This creates a review gate between training and production deployment: an automated evaluation step can set the status to Approved if metrics pass thresholds, or the gate can require a human in the loop. In regulated industries, the resulting audit trail of who approved which model version for production is often a compliance requirement.
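For the EventBridge route, the rule matches registry state-change events. A minimal sketch of the event pattern, assuming SageMaker's "SageMaker Model Package State Change" event type with ModelPackageGroupName and ModelApprovalStatus in its detail; the rule name is hypothetical, and the rule's target (for example a Lambda or CodePipeline that deploys the version) is left out.

```python
import json


def approval_event_pattern(model_package_group: str) -> dict:
    """EventBridge pattern matching approvals in one model package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


pattern = approval_event_pattern("FraudDetectionModels")
print(json.dumps(pattern, indent=2))
# boto3.client("events").put_rule(
#     Name="deploy-approved-fraud-model", EventPattern=json.dumps(pattern)
# )
```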

Q4: When would you use SageMaker Batch Transform vs a real-time endpoint?

Use Batch Transform when: (1) you have a fixed dataset to score (weekly scoring of all customers), (2) low latency is not required (results can be written to S3 and read later), (3) the dataset is too large to fit in memory for local scoring. Use real-time endpoints when: (1) you need sub-second response for user-facing features, (2) you need to score one record at a time in response to an event. Cost difference: Batch Transform instances run only while the job runs. Real-time endpoints run continuously and are more expensive at low request volumes.
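In the Python SDK, a Batch Transform job is launched from a model object. A minimal sketch, assuming a CSV dataset under an S3 prefix; the function name, instance type, and paths are placeholders.

```python
def run_batch_scoring(model, input_s3_uri: str, output_s3_uri: str):
    """Score a CSV dataset in S3 with a SageMaker Batch Transform job.

    `model` is a sagemaker.model.Model (or ModelPackage); the transform
    instances exist only while the job runs.
    """
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=output_s3_uri,
        accept="text/csv",
        assemble_with="Line",   # one output line per input record
    )
    transformer.transform(
        data=input_s3_uri,
        content_type="text/csv",
        split_type="Line",      # split input files into records by line
        wait=True,
    )
    return transformer
```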

Q5: How do you implement A/B testing with SageMaker endpoints?

SageMaker Endpoints support multiple production variants, each with a weight. You can route, say, 90% of traffic to variant A (the current model) and 10% to variant B (the challenger model). SageMaker publishes CloudWatch metrics per variant and returns the name of the variant that served each request, allowing you to compare metrics downstream:

sm_client.create_endpoint_config(
    EndpointConfigName="fraud-ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "ModelA",
            "ModelName": "fraud-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,
        },
        {
            "VariantName": "ModelB",
            "ModelName": "fraud-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,
        },
    ],
)
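To compare variants on business outcomes, store the serving variant alongside each prediction. A minimal sketch; the endpoint name fraud-ab-test is hypothetical (an endpoint created from the config above). To send a request to a specific variant, for example during testing, invoke_endpoint also accepts a TargetVariant argument.

```python
import json


def invoke_and_record_variant(runtime, endpoint_name: str, csv_row: str):
    """Call the endpoint and return (prediction, name of the serving variant)."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=csv_row,
    )
    prediction = json.loads(response["Body"].read())
    # e.g. "ModelA" or "ModelB"; log it next to the eventual ground-truth label
    return prediction, response["InvokedProductionVariant"]


# Usage:
# runtime = boto3.client("sagemaker-runtime")
# prediction, variant = invoke_and_record_variant(runtime, "fraud-ab-test", "0.5,1.2,3.4,0.0,1.0")
```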

Q6: How do you monitor model quality drift in production SageMaker endpoints?

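SageMaker Model Monitor runs scheduled jobs that compare request/response data captured from the endpoint against a baseline computed from the training data. The endpoint must have data capture enabled (a DataCaptureConfig supplied at deployment). Compute the baseline, then configure a monitoring schedule: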

from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/data/baseline/train.csv",
    dataset_format={"csv": {"header": True}},
    output_s3_uri="s3://my-bucket/monitoring/baseline/",
    wait=True,
)

monitor.create_monitoring_schedule(
    monitor_schedule_name="fraud-model-monitor",
    endpoint_input="fraud-detection-prod",
    output_s3_uri="s3://my-bucket/monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

When drift is detected, SageMaker publishes CloudWatch metrics and can trigger alarms.

The Eight-Week Migration - What It Actually Looks Like

Week 1–2: S3 bucket structure, IAM roles, ECR repository setup. Containerize existing training scripts.

Week 3–4: SageMaker Training Jobs for all three models. Validate artifacts match on-prem outputs.

Week 5: SageMaker Pipelines for the most critical model (fraud detection). Include processing, training, evaluation, and conditional registration steps.

Week 6: Model Registry approval workflow. Real-time endpoints for all three models.

Week 7: Feature Store for the features that were computed differently at training and serving time. SageMaker Model Monitor baselines.

Week 8: Spot instance training enabled. Autoscaling configured on endpoints. Documentation written. On-prem servers decommissioned.

The eight-week timeline is tight but achievable. The most common mistake is trying to do everything perfectly instead of getting something running. Start with training jobs, not pipelines. Get one model through the full SageMaker lifecycle before porting the others. The platform will make more sense once you've run one end-to-end cycle.
