What is inventory optimization ML?

Newsvendor problem, safety stock optimization, reorder point prediction, multi-echelon inventory, and ML-driven policies that balance stockouts against carrying costs at retail scale.

How does newsvendor problem retail work in practice?

Inventory Optimization covers inventory optimization ML, newsvendor problem retail, safety stock calculation from first principles with code examples. Free lesson at https://engineersofai.com/docs/applied-ai/ai-in-retail/Inventory-Optimization

Inventory Optimization

The Cost of Getting It Wrong in Both Directions

In the summer of 2021, Peloton had $1.25 billion in excess inventory sitting in warehouses. They had forecasted pandemic-level demand would continue indefinitely, placed enormous purchase orders with suppliers, and watched as demand normalized faster than their models predicted. The write-down was $800 million. The CFO resigned.

That same summer, Ford had to leave $1 billion worth of nearly complete F-150 trucks parked in lots because they were missing a single $1 semiconductor chip. The chip shortage cost the auto industry an estimated $210 billion in lost revenue in 2021 alone.

Both are inventory failures. One from too much. One from too little. The asymmetry is the core tension of inventory management: holding inventory is expensive (warehouse space, capital, obsolescence risk, spoilage), but being out of stock is also expensive (lost sales, lost customers, expedited shipping to recover). The question is never "should we hold inventory?" The question is always "how much is optimal given uncertain demand?"

For a retailer carrying 100,000 SKUs across 1,000 stores, answering this question manually is impossible. Even if you had perfect demand forecasts (you do not), translating forecasts into optimal reorder points and order quantities for 100 million store-SKU combinations requires automation. This is where ML enters inventory management - not just to forecast demand, but to prescribe optimal inventory policies under uncertainty.

Why This Exists

The traditional approach to inventory management relied on a few heuristics: reorder when stock drops below X weeks of supply, order enough to last Y weeks. The X and Y were set by experienced planners, updated annually, and applied uniformly across categories. This worked tolerably when catalogs were small and demand was stable.

Three changes broke this approach:

Catalog expansion: The shift from physical to omnichannel retail increased the average number of distinct items a retailer carries by 5-10x. A 1990s grocery store carried 8,000 SKUs. Today's equivalent carries 40,000 - with the online catalog extending to 200,000+. Human-set reorder rules cannot scale.

Demand volatility: Social media can turn an obscure item into a viral product overnight. Fast fashion cycles have compressed from seasonal to weekly. The assumption of stable, predictable demand is increasingly wrong.

Supply chain complexity: Multi-echelon supply chains with global suppliers have variable and uncertain lead times. The supplier in Vietnam has a 20-day lead time on average, but it ranges from 12 to 45 days. A fixed reorder point calibrated to average lead time leaves you exposed during long-tail lead time events.

ML-based inventory optimization exists because the problem is quantifiable (the cost of a stockout and the cost of excess inventory can both be expressed in dollars), the data is rich (detailed transaction history, lead time history, supplier reliability data), and the decision space is well-structured (order quantity, reorder point, safety stock level). These conditions are ideal for ML-driven prescriptive optimization.

Historical Context

The mathematical foundations of inventory optimization predate computers.

The Economic Order Quantity (EOQ) model was published by Ford Harris in 1913 - over 100 years ago. It gives the optimal order quantity balancing ordering costs (fixed cost per order) against holding costs (cost of keeping inventory). The formula: $Q^* = \sqrt{2DS/H}$ where D is annual demand, S is ordering cost, and H is holding cost per unit per year. Simple, elegant, and still used.
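To make the EOQ trade-off concrete, here is a small numeric check. The demand, ordering-cost, and holding-cost figures are hypothetical. The sketch also shows that at $Q^*$ the annual ordering cost exactly equals the annual holding cost, and that ordering much less or much more than $Q^*$ raises the total.

```python
import math

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Harris EOQ: Q* = sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

def annual_cost(q: float, annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Ordering cost (D/Q orders at S each) plus holding cost on the average stock of Q/2."""
    return annual_demand / q * order_cost + q / 2 * holding_cost

# Hypothetical SKU: 12,000 units/year, $50 per order, $3 per unit per year to hold
D, S, H = 12_000, 50.0, 3.0
q_star = eoq(D, S, H)
print(f"Q* = {q_star:.0f} units")   # sqrt(400,000) ≈ 632
for q in (300, q_star, 1_200):
    print(f"Q = {q:5.0f}: annual cost = {annual_cost(q, D, S, H):.2f}")
```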

The Newsvendor Problem (also called the Newsboy Problem) was formalized in the 1950s and 1960s. It is the canonical model for single-period inventory decisions under uncertainty: how many newspapers should you order before knowing how many you will sell? Order too few and you miss sales. Order too many and you discard unsold copies. The optimal service level depends only on the ratio of underage to overage costs, and the demand distribution then translates that service level into an order quantity. This insight remains central to modern inventory systems.

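(R,Q) and (s,S) inventory policies were formalized by Hadley and Whitin in 1963. Under (R,Q), you order a fixed quantity Q when the inventory position falls to the reorder point R. Under (s,S), when the inventory position falls to s, you order enough to bring it back up to S. Extended with stochastic demand and lead times, these frameworks underpin the policies used today.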

The ML revolution in inventory arrived with reinforcement learning. DeepMind's work on applying machine learning to data center cooling (2016) demonstrated that learned control policies could handle complex industrial control problems better than hand-crafted rules. Amazon, Walmart, and JD.com have all published work on RL-based inventory replenishment policies (2018-2022). The reported results point the same way: RL beats rule-based policies by roughly 5-15% in total cost, particularly for volatile, intermittent demand items.

Core Concepts

The Newsvendor Problem

The Newsvendor Problem is the cleanest possible statement of inventory optimization under uncertainty. Understanding it deeply is essential before building anything more complex.

Setup: You must decide how many units to stock before seeing actual demand. Demand $D$ is uncertain - described by a probability distribution. If you stock $Q$ units:

If $D > Q$ (stockout): you incur underage cost $c_u$ per unit (lost profit, lost customer)
If $D < Q$ (overstock): you incur overage cost $c_o$ per unit (disposal, markdown, carrying cost)

Optimal solution: Stock the quantity $Q^*$ satisfying the Critical Ratio (CR):

$P(D \leq Q^*) = \text{CR} = \frac{c_u}{c_u + c_o}$

In English: set your service level equal to the ratio of the underage cost to the sum of the underage and overage costs. Suppose a stockout costs you $9 in lost margin and an extra unit costs $1 to dispose of. Your critical ratio is 9/(9+1) = 0.9, so you should stock at the 90th percentile of your demand distribution.

This is profound. The optimal service level depends only on the cost asymmetry. It does not depend on the shape of demand or on your intuition. The demand distribution then determines which quantity achieves that service level. A retailer whose products are perishable (overage is costly) should therefore target a lower service level than one selling non-perishable staples (overage is cheap).
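Here is a minimal sketch of the critical-ratio rule, assuming normally distributed demand with a hypothetical mean of 100 and standard deviation of 20. It uses Python's standard-library `statistics.NormalDist` rather than any particular inventory library. The first call reuses the $9/$1 example above. The second flips the asymmetry, which pushes the order below the mean.

```python
from statistics import NormalDist

def newsvendor_quantity(c_u: float, c_o: float, mean: float, std: float) -> float:
    """Q* = F^-1(c_u / (c_u + c_o)), assuming normally distributed demand."""
    critical_ratio = c_u / (c_u + c_o)
    return NormalDist(mean, std).inv_cdf(critical_ratio)

# $9 underage vs $1 overage: CR = 0.9, so stock above the mean
print(f"{newsvendor_quantity(9.0, 1.0, mean=100, std=20):.1f}")   # 100 + 1.2816 * 20 ≈ 125.6
# Flip the asymmetry (expensive overage, e.g. a perishable): CR = 0.1, stock below the mean
print(f"{newsvendor_quantity(1.0, 9.0, mean=100, std=20):.1f}")   # 100 - 1.2816 * 20 ≈ 74.4
```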

Safety Stock Calculation

For continuous-review inventory (as opposed to single-period), safety stock absorbs demand and lead time uncertainty. The standard formula:

$SS = z_\alpha \cdot \sigma_D \cdot \sqrt{L}$

Where:

$z_\alpha$ is the z-score for the desired cycle service level, i.e. the probability of no stockout during the lead time (e.g., 1.645 for 95%)
$\sigma_D$ is the standard deviation of daily demand
$L$ is lead time in days

This assumes demand uncertainty is the only source of variability. In reality, lead time also varies. The combined formula:

$SS = z_\alpha \cdot \sqrt{L \cdot \sigma_D^2 + D^2 \cdot \sigma_L^2}$

Where $D$ is mean daily demand and $\sigma_L$ is the standard deviation of lead time.
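The following sketch compares the two formulas for one hypothetical SKU: 50 units/day of demand with a standard deviation of 15, and a 20-day mean lead time with a standard deviation of 5 days, at a 95% cycle service level. Setting `lt_std=0` recovers the demand-only formula. In this example the lead-time term ($D^2 \sigma_L^2$ = 62,500) dwarfs the demand term ($L \sigma_D^2$ = 4,500).

```python
import math
from statistics import NormalDist

def safety_stock(z: float, mean_daily: float, std_daily: float,
                 lt_mean: float, lt_std: float = 0.0) -> float:
    """SS = z * sqrt(L * sigma_D^2 + D^2 * sigma_L^2); lt_std=0 gives z * sigma_D * sqrt(L)."""
    return z * math.sqrt(lt_mean * std_daily ** 2 + mean_daily ** 2 * lt_std ** 2)

z95 = NormalDist().inv_cdf(0.95)   # ≈ 1.645
demand_only = safety_stock(z95, mean_daily=50, std_daily=15, lt_mean=20)
combined = safety_stock(z95, mean_daily=50, std_daily=15, lt_mean=20, lt_std=5)
print(f"demand-only SS: {demand_only:.0f} units")   # ≈ 110
print(f"combined SS:    {combined:.0f} units")      # ≈ 426
```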

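The service level is not free: safety stock grows faster than linearly as the service level approaches 100%, because the z-score does. Moving from 95% to 99% raises $z_\alpha$ from 1.645 to 2.326, which means about 41% more safety stock for the same demand distribution. Reaching 99.9% ($z_\alpha$ = 3.090) requires nearly 88% more than at 95%. The right service level is a business decision based on the cost of stockouts vs. carrying costs, not a default.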
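To see how fast the requirement climbs, note that safety stock is proportional to $z_\alpha$ when the demand and lead-time inputs are held fixed. The ratio of z-scores is therefore the ratio of safety stocks. The helper name `safety_stock_multiplier` is hypothetical.

```python
from statistics import NormalDist

def safety_stock_multiplier(level: float, reference: float = 0.95) -> float:
    """Safety stock at `level` relative to safety stock at `reference`,
    holding demand and lead-time parameters fixed (SS is proportional to z)."""
    return NormalDist().inv_cdf(level) / NormalDist().inv_cdf(reference)

for level in (0.90, 0.95, 0.98, 0.99, 0.995, 0.999):
    z = NormalDist().inv_cdf(level)
    print(f"{level:.1%}: z = {z:.3f}, safety stock = {safety_stock_multiplier(level):.2f}x the 95% level")
```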

Reorder Point (ROP)

The reorder point is the inventory level at which you trigger a replenishment order:

$ROP = D_{avg} \cdot L_{avg} + SS$

Where $D_{avg} \cdot L_{avg}$ covers expected demand during lead time, and SS covers uncertainty.
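Here is a minimal continuous-review trigger built on that formula. It compares the ROP against the inventory position (on hand plus on order minus backorders) rather than shelf stock alone, so that units already in the pipeline are not reordered. The inventory-position refinement is standard practice but is not spelled out in the formula above. The class and function names, and the numbers (50 units/day, 20-day lead time, 426 units of safety stock), are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewState:
    on_hand: float
    on_order: float          # placed with the supplier, not yet received
    backorders: float = 0.0

def reorder_point(mean_daily: float, lt_mean: float, safety_stock: float) -> float:
    """ROP = D_avg * L_avg + SS."""
    return mean_daily * lt_mean + safety_stock

def should_reorder(state: ReviewState, rop: float) -> bool:
    """Trigger a replenishment when the inventory position falls to or below the ROP."""
    position = state.on_hand + state.on_order - state.backorders
    return position <= rop

rop = reorder_point(mean_daily=50, lt_mean=20, safety_stock=426)   # 1426 units
print(should_reorder(ReviewState(on_hand=900, on_order=600), rop))  # False: pipeline covers it
print(should_reorder(ReviewState(on_hand=900, on_order=0), rop))    # True
```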

The ROP must be recomputed whenever:

Demand pattern changes (seasonality, promotion, trend)
Lead time changes (new supplier, disruption)
Desired service level changes (business policy update)

In ML-based systems, the ROP need not come from a closed-form formula. It can instead be output by a model trained to minimize total inventory cost given observed demand and lead time distributions.

Multi-Echelon Inventory

Real supply chains have multiple stages. Inventory sits in:

Supplier warehouse
National distribution center (DC)
Regional DC
Store backroom
Store shelf

Optimizing each level independently (decentralized optimization) contributes to the bullwhip effect: small demand variations at the retail level are amplified at each upstream stage, producing wild oscillations in supplier order quantities. One driver is that each level reacts to the orders it receives rather than to end-customer demand. It re-forecasts and adds its own safety stock without accounting for the inventory already held elsewhere in the chain.
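The amplification can be reproduced with a small simulation, assuming each stage runs an order-up-to policy whose level is (lead time + 1) times a moving-average forecast of the orders it receives. The i.i.d. consumer demand, lead time of 2, and 5-period window are all hypothetical. For i.i.d. input, one stage's orders are $q_t = (1 + k/p)D_t - (k/p)D_{t-p}$ with $k = L + 1$ and window $p$. The first-stage variance ratio is therefore $1 + 2k/p + 2k^2/p^2$, about 2.9 here. The ratio then keeps growing at each stage because every stage sees only the orders from the stage below.

```python
import numpy as np

def order_up_to_orders(demand: np.ndarray, lead_time: int, window: int) -> np.ndarray:
    """Orders one stage places upstream under an order-up-to policy with level
    (lead_time + 1) * moving-average forecast. Safety stock is held constant, and
    negative orders (returns) are allowed to keep the arithmetic linear."""
    k = lead_time + 1
    orders = []
    prev_forecast = demand[:window].mean()
    for t in range(window, len(demand)):
        forecast = demand[t - window + 1 : t + 1].mean()
        # order = demand just observed + change in the order-up-to level
        orders.append(demand[t] + k * (forecast - prev_forecast))
        prev_forecast = forecast
    return np.array(orders)

rng = np.random.default_rng(0)
consumer_demand = rng.normal(100, 10, size=5_000)   # hypothetical i.i.d. end-customer demand

signal, ratios = consumer_demand, []
for stage in ("store", "regional DC", "national DC"):
    # Each stage forecasts from the orders it receives, not from consumer demand
    signal = order_up_to_orders(signal, lead_time=2, window=5)
    ratios.append(signal.var() / consumer_demand.var())
    print(f"{stage:12s} order variance = {ratios[-1]:5.2f}x consumer demand variance")
```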

Centralized multi-echelon optimization solves for the entire system jointly. The key insight (Clark and Scarf, 1960) is to decompose the multi-stage problem into a series of single-stage problems by defining "echelon stock": all inventory at or downstream of a given stage, including stock in transit between those stages. Each stage then optimizes against its echelon inventory position rather than its local position.
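Here is a minimal sketch of computing echelon inventory positions for a serial chain. It assumes that only end customers can be backordered and ignores internal orders not yet shipped. The stage names, quantities, and the `Stage`/`echelon_positions` names are hypothetical. Each stage's position accumulates everything at that stage and below it.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    on_hand: int
    inbound_in_transit: int   # units already shipped to this stage, not yet received

def echelon_positions(stages: list, customer_backorders: int = 0) -> dict:
    """stages: serial chain ordered upstream -> downstream.
    Echelon inventory position of a stage = on-hand plus inbound pipeline stock at that
    stage and every stage downstream of it, minus end-customer backorders."""
    positions, running = {}, -customer_backorders
    for stage in reversed(stages):            # accumulate from the store upward
        running += stage.on_hand + stage.inbound_in_transit
        positions[stage.name] = running
    return positions

chain = [
    Stage("national DC", on_hand=5_000, inbound_in_transit=2_000),
    Stage("regional DC", on_hand=1_200, inbound_in_transit=800),
    Stage("store", on_hand=150, inbound_in_transit=100),
]
print(echelon_positions(chain, customer_backorders=10))
# {'store': 240, 'regional DC': 2240, 'national DC': 9240}
```

A stage's base-stock or reorder decision would then be compared against these echelon figures rather than against its own shelves.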

In practice, some ML-based approaches use RL agents at each echelon with shared state that includes upstream inventory positions. This approximates the centralized solution without requiring explicit multi-stage stochastic programming.

Reinforcement Learning for Inventory

Inventory replenishment is a natural RL problem:

State: current inventory level, open orders, demand forecast, days until next ordering opportunity
Action: order quantity (0 to max order size)
Reward: negative of total cost (holding cost + stockout cost + ordering cost)
Transition: inventory evolves by: next_inventory = current_inventory + received_orders - demand

Why RL outperforms formulas:

RL can learn non-linear relationships between state and optimal action that formulas assume away
RL naturally handles multi-period look-ahead (today's order affects costs over the next L + R days, where R is the review period)
RL can incorporate contextual signals (upcoming holiday, promotion, competitor pricing) in a way that updating formula parameters cannot

Practical challenge: RL requires many episodes of experience, and you cannot run 100,000 stockout events in your real stores to train an agent. The solution is to generate synthetic demand trajectories from a demand forecasting model fitted to historical data and train the agent on those simulated episodes. Then deploy and fine-tune on actual outcomes.
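A deliberately tiny sketch of that training loop follows. It uses tabular Q-learning, a swapped-in method; the DQN/SAC methods discussed later in this lesson replace the table with neural networks. The assumptions are all illustrative: zero lead time, lost sales, a 20-unit capacity, $1 holding and $9 lost-sale costs, and Poisson(8) demand standing in for the forecasting model that generates synthetic episodes. The agent minimizes cost, which is equivalent to maximizing the negative-cost reward.

```python
import numpy as np

MAX_INV, MEAN_DEMAND = 20, 8      # hypothetical capacity and forecast mean
HOLD, PENALTY = 1.0, 9.0          # per-unit holding cost and lost-sale penalty (assumed)
GAMMA, EPSILON = 0.9, 0.1
rng = np.random.default_rng(7)

def step(inventory: int, order: int, demand: int) -> tuple:
    """Receive the order (zero lead time), serve demand, return (leftover, cost)."""
    stock = min(inventory + order, MAX_INV)
    sold = min(stock, demand)
    leftover = stock - sold
    return leftover, HOLD * leftover + PENALTY * (demand - sold)

# q_table[s, a]: estimated discounted cost of ordering a units while holding s units
q_table = np.zeros((MAX_INV + 1, MAX_INV + 1))
visits = np.zeros_like(q_table)
state = 0
for _ in range(100_000):
    # Epsilon-greedy on cost: explore at random, otherwise take the cheapest action
    if rng.random() < EPSILON:
        action = int(rng.integers(MAX_INV + 1))
    else:
        action = int(np.argmin(q_table[state]))
    demand = int(rng.poisson(MEAN_DEMAND))          # synthetic episode data
    next_state, cost = step(state, action, demand)
    visits[state, action] += 1
    alpha = 1.0 / visits[state, action] ** 0.6      # decaying learning rate
    target = cost + GAMMA * q_table[next_state].min()
    q_table[state, action] += alpha * (target - q_table[state, action])
    state = next_state

greedy_policy = q_table.argmin(axis=1)   # learned order quantity per inventory level

def average_cost(policy, days: int = 20_000, seed: int = 123) -> float:
    """Evaluate a policy (array: inventory level -> order) on fresh synthetic demand."""
    eval_rng = np.random.default_rng(seed)
    inventory, total = 0, 0.0
    for d in eval_rng.poisson(MEAN_DEMAND, size=days):
        inventory, cost = step(inventory, int(policy[inventory]), int(d))
        total += cost
    return total / days

never_order = np.zeros(MAX_INV + 1, dtype=int)
base_stock_12 = np.maximum(12 - np.arange(MAX_INV + 1), 0)   # order up to 12 units
print("learned policy cost/day:", round(average_cost(greedy_policy), 2))
print("base-stock 12 cost/day: ", round(average_cost(base_stock_12), 2))
print("never-order cost/day:   ", round(average_cost(never_order), 2))
```

The base-stock level of 12 is included for reference. It is the 90th percentile of Poisson(8) demand, matching the newsvendor critical ratio 9/(9+1). In this simplified setting a learned policy should land close to it.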

Practical Implementation

import numpy as np
import pandas as pd
from scipy import stats
from scipy.optimize import minimize_scalar
import matplotlib
matplotlib.use('Agg')
import warnings
warnings.filterwarnings('ignore')

# ============================================================
# 1. Newsvendor Problem Solver
# ============================================================

class NewsvendorSolver:
    """
    Solve the newsvendor problem for a given demand distribution
    and cost structure.
    """

    def __init__(self, underage_cost: float, overage_cost: float):
        """
        underage_cost: profit margin lost per unit of stockout (opportunity cost)
        overage_cost: cost per excess unit (markdown loss, disposal, carrying)
        """
        self.c_u = underage_cost
        self.c_o = overage_cost
        self.critical_ratio = underage_cost / (underage_cost + overage_cost)

    def optimal_quantity(
        self,
        demand_mean: float,
        demand_std: float,
        distribution: str = 'normal'
    ) -> dict:
        """
        Compute optimal stocking quantity.
        Returns quantity, expected cost, and service level.
        """
        if distribution == 'normal':
            dist = stats.norm(loc=demand_mean, scale=demand_std)
        elif distribution == 'poisson':
            dist = stats.poisson(mu=demand_mean)
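        elif distribution == 'negative_binomial':
            # NegBin parameterized by mean and variance
            # Useful for overdispersed demand (variance > mean)
            variance = demand_std ** 2
            if variance <= demand_mean:
                raise ValueError("negative_binomial requires variance > mean")
            r = demand_mean ** 2 / (variance - demand_mean)
            p = r / (r + demand_mean)
            dist = stats.nbinom(n=r, p=p)
        else:
            raise ValueError(f"Unknown distribution: {distribution!r}")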

        # Optimal quantity: Q* = F^(-1)(critical_ratio)
        q_star = dist.ppf(self.critical_ratio)

        # Expected costs at optimal quantity
        expected_underage = self.c_u * dist.expect(
            lambda x: max(x - q_star, 0)
        )
        expected_overage = self.c_o * dist.expect(
            lambda x: max(q_star - x, 0)
        )
        total_expected_cost = expected_underage + expected_overage

        # Achieved cycle service level P(D <= Q*) (in-stock probability, not fill rate)
        service_level = dist.cdf(q_star)

        return {
            'optimal_quantity': q_star,
            'critical_ratio': self.critical_ratio,
            'service_level': service_level,
            'expected_underage_cost': expected_underage,
            'expected_overage_cost': expected_overage,
            'total_expected_cost': total_expected_cost,
        }

    def sensitivity_analysis(
        self,
        demand_mean: float,
        demand_std: float,
        quantity_range: np.ndarray
    ) -> pd.DataFrame:
        """
        Compute expected cost across a range of order quantities.
        Useful for visualizing the cost curve.
        """
        dist = stats.norm(loc=demand_mean, scale=demand_std)
        results = []
        for q in quantity_range:
            exp_underage = self.c_u * dist.expect(lambda x: max(x - q, 0))
            exp_overage = self.c_o * dist.expect(lambda x: max(q - x, 0))
            results.append({
                'quantity': q,
                'expected_underage_cost': exp_underage,
                'expected_overage_cost': exp_overage,
                'total_expected_cost': exp_underage + exp_overage,
                'service_level': dist.cdf(q)
            })
        return pd.DataFrame(results)


# ============================================================
# 2. Safety Stock Calculator
# ============================================================

class SafetyStockCalculator:
    """
    Calculate safety stock and reorder points with both
    demand uncertainty and lead time uncertainty.
    """

    def __init__(self, target_service_level: float = 0.95):
        """
        target_service_level: probability of not stocking out
                              during the replenishment cycle (0.0 to 1.0)
        """
        self.service_level = target_service_level
        self.z = stats.norm.ppf(target_service_level)

    def compute_safety_stock(
        self,
        demand_mean_daily: float,
        demand_std_daily: float,
        lead_time_mean_days: float,
        lead_time_std_days: float = 0.0,
        method: str = 'combined'
    ) -> dict:
        """
        Compute safety stock and reorder point.

        Methods:
        - 'demand_only': ignore lead time variability
        - 'combined': account for both demand and lead time variability
        """
        if method == 'demand_only':
            # Classic formula: SS = z * sigma_D * sqrt(L)
            ss = self.z * demand_std_daily * np.sqrt(lead_time_mean_days)
        elif method == 'combined':
            # Full formula accounting for both sources of uncertainty
            ss = self.z * np.sqrt(
                lead_time_mean_days * demand_std_daily ** 2 +
                demand_mean_daily ** 2 * lead_time_std_days ** 2
            )
        else:
            raise ValueError(f"Unknown method: {method!r}")

        # Reorder point: cover expected demand during lead time + safety stock
        expected_demand_during_lt = demand_mean_daily * lead_time_mean_days
        rop = expected_demand_during_lt + ss

        # Cycle service level - probability of no stockout per replenishment cycle
        return {
            'safety_stock': ss,
            'reorder_point': rop,
            'expected_demand_during_leadtime': expected_demand_during_lt,
            'z_score': self.z,
            'service_level': self.service_level,
        }

    def service_level_curve(
        self,
        demand_mean_daily: float,
        demand_std_daily: float,
        lead_time_mean_days: float,
        lead_time_std_days: float,
        service_levels: np.ndarray = None
    ) -> pd.DataFrame:
        """
        Show how safety stock requirements change with service level.
        """
        if service_levels is None:
            service_levels = np.arange(0.85, 0.9995, 0.005)

        results = []
        for sl in service_levels:
            z = stats.norm.ppf(sl)
            ss = z * np.sqrt(
                lead_time_mean_days * demand_std_daily ** 2 +
                demand_mean_daily ** 2 * lead_time_std_days ** 2
            )
            results.append({
                'service_level': sl,
                'z_score': z,
                'safety_stock': ss,
                'reorder_point': demand_mean_daily * lead_time_mean_days + ss
            })
        return pd.DataFrame(results)


# ============================================================
# 3. ML-Based Demand Distribution Estimation
# ============================================================

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import QuantileRegressor

class QuantileDemandForecaster:
    """
    Estimate demand distribution using quantile regression.
    Provides P10, P50, P90 forecasts for use in newsvendor / safety stock.
    """

    def __init__(self, quantiles: tuple = (0.1, 0.5, 0.9)):
        self.quantiles = quantiles
        self.models = {}

    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Train a separate quantile regressor for each target quantile.
        """
        for q in self.quantiles:
            model = QuantileRegressor(
                quantile=q,
                alpha=0.1,  # regularization
                solver='highs'
            )
            model.fit(X, y)
            self.models[q] = model
        return self

    def predict_quantiles(self, X: np.ndarray) -> pd.DataFrame:
        """
        Return predictions for all quantiles.
        """
        predictions = {}
        for q, model in self.models.items():
            predictions[f'p{round(q * 100)}'] = model.predict(X)  # round avoids float truncation
        return pd.DataFrame(predictions)

    def estimate_distribution_params(
        self,
        X: np.ndarray
    ) -> tuple:
        """
        Fit a normal distribution to the quantile predictions.
        Returns (mean, std) for each sample.
        """
        quantile_preds = self.predict_quantiles(X)
        p10 = quantile_preds['p10'].values
        p90 = quantile_preds['p90'].values
        p50 = quantile_preds['p50'].values

        # Estimate std from P10/P90 range (assumes normal distribution)
        # P90 - P10 = 2 * 1.28 * sigma
        sigma = (p90 - p10) / (2 * 1.28)
        sigma = np.maximum(sigma, 0.1)  # floor to avoid zero std

        return p50, sigma  # mean, std


# ============================================================
# 4. Full Inventory Policy Optimizer
# ============================================================

class InventoryPolicyOptimizer:
    """
    End-to-end inventory policy: combines demand forecasting,
    safety stock calculation, and newsvendor optimization.
    """

    def __init__(
        self,
        holding_cost_rate: float = 0.25,   # annual % of item value
        stockout_cost_multiplier: float = 2.0  # multiple of item margin
    ):
        self.holding_cost_rate = holding_cost_rate
        self.stockout_cost_multiplier = stockout_cost_multiplier

    def optimize_sku(
        self,
        sku_id: str,
        demand_history: pd.Series,
        lead_time_history: pd.Series,
        unit_cost: float,
        unit_margin: float,
        review_period_days: int = 7
    ) -> dict:
        """
        Compute optimal inventory policy for a single SKU.
        """
        # Estimate demand distribution
        demand_mean = demand_history.mean()
        demand_std = demand_history.std()

        # Estimate lead time distribution
        lt_mean = lead_time_history.mean()
        lt_std = lead_time_history.std()

        # Daily holding cost
        daily_holding_cost = unit_cost * self.holding_cost_rate / 365

        # Underage cost (per unit stockout): lost margin
        c_u = unit_margin * self.stockout_cost_multiplier

        # Overage cost (per unit excess): daily holding for review period
        c_o = daily_holding_cost * review_period_days

        # Newsvendor for order quantity
        solver = NewsvendorSolver(
            underage_cost=c_u,
            overage_cost=c_o
        )
        nv_result = solver.optimal_quantity(
            demand_mean=demand_mean * review_period_days,
            demand_std=demand_std * np.sqrt(review_period_days)
        )

        # Safety stock
        ss_calc = SafetyStockCalculator(
            target_service_level=nv_result['service_level']
        )
        ss_result = ss_calc.compute_safety_stock(
            demand_mean_daily=demand_mean,
            demand_std_daily=demand_std,
            lead_time_mean_days=lt_mean,
            lead_time_std_days=lt_std
        )

        return {
            'sku_id': sku_id,
            'demand_mean_daily': demand_mean,
            'demand_std_daily': demand_std,
            'lead_time_mean': lt_mean,
            'lead_time_std': lt_std,
            'critical_ratio': nv_result['critical_ratio'],
            'optimal_order_quantity': nv_result['optimal_quantity'],
            'safety_stock': ss_result['safety_stock'],
            'reorder_point': ss_result['reorder_point'],
            'target_service_level': nv_result['service_level'],
            'underage_cost_per_unit': c_u,
            'overage_cost_per_unit': c_o,
        }


# ============================================================
# 5. Simple RL Environment for Inventory Simulation
# ============================================================

class InventoryEnvironment:
    """
    Single-item inventory environment for RL training.
    Uses demand forecast as a generative model for episodes.
    """

    def __init__(
        self,
        demand_mean: float,
        demand_std: float,
        lead_time_mean: int,
        lead_time_std: int,
        holding_cost: float,
        stockout_cost: float,
        max_inventory: int = 500,
        episode_length: int = 365
    ):
        self.demand_mean = demand_mean
        self.demand_std = demand_std
        self.lt_mean = lead_time_mean
        self.lt_std = lead_time_std
        self.h = holding_cost      # per unit per day
        self.s = stockout_cost     # per unit stockout
        self.max_inv = max_inventory
        self.T = episode_length

        self.reset()

    def reset(self) -> np.ndarray:
        """Reset to initial state."""
        self.inventory = min(int(self.demand_mean * self.lt_mean * 1.5), self.max_inv)
        self.day = 0
        self.pending_orders = {}  # {arrival_day: quantity}
        self.total_cost = 0.0
        return self._get_state()

    def step(self, order_quantity: int) -> tuple:
        """
        Take action (place an order), simulate one day.
        Returns: (next_state, reward, done, info)
        """
        # Place order if quantity > 0
        if order_quantity > 0:
            lead_time = max(1, int(np.random.normal(self.lt_mean, self.lt_std)))
            arrival_day = self.day + lead_time
            self.pending_orders[arrival_day] = (
                self.pending_orders.get(arrival_day, 0) + order_quantity
            )

        # Receive orders arriving today
        received = self.pending_orders.pop(self.day, 0)
        self.inventory = min(self.inventory + received, self.max_inv)

        # Simulate demand
        demand = max(0, int(np.random.normal(self.demand_mean, self.demand_std)))

        # Compute costs
        stockout = max(0, demand - self.inventory)
        fulfilled = min(demand, self.inventory)

        holding_cost = self.h * max(0, self.inventory - demand)
        stockout_cost = self.s * stockout

        daily_cost = holding_cost + stockout_cost
        self.total_cost += daily_cost
        reward = -daily_cost  # RL maximizes reward = minimizes cost

        # Update inventory
        self.inventory = max(0, self.inventory - demand)
        self.day += 1

        done = self.day >= self.T
        info = {
            'demand': demand,
            'fulfilled': fulfilled,
            'stockout': stockout,
            'received': received,
            'holding_cost': holding_cost,
            'stockout_cost': stockout_cost,
        }

        return self._get_state(), reward, done, info

    def _get_state(self) -> np.ndarray:
        """State: current inventory, outstanding orders, day of year."""
        outstanding = sum(self.pending_orders.values())
        return np.array([
            self.inventory / self.max_inv,         # normalized inventory
            outstanding / self.max_inv,             # normalized outstanding orders
            self.day % 365 / 365,                   # seasonal position
        ], dtype=np.float32)

Architecture Diagrams

[Diagram: Inventory Policy Decision Flow]

[Diagram: Multi-Echelon Inventory System]

Production Engineering Notes

Service Level Differentiation

Not all SKUs deserve the same service level target. A pharmacist running out of insulin is categorically different from a grocery store running out of organic quinoa. Retailers segment SKUs by:

Revenue impact: High-velocity, high-margin items (A items) get 99% service levels. Slow-moving, low-margin items (C items) get 90%. The inventory investment per percentage point of service level increases non-linearly near the top.

Substitutability: If a customer will accept Brand B when Brand A is out, the effective service level is higher than the individual item service level suggests. Model substitution explicitly.

Criticality: Some items (batteries, cold medicine, certain food staples) are "destination purchases" - customers choose the store specifically for these items. Stockouts cause store switching. These items get the highest service levels regardless of margin.
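Revenue-based ABC segmentation is easy to automate. The sketch below ranks SKUs by revenue and assigns classes by cumulative revenue share. The 80%/95% cut-offs, the 95% target for B items, and the SKU revenues are assumptions; the 99% and 90% targets for A and C come from the text above. A real policy would also fold in the substitutability and criticality adjustments described above.

```python
from statistics import NormalDist

# Assumed targets: 99% for A and 90% for C as in the text; 95% for B is a placeholder
SERVICE_LEVEL = {"A": 0.99, "B": 0.95, "C": 0.90}

def abc_classes(revenue: dict, a_cut: float = 0.80, b_cut: float = 0.95) -> dict:
    """Rank SKUs by revenue; a SKU is A while the cumulative revenue share of the SKUs
    ranked above it is below a_cut, B while below b_cut, and C otherwise."""
    total = sum(revenue.values())
    classes, share_before = {}, 0.0
    for sku, rev in sorted(revenue.items(), key=lambda kv: kv[1], reverse=True):
        classes[sku] = "A" if share_before < a_cut else "B" if share_before < b_cut else "C"
        share_before += rev / total
    return classes

revenue = {"milk": 50_000, "bread": 25_000, "eggs": 12_000,
           "quinoa": 7_000, "saffron": 4_000, "truffle_oil": 2_000}   # hypothetical
for sku, cls in abc_classes(revenue).items():
    sl = SERVICE_LEVEL[cls]
    print(f"{sku:12s} class {cls}  target {sl:.0%}  z = {NormalDist().inv_cdf(sl):.2f}")
```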

Substitution Effects

When Item A stocks out, some customers buy Item B (a substitute). This creates a demand coupling between items that most inventory models ignore. In categories with strong substitution (private label vs national brand, different pack sizes of the same product), ignoring substitution leads to:

Over-ordering of national brand (because you see substitution demand as real demand)
Under-ordering of private label (because you see demand decrease when national brand is available)

Model substitution using a Markov chain: transition probabilities between items when the preferred item is out of stock. These probabilities can be estimated from historical data by looking at purchase patterns in periods when specific items were out of stock.
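The sketch below takes a single step of that substitution chain: shoppers whose first choice is out of stock move to a substitute with the given probabilities and do not try again if the substitute is also out. The item names and probabilities are hypothetical; in practice they would be estimated from stockout periods as described. The sketch shows how a national-brand stockout inflates observed private-label sales above private label's own demand.

```python
import numpy as np

items = ["national_brand", "private_label", "large_pack"]
first_choice = np.array([100.0, 60.0, 40.0])   # hypothetical weekly first-choice demand
# sub[i, j]: P(a shopper whose first choice i is out buys j instead).
# Rows sum to less than 1; the remainder walks away. Values are hypothetical.
sub = np.array([
    [0.0, 0.5, 0.2],
    [0.3, 0.0, 0.1],
    [0.2, 0.3, 0.0],
])

def observed_demand(first_choice: np.ndarray, sub: np.ndarray, in_stock) -> np.ndarray:
    """One substitution step: demand for out-of-stock items flows to substitutes;
    flow that lands on another out-of-stock item, and the walk-away share, is lost."""
    in_stock = np.asarray(in_stock, dtype=bool)
    redirected = (first_choice * ~in_stock) @ sub
    return np.where(in_stock, first_choice + redirected, 0.0)

sales = observed_demand(first_choice, sub, in_stock=[False, True, True])
print(dict(zip(items, sales)))   # private_label shows 110 sales against 60 of its own demand
```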

Perishables and Freshness

Perishable inventory (fresh food, dairy, flowers, pharmaceuticals with expiry dates) has an additional cost dimension: age-based disposal. The overage cost increases as inventory ages. This transforms the static newsvendor into a dynamic problem.

Dynamic Newsvendor for Perishables: At each decision point, consider current inventory age distribution, not just quantity. Items near expiry have higher effective overage cost. The optimal policy orders less than the static newsvendor would suggest, because old inventory that you carry forward is not fungible with fresh inventory.
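Tracking the age distribution rather than a single quantity is the key change. The minimal simulation below assumes first-in-first-out issuing, a fixed shelf life, and lost sales. The function name and the data are hypothetical. Inventory is kept as `[units, age]` batches so that expiry waste can be measured alongside lost sales.

```python
from collections import deque

def simulate_perishable(orders, demands, shelf_life: int):
    """Daily loop: receive order, serve demand oldest-first (FIFO), then age stock.
    A unit can be sold at ages 0..shelf_life-1 and is discarded after that.
    Returns (units wasted, units of demand lost, remaining [units, age] batches)."""
    batches = deque()           # oldest batch at the left
    waste = lost = 0
    for order, demand in zip(orders, demands):
        if order > 0:
            batches.append([order, 0])
        while demand > 0 and batches:
            take = min(demand, batches[0][0])
            batches[0][0] -= take
            demand -= take
            if batches[0][0] == 0:
                batches.popleft()
        lost += demand          # whatever could not be served is a lost sale
        for batch in batches:
            batch[1] += 1
        while batches and batches[0][1] >= shelf_life:
            waste += batches.popleft()[0]
    return waste, lost, [list(b) for b in batches]

print(simulate_perishable([10, 0, 0, 10, 0], [3, 3, 2, 4, 4], shelf_life=3))
# (2, 0, [[2, 2]]): two units expired on day 2 even though no sale was ever lost
```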

Common Mistakes

:::danger Ignoring Lead Time Variability The most common inventory formula mistake is using only demand uncertainty in the safety stock calculation while treating lead time as deterministic. Suppose your supplier's lead time has a mean of 20 days and a standard deviation of 5 days, and you ignore this. Your safety stock is then sized for a 20-day replenishment window that many orders exceed. Under a roughly normal lead-time distribution, about half of orders arrive later than the mean and about 16% take more than 25 days, and those late orders are where systematic stockouts concentrate. Lead time variability often dominates demand variability for stable, high-volume items. Always include $D^2 \sigma_L^2$ in your safety stock formula. :::

:::danger Fixed Service Level for All SKUs Setting a uniform 95% service level across the entire catalog wastes capital. Take an item with very low demand (10 units/month) and low margin. At 95%, it needs far more safety stock relative to its sales (measured in weeks of supply) than a high-velocity item does, because low-volume demand is proportionally noisier. Segment by velocity, margin, and substitutability. Capital spent on high service levels for low-velocity items earns a much lower return than the same capital invested in high-velocity items. :::

:::warning Demand Signal from Sales, Not True Demand Your demand history is actually sales history - you can only sell units you have. When you stock out, the customer's demand is not recorded; you see 0 sales during a stockout period. If you use sales as your demand signal, you systematically underestimate demand for items that frequently stock out, leading to even lower order quantities and more stockouts. This is a reinforcing loop. Use demand sensing to estimate true demand: during known stockout periods, impute demand using trend and seasonality from non-stockout periods. :::
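Here is a simplified sketch of that imputation. It handles weekly seasonality only, with no trend term, which is a simplification of the approach described above. Each stockout day is replaced by the average sales of the same weekday on non-stockout days, and the imputed value never falls below the units actually sold before the shelf emptied. The function name and data are hypothetical.

```python
import numpy as np

def impute_censored_demand(sales, stocked_out, period: int = 7) -> np.ndarray:
    """Estimate true demand on stockout days from same-phase, non-stockout days."""
    sales = np.asarray(sales, dtype=float)
    stocked_out = np.asarray(stocked_out, dtype=bool)
    demand = sales.copy()
    phase = np.arange(len(sales)) % period
    for p in range(period):
        clean = (phase == p) & ~stocked_out
        censored = (phase == p) & stocked_out
        if clean.any() and censored.any():
            # True demand is at least what was sold before the shelf emptied
            demand[censored] = np.maximum(sales[censored], sales[clean].mean())
    return demand

sales = [10, 11, 12, 13, 14, 20, 22, 10, 11, 1, 13, 14, 20, 22]
stocked_out = [False] * 14
stocked_out[9] = True        # shelf emptied early on day 9
print(impute_censored_demand(sales, stocked_out)[9])   # 12.0
```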

:::warning Bullwhip Effect from Independent Optimization If each echelon in your supply chain independently optimizes its own inventory without sharing state, you will observe the bullwhip effect: small demand fluctuations at retail amplify into large oscillations at upstream levels. Store planners over-order when they see a demand spike, causing a larger spike signal to the regional DC, which causes an even larger spike to the national DC, which causes the supplier to see wildly volatile orders. Share downstream demand signals directly with upstream planners. Do not just share orders placed - share actual end-consumer demand. :::

Interview Questions and Answers

Q1: Explain the Newsvendor Problem and how it applies to a fashion retailer deciding how many units of a seasonal item to order.

A: The Newsvendor Problem models a single-period inventory decision under demand uncertainty. It fits a fashion retailer perfectly: you must place a pre-season order before knowing actual demand. The optimal order quantity maximizes expected profit (equivalently, minimizes expected cost) and is given by the Critical Ratio. You order up to the quantile of the demand distribution equal to $c_u / (c_u + c_o)$, where $c_u$ is the underage cost (lost margin per unit short) and $c_o$ is the overage cost (markdown plus disposal loss per excess unit). Normalize the selling price to 1 and take an item with a 50% margin whose unsold units clear at a markdown that loses 20% of the selling price per unit. Then $c_u = 0.5$ (lost margin) and $c_o = 0.2$ (markdown loss), giving CR = 0.5/(0.5+0.2) ≈ 0.71, so you order to the 71st percentile of demand. The implication: a retailer with high margins and strong markdown recovery (cheap overage) should stock more aggressively (higher CR) than one with low margins and weak markdown recovery (expensive overage, lower CR). This framework quantifies the intuition most fashion buyers apply qualitatively.

Q2: What is the bullwhip effect and how do you prevent it?

A: The bullwhip effect is the amplification of demand variability as you move upstream in the supply chain. Small end-consumer demand swings translate to large order quantity swings at the supplier level. Causes: (1) demand signal processing - each level adds safety stock based on observed order variability, not end consumer demand; (2) order batching - ordering weekly instead of daily converts continuous demand into lumpy signals; (3) shortage gaming - when suppliers allocate scarce goods by order quantity, buyers inflate orders to get their fair share. Prevention: (1) share POS data directly with all supply chain partners - everyone sees actual end consumer demand, not orders from the next level; (2) move to continuous replenishment (Vendor Managed Inventory - VMI) where the supplier sees your inventory levels directly and replenishes autonomously; (3) reduce order batch size through automation (smaller, more frequent orders are more informative than large batched orders). Walmart's supplier portal, which gives all suppliers direct access to real-time store inventory data, is the canonical implementation of bullwhip prevention at scale.

Q3: How would you set safety stock levels for 100,000 SKUs across 500 stores - you cannot manually tune each one?

A: The approach is to build a parametric safety stock policy that takes item and location attributes as inputs and outputs the safety stock. Start by computing three quantities for each SKU-store combination from historical data: demand mean and standard deviation, lead time mean and standard deviation, and the cost ratio (holding cost vs. stockout penalty, which varies by category). Apply the combined safety stock formula $SS = z_\alpha \sqrt{L \sigma_D^2 + D^2 \sigma_L^2}$, where $D$ and $\sigma_D$ are the mean and standard deviation of demand per period, $L$ and $\sigma_L$ are the mean and standard deviation of lead time measured in the same periods, and $z_\alpha$ is the standard normal quantile for service level $\alpha$. The service level $\alpha$ is not uniform: it is a function of item attributes. Build a service level model: a regression that predicts the optimal service level for an item given its velocity (units per week), gross margin percentage, number of substitutes, and category type. Train this model by optimizing the service level for a sample of items with enough history to measure the true cost of different service levels. The regression generalizes this to all items. The result is 100,000 data-driven safety stock values, each calibrated to the specific cost structure and uncertainty characteristics of that item. Re-run quarterly as demand patterns and supplier lead times evolve.
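A minimal sketch of the combined formula applied per SKU-store row, plus the matching reorder point (expected lead-time demand $D \cdot L$ plus safety stock). It assumes the normal approximation and takes the service level as an input; in the approach above it would come from the service level regression, which is not shown. The example numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time):
    """SS = z_alpha * sqrt(L * sigma_D^2 + D^2 * sigma_L^2).

    Demand is per period; lead time is measured in the same periods.
    """
    z = NormalDist().inv_cdf(service_level)
    return z * sqrt(mean_lead_time * sd_demand ** 2
                    + mean_demand ** 2 * sd_lead_time ** 2)


def reorder_point(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time):
    """ROP = expected demand over the lead time + safety stock."""
    return mean_demand * mean_lead_time + safety_stock(
        service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time)
```

For a hypothetical item selling 20 units/day (σ = 5) with a 4-day lead time (σ = 1 day) at a 95% service level, the variance term is $4 \cdot 25 + 400 \cdot 1 = 500$, so SS ≈ 1.645 × 22.36 ≈ 36.8 units and ROP ≈ 80 + 36.8 ≈ 116.8 units. Note that lead-time variability contributes 400 of the 500: for fast movers, tightening supplier reliability often buys more than better demand forecasts.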

Q4: Describe a reinforcement learning approach to inventory replenishment. What are the state, action, and reward?

A: State: the information available to the agent at each decision point. At minimum: current inventory level, quantity of orders in transit (and expected arrival days), current demand forecast (mean and uncertainty) for the next lead time + review period, and seasonal/calendar features. More advanced: stock levels of substitutable items (including competitors'), promotional calendar, supplier reliability score. Action: order quantity to place today (continuous or discretized integer). The action space can be simplified by parameterizing orders as multiples of a base order quantity. Reward: negative of total daily cost, combining holding cost (units held times per-unit-per-day cost) plus stockout cost (units short times penalty per unit). The agent learns to balance these. Key implementation detail: because a stockout today was caused by an order decision made lead-time-days ago, credit assignment is hard. Set the discount factor high enough that the effective horizon spans well beyond the lead time, so the return credits today's order with the holding and stockout costs it causes when it arrives, not just with same-day cost. Training: replay or resample historical demand through a simulator to generate many episode trajectories. Off-policy methods (DQN for discretized order quantities, SAC for continuous ones) train from replay buffers, which is sample-efficient. Deploy: start in shadow mode, computing recommendations alongside existing rules, and measure cost in simulation before live deployment.
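A minimal, hypothetical environment matching the state, action, and reward above: single SKU, lost sales, fixed lead time, and normally distributed demand (all assumptions). It deliberately avoids any RL library so the mechanics are visible; an agent from a library such as DQN or SAC would replace the `base_stock_policy` baseline, which is included only as the kind of existing rule a shadow-mode comparison would start from.

```python
import random
from collections import deque


class InventoryEnv:
    """Hypothetical single-SKU, lost-sales replenishment environment.

    State: (on-hand units, tuple of orders in transit, oldest first).
    Action: non-negative integer order quantity.
    Reward: -(holding cost on leftover units + penalty on unmet demand).
    """

    def __init__(self, lead_time=3, holding_cost=0.1, stockout_cost=2.0,
                 mean_demand=10.0, sd_demand=3.0, seed=0):
        self.lead_time = lead_time
        self.holding_cost = holding_cost
        self.stockout_cost = stockout_cost
        self.mean_demand = mean_demand
        self.sd_demand = sd_demand
        self.rng = random.Random(seed)
        self.reset()

    def reset(self, on_hand=30):
        self.on_hand = on_hand
        self.pipeline = deque([0] * self.lead_time)  # one slot per period in transit
        return self.state()

    def state(self):
        return (self.on_hand, tuple(self.pipeline))

    def step(self, order_qty, demand=None):
        # Today's order joins the back of the pipeline; the order placed
        # lead_time periods ago arrives now, before demand is served.
        self.pipeline.append(max(0, int(order_qty)))
        self.on_hand += self.pipeline.popleft()
        if demand is None:
            demand = max(0, round(self.rng.gauss(self.mean_demand, self.sd_demand)))
        sold = min(self.on_hand, demand)
        short = demand - sold  # lost sales, not backordered
        self.on_hand -= sold
        cost = self.holding_cost * self.on_hand + self.stockout_cost * short
        return self.state(), -cost


def base_stock_policy(state, target_level):
    """Baseline rule: order up to a fixed inventory position."""
    on_hand, in_transit = state
    position = on_hand + sum(in_transit)
    return max(0, target_level - position)


def average_reward(policy, env, periods=5_000, **policy_kwargs):
    """Mean per-period reward of a policy over one long episode."""
    state = env.reset()
    total = 0.0
    for _ in range(periods):
        state, reward = env.step(policy(state, **policy_kwargs))
        total += reward
    return total / periods
```

The pipeline makes the credit assignment problem concrete: an order placed at step $t$ only affects on-hand stock, and therefore reward, from step $t + L$ onward. Sweeping `target_level` in `average_reward` gives the base-stock benchmark an RL agent has to beat before it earns live traffic.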

Q5: A product goes viral on TikTok and demand spikes 10x overnight. How does your inventory system respond?

A: This requires a layered response because inventory systems operate on different time horizons. Layer 1 (immediate, hours): The demand signal hits the forecasting system within hours via social media monitoring or a sudden spike in actual sales/search. The forecasting model detects an anomaly (sales 5-sigma above rolling average) and triggers a "viral event" flag. This flag overrides the normal forecast and sets a demand multiplier. Layer 2 (same day): Replenishment logic detects that projected inventory falls below the reorder point under the new demand scenario. Emergency replenishment orders are generated with expedited shipping. Existing inventory may be reallocated from lower-demand stores to higher-demand stores. Layer 3 (1-3 days): Communication to suppliers requesting prioritization of this SKU. The challenge is supplier lead time - you cannot manufacture new inventory overnight. What you can do is pull forward already-planned production and reallocate finished goods from channels and regions where the item is not trending to the stores where it is. Layer 4 (3-10 days): Model the demand decay curve from past viral events. Most TikTok-driven spikes decay within 2-4 weeks. Calibrate the order quantity to avoid being stuck with massive overstock after the viral demand normalizes. The critical engineering piece: a "viral event" classifier that fires early (within 2-4 hours of the spike starting), triggering the emergency response before stockout occurs rather than after.
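A sketch of the two pieces at either end of the response: the Layer 1 flag and the Layer 4 decay-aware order sizing. The 5-sigma threshold comes from the answer above; the rolling window, the exponential decay shape, the half-life parameter, and the function names are assumptions. In practice the half-life would be fitted to past viral events rather than set by hand.

```python
import math
import statistics


def viral_flag(history, latest, threshold=5.0):
    """Flag a spike when `latest` sits more than `threshold` standard
    deviations above the rolling baseline in `history`."""
    mean = statistics.fmean(history)
    sd = max(statistics.stdev(history), 1e-9)  # guard against a flat history
    return (latest - mean) / sd > threshold


def decayed_demand(baseline, peak, half_life_days, horizon_days):
    """ASSUMED exponential decay from `peak` back toward `baseline`."""
    return [baseline + (peak - baseline) * 0.5 ** (day / half_life_days)
            for day in range(horizon_days)]


def units_to_order(forecast, on_hand, in_transit):
    """Cover the forecast horizon, net of stock already owned or in transit."""
    return max(0, math.ceil(sum(forecast) - on_hand - in_transit))
```

With a baseline of 100 units/day, a peak of 1,000, and a hypothetical 7-day half-life, `decayed_demand(100, 1000, 7, 15)` falls to 550 on day 7 and 325 on day 14. Keeping `horizon_days` close to what can realistically arrive while demand is still elevated, rather than ordering against the whole tail, is one way to act on the Layer 4 warning about post-spike overstock.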
