Inventory Optimization
The Cost of Getting It Wrong in Both Directions
In the summer of 2021, Peloton had 800 million. The CFO resigned.
That same summer, Ford had to leave 1 semiconductor chip. The cost of the chip shortage to the auto industry was $210 billion in 2021 alone.
Both are inventory failures. One from too much. One from too little. The asymmetry is the core tension of inventory management: holding inventory is expensive (warehouse space, capital, obsolescence risk, spoilage), but being out of stock is also expensive (lost sales, lost customers, expedited shipping to recover). The question is never "should we hold inventory?" The question is always "how much is optimal given uncertain demand?"
For a retailer carrying 100,000 SKUs across 1,000 stores, answering this question manually is impossible. Even if you had perfect demand forecasts (you do not), translating forecasts into optimal reorder points and order quantities for 100 million store-SKU combinations requires automation. This is where ML enters inventory management - not just to forecast demand, but to prescribe optimal inventory policies under uncertainty.
Why This Exists
The traditional approach to inventory management relied on a few heuristics: reorder when stock drops below X weeks of supply, order enough to last Y weeks. The X and Y were set by experienced planners, updated annually, and applied uniformly across categories. This worked tolerably when catalogs were small and demand was stable.
Three changes broke this approach:
Catalog expansion: The shift from physical to omnichannel retail increased the average number of distinct items a retailer carries by 5-10x. A 1990s grocery store carried 8,000 SKUs. Today's equivalent carries 40,000 - with the online catalog extending to 200,000+. Human-set reorder rules cannot scale.
Demand volatility: Social media can turn an obscure item into a viral product overnight. Fast fashion cycles have compressed from seasonal to weekly. The assumption of stable, predictable demand is increasingly wrong.
Supply chain complexity: Multi-echelon supply chains with global suppliers have variable and uncertain lead times. The supplier in Vietnam has a 20-day lead time on average, but it ranges from 12 to 45 days. A fixed reorder point calibrated to average lead time leaves you exposed during long-tail lead time events.
ML-based inventory optimization exists because the problem is quantifiable (we know the exact cost of a stockout and the exact cost of excess inventory), the data is rich (detailed transaction history, lead time history, supplier reliability data), and the decision space is well-structured (order quantity, reorder point, safety stock level). These conditions are ideal for ML-driven prescriptive optimization.
Historical Context
The mathematical foundations of inventory optimization predate computers.
The Economic Order Quantity (EOQ) model was published by Ford W. Harris in 1913 - over 100 years ago. It gives the optimal order quantity balancing ordering costs (fixed cost per order) against holding costs (cost of keeping inventory). The formula:
$$Q^* = \sqrt{\frac{2DS}{H}}$$
where $D$ is annual demand, $S$ is ordering cost per order, and $H$ is holding cost per unit per year. Simple, elegant, and still used.
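As a quick numeric check of the EOQ formula, Q* = sqrt(2DS/H), the sketch below plugs in illustrative values. The demand, ordering cost, and holding cost figures are assumptions chosen for round numbers, and the function name `economic_order_quantity` is hypothetical.

```python
import math


def economic_order_quantity(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Harris's EOQ: Q* = sqrt(2 * D * S / H)."""
    if annual_demand < 0 or order_cost < 0 or holding_cost <= 0:
        raise ValueError("demand and order cost must be >= 0, holding cost must be > 0")
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


# Illustrative inputs (assumed, not taken from a real retailer)
D = 12_000   # units per year
S = 50.0     # fixed cost per order
H = 3.0      # holding cost per unit per year

q_star = economic_order_quantity(D, S, H)
annual_ordering_cost = (D / q_star) * S   # number of orders times cost per order
annual_holding_cost = (q_star / 2) * H    # average inventory times holding cost

print(f"EOQ: {q_star:.1f} units")
print(f"Ordering cost/yr: {annual_ordering_cost:.2f}, holding cost/yr: {annual_holding_cost:.2f}")
```

At the optimum the annual ordering cost and the annual holding cost come out equal, which is the balance the model is built around.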
The Newsvendor Problem (also called the Newsboy Problem) was formalized in the 1950s and 1960s. It is the canonical model for single-period inventory decisions under uncertainty: how many newspapers to order before knowing how many you will sell? Order too few and miss sales. Order too many and discard unsold copies. The optimal solution depends entirely on the ratio of overage to underage costs - a key insight that remains central to modern inventory systems.
(R,Q) and (s,S) inventory policies - reorder point R, order quantity Q; reorder to level S when stock reaches s - were formalized by Hadley and Whitin in 1963. These deterministic frameworks led to the stochastic extensions used today.
The ML revolution in inventory arrived with reinforcement learning. DeepMind's work on applying RL to data center cooling optimization (2016) demonstrated that complex control problems could be solved with RL better than hand-crafted rules. Amazon, Walmart, and JD.com have all published work on RL-based inventory replenishment policies (2018-2022). The consensus: RL outperforms rule-based policies by 5-15% in total cost, particularly for volatile, intermittent demand items.
Core Concepts
The Newsvendor Problem
The Newsvendor Problem is the cleanest possible statement of inventory optimization under uncertainty. Understanding it deeply is essential before building anything more complex.
Setup: You must decide how many units to stock before seeing actual demand. Demand $D$ is uncertain - described by a probability distribution with CDF $F$. If you stock $Q$ units:
- If $D > Q$ (stockout): you incur underage cost $c_u$ per unit of unmet demand (lost profit, lost customer)
- If $D < Q$ (overstock): you incur overage cost $c_o$ per excess unit (disposal, markdown, carrying cost)
Optimal solution: Stock the quantity $Q^*$ satisfying the Critical Ratio (CR):
$$F(Q^*) = P(D \le Q^*) = \frac{c_u}{c_u + c_o}$$
In English: set your service level equal to the ratio of underage cost to total cost. If a stockout costs you 9 dollars per unit in lost profit and an excess unit costs you 1 dollar to dispose of, your critical ratio is 9/10 = 0.9. You should stock at the 90th percentile of your demand distribution.
This is profound. The optimal service level depends entirely on the cost asymmetry - not on the shape of demand, not on your intuition. The demand distribution only determines which quantity corresponds to that percentile. A retailer whose products are perishable (overage is costly) should target a lower service level than one selling non-perishable staples (overage is cheap).
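The critical-ratio rule can be checked numerically. The sketch below assumes normally distributed demand (mean 100, standard deviation 20, both illustrative) with an underage cost of 9 and an overage cost of 1, then compares the closed-form quantile against a brute-force search over simulated demand.

```python
import numpy as np
from scipy import stats

c_u, c_o = 9.0, 1.0                      # underage and overage cost per unit
critical_ratio = c_u / (c_u + c_o)       # 0.9

# Assumed demand model: Normal(mean=100, std=20)
DEMAND_MEAN, DEMAND_STD = 100.0, 20.0
q_star = stats.norm(loc=DEMAND_MEAN, scale=DEMAND_STD).ppf(critical_ratio)

# Brute-force check: average cost of each candidate quantity over simulated demand
rng = np.random.default_rng(0)
samples = rng.normal(DEMAND_MEAN, DEMAND_STD, size=200_000)


def average_cost(q: float) -> float:
    shortage = np.maximum(samples - q, 0)
    excess = np.maximum(q - samples, 0)
    return float(np.mean(c_u * shortage + c_o * excess))


candidates = np.arange(80, 161)
costs = np.array([average_cost(q) for q in candidates])
best_q = int(candidates[costs.argmin()])

print(f"critical ratio = {critical_ratio:.2f}, closed-form Q* = {q_star:.1f}, simulated best Q = {best_q}")
```

The simulated minimum lands within a unit or so of the 90th percentile, which is what the critical ratio predicts.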
Safety Stock Calculation
For continuous-review inventory (as opposed to single-period), safety stock absorbs demand and lead time uncertainty. The standard formula:
$$SS = z \cdot \sigma_d \cdot \sqrt{L}$$
Where:
- $z$ is the z-score for the desired service level (e.g., 1.65 for 95% in-stock rate)
- $\sigma_d$ is the standard deviation of daily demand
- $L$ is lead time in days
This assumes demand uncertainty is the only source of variability. In reality, lead time also varies. The combined formula:
$$SS = z \cdot \sqrt{L \cdot \sigma_d^2 + \bar{d}^2 \cdot \sigma_L^2}$$
Where $\bar{d}$ is mean daily demand, $L$ is the mean lead time in days, and $\sigma_L$ is the standard deviation of lead time.
The service level is not free: safety stock scales with the z-score, which grows steeply as the service level approaches 100%. Moving from 95% to 99% raises the z-score from 1.65 to 2.33, about 41% more safety stock for the same demand distribution, and 99.9% needs nearly 1.9x the 95% level. The right service level is a business decision based on the cost of stockouts vs. carrying costs, not a default.
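To see how steeply safety stock grows with the service level, the sketch below evaluates the demand-only formula (z-score times daily demand standard deviation times the square root of lead time) at a few targets. The demand standard deviation and lead time are assumed values; only the ratios between rows matter.

```python
from scipy import stats

SIGMA_D = 20.0     # std dev of daily demand (assumed)
LEAD_TIME = 9.0    # lead time in days (assumed)


def safety_stock(service_level: float) -> float:
    """Demand-only safety stock: z * sigma_d * sqrt(L)."""
    z = stats.norm.ppf(service_level)
    return float(z * SIGMA_D * LEAD_TIME ** 0.5)


levels = [0.90, 0.95, 0.99, 0.999]
table = {sl: safety_stock(sl) for sl in levels}

for sl, ss in table.items():
    print(f"service level {sl:.1%}: safety stock {ss:6.1f} ({ss / table[0.95]:.2f}x the 95% level)")

ratio_99_vs_95 = table[0.99] / table[0.95]
ratio_999_vs_95 = table[0.999] / table[0.95]
```

Because the ratios depend only on the z-scores, they hold for any demand standard deviation and lead time under the normal assumption.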
Reorder Point (ROP)
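The reorder point is the inventory level at which you trigger a replenishment order:
$$ROP = \bar{d} \cdot L + SS$$
Where $\bar{d} \cdot L$ (mean daily demand times lead time in days) covers expected demand during lead time, and $SS$ (safety stock) covers uncertainty.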
The ROP must be recomputed whenever:
- Demand pattern changes (seasonality, promotion, trend)
- Lead time changes (new supplier, disruption)
- Desired service level changes (business policy update)
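A small worked example of that recomputation, under assumed inputs: the reorder point is taken as expected lead-time demand plus a safety stock that accounts for both demand and lead time variability. The helper name `reorder_point` and all numeric inputs are hypothetical.

```python
import math
from scipy import stats


def reorder_point(
    mean_daily_demand: float,
    std_daily_demand: float,
    lead_time_mean: float,
    lead_time_std: float,
    service_level: float,
) -> float:
    """ROP = expected demand during lead time + safety stock (combined formula)."""
    z = stats.norm.ppf(service_level)
    safety_stock = z * math.sqrt(
        lead_time_mean * std_daily_demand ** 2
        + mean_daily_demand ** 2 * lead_time_std ** 2
    )
    return float(mean_daily_demand * lead_time_mean + safety_stock)


# Baseline: 50 units/day (std 15), 20-day lead time (std 5), 95% target
baseline_rop = reorder_point(50, 15, 20, 5, 0.95)

# Lead time change: a disruption stretches the mean lead time to 30 days
disrupted_rop = reorder_point(50, 15, 30, 5, 0.95)

# Demand pattern change: a promotion lifts demand to 80 units/day (std 25)
promo_rop = reorder_point(80, 25, 20, 5, 0.95)

print(f"baseline: {baseline_rop:.0f}, disrupted: {disrupted_rop:.0f}, promotion: {promo_rop:.0f}")
```

Each trigger in the list maps to one changed argument, which is why a stale reorder point is usually a stale input rather than a wrong formula.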
In ML-based systems, the ROP is not computed from a formula - it is output by a model trained to minimize total inventory cost given observed demand and lead time distributions.
Multi-Echelon Inventory
Real supply chains have multiple stages. Inventory sits in:
- Supplier warehouse
- National distribution center (DC)
- Regional DC
- Store backroom
- Store shelf
Optimizing each level independently (decentralized optimization) leads to the bullwhip effect: small demand variations at the retail level are amplified at each upstream stage, leading to wild oscillations in supplier order quantities. This happens because each level adds its own safety stock without accounting for the safety stock already held upstream.
Centralized multi-echelon optimization solves for the entire system jointly. The key insight (Clark and Scarf, 1960): decompose the multi-stage problem into a series of single-stage problems by defining "echelon stock" (the inventory at a given stage plus all inventory downstream of it). Each stage then optimizes against its echelon inventory position rather than its local position.
In practice, ML-based approaches use RL agents at each echelon with shared state that includes upstream inventory positions - approximating the centralized solution without requiring explicit multi-stage stochastic programming.
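The bullwhip effect described in this section can be reproduced with a very small simulation. The sketch below is a simplification, not a model of any real chain: each stage forecasts with a moving average of what it sees from downstream and follows an order-up-to policy, and orders are allowed to go negative (treated as returns) to keep the arithmetic linear. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
PERIODS, LEAD_TIME, WINDOW = 2000, 4, 5


def upstream_orders(incoming: np.ndarray, lead_time: int, window: int) -> np.ndarray:
    """Orders a stage places when it sees `incoming` demand from downstream.

    Forecast: moving average over `window` periods.
    Policy: order-up-to level = (lead_time + 1) * forecast, so each order
    replaces what was sold plus any change in the target level.
    """
    orders = np.zeros_like(incoming)
    prev_target = None
    for t in range(window, len(incoming)):
        forecast = incoming[t - window + 1 : t + 1].mean()
        target = (lead_time + 1) * forecast
        if prev_target is None:
            prev_target = target
        orders[t] = incoming[t] + (target - prev_target)
        prev_target = target
    return orders[window:]  # drop the warm-up periods


consumer_demand = rng.normal(200, 10, size=PERIODS)
store_orders = upstream_orders(consumer_demand, LEAD_TIME, WINDOW)
dc_orders = upstream_orders(store_orders, LEAD_TIME, WINDOW)

print(f"std of consumer demand: {consumer_demand.std():.1f}")
print(f"std of store orders:    {store_orders.std():.1f}")
print(f"std of DC orders:       {dc_orders.std():.1f}")
```

Average order volume stays near average demand at every stage, but the spread widens each time a stage forecasts from orders instead of from consumer demand.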
Reinforcement Learning for Inventory
Inventory replenishment is a natural RL problem:
- State: current inventory level, open orders, demand forecast, days until next ordering opportunity
- Action: order quantity (0 to max order size)
- Reward: negative of total cost (holding cost + stockout cost + ordering cost)
- Transition: inventory evolves by: next_inventory = current_inventory + received_orders - demand
Why RL outperforms formulas:
- RL can learn non-linear relationships between state and optimal action that formulas assume away
- RL naturally handles multi-period look-ahead (today's order affects costs over the following lead time and review period, not just today)
- RL can incorporate contextual signals (upcoming holiday, promotion, competitor pricing) in a way that updating formula parameters cannot
Practical challenge: RL requires many episodes of experience. You cannot run 100,000 stockout events in your real stores to train an agent. Solution: simulate historical demand using your demand forecasting model to generate synthetic episodes for RL training. Then deploy and fine-tune on actual outcomes.
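To make the state/action/reward framing concrete, here is a deliberately tiny tabular Q-learning sketch trained on simulated demand. It is a toy under loud assumptions: zero lead time, uniform integer demand between 0 and 4, a capacity of 10 units, and illustrative cost values. A production agent would use a richer state and a function approximator.

```python
import numpy as np

MAX_INV, MAX_ORDER = 10, 5                 # capacity and largest order (assumed)
HOLDING_COST, STOCKOUT_COST = 1.0, 20.0    # per unit per period (assumed)
ALPHA, GAMMA, EPSILON, STEPS = 0.1, 0.95, 0.1, 100_000

rng = np.random.default_rng(0)
# q_table[inventory, order_quantity] = estimated discounted future reward
q_table = np.zeros((MAX_INV + 1, MAX_ORDER + 1))

inventory = 0
for _ in range(STEPS):
    # Epsilon-greedy action selection
    if rng.random() < EPSILON:
        action = int(rng.integers(0, MAX_ORDER + 1))
    else:
        action = int(q_table[inventory].argmax())

    # Transition: the order arrives immediately (zero lead time), then demand occurs
    on_hand = min(inventory + action, MAX_INV)
    demand = int(rng.integers(0, 5))       # simulated demand, uniform on 0..4
    sold = min(demand, on_hand)
    next_inventory = on_hand - sold

    # Reward: negative of holding cost plus stockout cost
    cost = HOLDING_COST * next_inventory + STOCKOUT_COST * (demand - sold)
    reward = -cost

    # Q-learning update
    td_target = reward + GAMMA * q_table[next_inventory].max()
    q_table[inventory, action] += ALPHA * (td_target - q_table[inventory, action])
    inventory = next_inventory

policy = q_table.argmax(axis=1)            # greedy order quantity per inventory level
print("order quantity by inventory level:", policy.tolist())
```

With stockouts priced at 20 times the holding cost, the learned policy orders whenever the shelf is empty. Changing the two cost constants shifts the policy without touching any formula.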
Practical Implementation
import numpy as np
import pandas as pd
from scipy import stats
from scipy.optimize import minimize_scalar
import matplotlib
matplotlib.use('Agg')
import warnings
warnings.filterwarnings('ignore')
# ============================================================
# 1. Newsvendor Problem Solver
# ============================================================
class NewsvendorSolver:
    """
    Solve the newsvendor problem for a given demand distribution
    and cost structure.
    """

    def __init__(self, underage_cost: float, overage_cost: float):
        """
        underage_cost: profit margin lost per unit of stockout (opportunity cost)
        overage_cost: cost per excess unit (markdown loss, disposal, carrying)
        """
        self.c_u = underage_cost
        self.c_o = overage_cost
        self.critical_ratio = underage_cost / (underage_cost + overage_cost)

    def optimal_quantity(
        self,
        demand_mean: float,
        demand_std: float,
        distribution: str = 'normal'
    ) -> dict:
        """
        Compute optimal stocking quantity.
        Returns quantity, expected cost, and service level.
        """
        if distribution == 'normal':
            dist = stats.norm(loc=demand_mean, scale=demand_std)
        elif distribution == 'poisson':
            dist = stats.poisson(mu=demand_mean)
        elif distribution == 'negative_binomial':
            # NegBin parameterized by mean and variance
            # Useful for overdispersed demand (variance > mean)
            variance = demand_std ** 2
            if variance <= demand_mean:
                raise ValueError(
                    "negative_binomial requires variance > mean (overdispersion)"
                )
            r = demand_mean ** 2 / (variance - demand_mean)
            p = r / (r + demand_mean)
            dist = stats.nbinom(n=r, p=p)
        else:
            raise ValueError(f"Unsupported distribution: {distribution!r}")
        # Optimal quantity: Q* = F^(-1)(critical_ratio)
        q_star = dist.ppf(self.critical_ratio)
        # Expected costs at optimal quantity.
        # np.maximum (not built-in max) because scipy evaluates the function
        # on arrays for discrete distributions.
        expected_underage = self.c_u * dist.expect(
            lambda x: np.maximum(x - q_star, 0)
        )
        expected_overage = self.c_o * dist.expect(
            lambda x: np.maximum(q_star - x, 0)
        )
        total_expected_cost = expected_underage + expected_overage
        # In-stock probability P(D <= Q*): a cycle service level, not a fill rate
        service_level = dist.cdf(q_star)
        return {
            'optimal_quantity': q_star,
            'critical_ratio': self.critical_ratio,
            'service_level': service_level,
            'expected_underage_cost': expected_underage,
            'expected_overage_cost': expected_overage,
            'total_expected_cost': total_expected_cost,
        }

    def sensitivity_analysis(
        self,
        demand_mean: float,
        demand_std: float,
        quantity_range: np.ndarray
    ) -> pd.DataFrame:
        """
        Compute expected cost across a range of order quantities.
        Useful for visualizing the cost curve.
        """
        dist = stats.norm(loc=demand_mean, scale=demand_std)
        results = []
        for q in quantity_range:
            exp_underage = self.c_u * dist.expect(lambda x: np.maximum(x - q, 0))
            exp_overage = self.c_o * dist.expect(lambda x: np.maximum(q - x, 0))
            results.append({
                'quantity': q,
                'expected_underage_cost': exp_underage,
                'expected_overage_cost': exp_overage,
                'total_expected_cost': exp_underage + exp_overage,
                'service_level': dist.cdf(q)
            })
        return pd.DataFrame(results)
# ============================================================
# 2. Safety Stock Calculator
# ============================================================
class SafetyStockCalculator:
    """
    Calculate safety stock and reorder points with both
    demand uncertainty and lead time uncertainty.
    """

    def __init__(self, target_service_level: float = 0.95):
        """
        target_service_level: probability of not stocking out
        during the replenishment cycle (0.0 to 1.0)
        """
        self.service_level = target_service_level
        self.z = stats.norm.ppf(target_service_level)

    def compute_safety_stock(
        self,
        demand_mean_daily: float,
        demand_std_daily: float,
        lead_time_mean_days: float,
        lead_time_std_days: float = 0.0,
        method: str = 'combined'
    ) -> dict:
        """
        Compute safety stock and reorder point.
        Methods:
        - 'demand_only': ignore lead time variability
        - 'combined': account for both demand and lead time variability
        """
        if method == 'demand_only':
            # Classic formula: SS = z * sigma_D * sqrt(L)
            ss = self.z * demand_std_daily * np.sqrt(lead_time_mean_days)
        elif method == 'combined':
            # Full formula accounting for both sources of uncertainty
            ss = self.z * np.sqrt(
                lead_time_mean_days * demand_std_daily ** 2 +
                demand_mean_daily ** 2 * lead_time_std_days ** 2
            )
        else:
            raise ValueError(f"Unknown method: {method!r}")
        # Reorder point: cover expected demand during lead time + safety stock
        expected_demand_during_lt = demand_mean_daily * lead_time_mean_days
        rop = expected_demand_during_lt + ss
        # Cycle service level - probability of no stockout per replenishment cycle
        return {
            'safety_stock': ss,
            'reorder_point': rop,
            'expected_demand_during_leadtime': expected_demand_during_lt,
            'z_score': self.z,
            'service_level': self.service_level,
        }

    def service_level_curve(
        self,
        demand_mean_daily: float,
        demand_std_daily: float,
        lead_time_mean_days: float,
        lead_time_std_days: float,
        service_levels: np.ndarray = None
    ) -> pd.DataFrame:
        """
        Show how safety stock requirements change with service level.
        """
        if service_levels is None:
            service_levels = np.arange(0.85, 0.9995, 0.005)
        results = []
        for sl in service_levels:
            z = stats.norm.ppf(sl)
            ss = z * np.sqrt(
                lead_time_mean_days * demand_std_daily ** 2 +
                demand_mean_daily ** 2 * lead_time_std_days ** 2
            )
            results.append({
                'service_level': sl,
                'z_score': z,
                'safety_stock': ss,
                'reorder_point': demand_mean_daily * lead_time_mean_days + ss
            })
        return pd.DataFrame(results)
# ============================================================
# 3. ML-Based Demand Distribution Estimation
# ============================================================
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import QuantileRegressor
class QuantileDemandForecaster:
    """
    Estimate demand distribution using quantile regression.
    Provides P10, P50, P90 forecasts for use in newsvendor / safety stock.
    """

    def __init__(self, quantiles: tuple = (0.1, 0.5, 0.9)):
        # Tuple default avoids sharing one mutable list across instances
        self.quantiles = list(quantiles)
        self.models = {}

    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Train a separate quantile regressor for each target quantile.
        """
        for q in self.quantiles:
            model = QuantileRegressor(
                quantile=q,
                alpha=0.1,  # regularization
                solver='highs'
            )
            model.fit(X, y)
            self.models[q] = model
        return self

    def predict_quantiles(self, X: np.ndarray) -> pd.DataFrame:
        """
        Return predictions for all quantiles.
        """
        predictions = {}
        for q, model in self.models.items():
            # round() rather than int(): int(0.29 * 100) would give 28
            predictions[f'p{round(q * 100)}'] = model.predict(X)
        return pd.DataFrame(predictions)

    def estimate_distribution_params(
        self,
        X: np.ndarray
    ) -> tuple:
        """
        Fit a normal distribution to the quantile predictions.
        Returns (mean, std) for each sample.
        Requires the 0.1, 0.5, and 0.9 quantiles to have been fitted.
        """
        quantile_preds = self.predict_quantiles(X)
        missing = {'p10', 'p50', 'p90'} - set(quantile_preds.columns)
        if missing:
            raise ValueError(f"Missing quantile predictions: {sorted(missing)}")
        p10 = quantile_preds['p10'].values
        p90 = quantile_preds['p90'].values
        p50 = quantile_preds['p50'].values
        # Estimate std from P10/P90 range (assumes normal distribution)
        # P90 - P10 = 2 * 1.28 * sigma
        sigma = (p90 - p10) / (2 * 1.28)
        sigma = np.maximum(sigma, 0.1)  # floor to avoid zero std
        # Under the normal assumption the median (P50) equals the mean
        return p50, sigma  # mean, std
# ============================================================
# 4. Full Inventory Policy Optimizer
# ============================================================
class InventoryPolicyOptimizer:
    """
    End-to-end inventory policy: combines demand forecasting,
    safety stock calculation, and newsvendor optimization.
    """

    def __init__(
        self,
        holding_cost_rate: float = 0.25,  # annual % of item value
        stockout_cost_multiplier: float = 2.0  # multiple of item margin
    ):
        self.holding_cost_rate = holding_cost_rate
        self.stockout_cost_multiplier = stockout_cost_multiplier

    def optimize_sku(
        self,
        sku_id: str,
        demand_history: pd.Series,
        lead_time_history: pd.Series,
        unit_cost: float,
        unit_margin: float,
        review_period_days: int = 7
    ) -> dict:
        """
        Compute optimal inventory policy for a single SKU.
        """
        # Estimate demand distribution
        demand_mean = demand_history.mean()
        demand_std = demand_history.std()
        # Estimate lead time distribution
        lt_mean = lead_time_history.mean()
        lt_std = lead_time_history.std()
        # Daily holding cost
        daily_holding_cost = unit_cost * self.holding_cost_rate / 365
        # Underage cost (per unit stockout): lost margin
        c_u = unit_margin * self.stockout_cost_multiplier
        # Overage cost (per unit excess): daily holding for review period
        c_o = daily_holding_cost * review_period_days
        # Newsvendor for order quantity
        solver = NewsvendorSolver(
            underage_cost=c_u,
            overage_cost=c_o
        )
        nv_result = solver.optimal_quantity(
            demand_mean=demand_mean * review_period_days,
            demand_std=demand_std * np.sqrt(review_period_days)
        )
        # Safety stock
        ss_calc = SafetyStockCalculator(
            target_service_level=nv_result['service_level']
        )
        ss_result = ss_calc.compute_safety_stock(
            demand_mean_daily=demand_mean,
            demand_std_daily=demand_std,
            lead_time_mean_days=lt_mean,
            lead_time_std_days=lt_std
        )
        return {
            'sku_id': sku_id,
            'demand_mean_daily': demand_mean,
            'demand_std_daily': demand_std,
            'lead_time_mean': lt_mean,
            'lead_time_std': lt_std,
            'critical_ratio': nv_result['critical_ratio'],
            'optimal_order_quantity': nv_result['optimal_quantity'],
            'safety_stock': ss_result['safety_stock'],
            'reorder_point': ss_result['reorder_point'],
            'target_service_level': nv_result['service_level'],
            'underage_cost_per_unit': c_u,
            'overage_cost_per_unit': c_o,
        }
# ============================================================
# 5. Simple RL Environment for Inventory Simulation
# ============================================================
class InventoryEnvironment:
    """
    Single-item inventory environment for RL training.
    Uses demand forecast as a generative model for episodes.
    """

    def __init__(
        self,
        demand_mean: float,
        demand_std: float,
        lead_time_mean: int,
        lead_time_std: int,
        holding_cost: float,
        stockout_cost: float,
        max_inventory: int = 500,
        episode_length: int = 365
    ):
        self.demand_mean = demand_mean
        self.demand_std = demand_std
        self.lt_mean = lead_time_mean
        self.lt_std = lead_time_std
        self.h = holding_cost  # per unit per day
        self.s = stockout_cost  # per unit stockout
        self.max_inv = max_inventory
        self.T = episode_length
        self.reset()

    def reset(self) -> np.ndarray:
        """Reset to initial state."""
        self.inventory = int(self.demand_mean * self.lt_mean * 1.5)
        self.day = 0
        self.pending_orders = {}  # {arrival_day: quantity}
        self.total_cost = 0.0
        return self._get_state()

    def step(self, order_quantity: int) -> tuple:
        """
        Take action (place an order), simulate one day.
        Returns: (next_state, reward, done, info)
        """
        # Place order if quantity > 0
        if order_quantity > 0:
            # round() rather than int(): truncation would bias lead times short
            lead_time = max(1, int(round(np.random.normal(self.lt_mean, self.lt_std))))
            arrival_day = self.day + lead_time
            self.pending_orders[arrival_day] = (
                self.pending_orders.get(arrival_day, 0) + order_quantity
            )
        # Receive orders arriving today (units above max_inv are discarded)
        received = self.pending_orders.pop(self.day, 0)
        self.inventory = min(self.inventory + received, self.max_inv)
        # Simulate demand (rounded to whole units, floored at zero)
        demand = max(0, int(round(np.random.normal(self.demand_mean, self.demand_std))))
        # Compute costs
        stockout = max(0, demand - self.inventory)
        fulfilled = min(demand, self.inventory)
        holding_cost = self.h * max(0, self.inventory - demand)
        stockout_cost = self.s * stockout
        daily_cost = holding_cost + stockout_cost
        self.total_cost += daily_cost
        reward = -daily_cost  # RL maximizes reward = minimizes cost
        # Update inventory
        self.inventory = max(0, self.inventory - demand)
        self.day += 1
        done = self.day >= self.T
        info = {
            'demand': demand,
            'fulfilled': fulfilled,
            'stockout': stockout,
            'received': received,
            'holding_cost': holding_cost,
            'stockout_cost': stockout_cost,
        }
        return self._get_state(), reward, done, info

    def _get_state(self) -> np.ndarray:
        """State: current inventory, outstanding orders, day of year."""
        outstanding = sum(self.pending_orders.values())
        return np.array([
            self.inventory / self.max_inv,  # normalized inventory
            outstanding / self.max_inv,  # normalized outstanding orders
            (self.day % 365) / 365,  # seasonal position
        ], dtype=np.float32)
Architecture Diagrams
[Diagram: Inventory Policy Decision Flow]
[Diagram: Multi-Echelon Inventory System]
Production Engineering Notes
Service Level Differentiation
Not all SKUs deserve the same service level target. A pharmacist running out of insulin is categorically different from a grocery store running out of organic quinoa. Retailers segment SKUs by:
Revenue impact: High-velocity, high-margin items (A items) get 99% service levels. Slow-moving, low-margin items (C items) get 90%. The inventory investment per percentage point of service level increases non-linearly near the top.
Substitutability: If a customer will accept Brand B when Brand A is out, the effective service level is higher than the individual item service level suggests. Model substitution explicitly.
Criticality: Some items (batteries, cold medicine, certain food staples) are "destination purchases" - customers choose the store specifically for these items. Stockouts cause store switching. These items get the highest service levels regardless of margin.
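The revenue-impact segmentation can be sketched as a simple ABC classification. The 99% and 90% targets for A and C items come from the text above. The 80%/95% cumulative-revenue cut-offs, the 95% target for B items, and the sample revenue figures are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical annual revenue per SKU
skus = pd.DataFrame({
    'sku': ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],
    'annual_revenue': [520.0, 250.0, 110.0, 60.0, 40.0, 20.0],
})

# Rank by revenue and compute the cumulative share of total revenue
skus = skus.sort_values('annual_revenue', ascending=False).reset_index(drop=True)
skus['cum_share'] = skus['annual_revenue'].cumsum() / skus['annual_revenue'].sum()

# Assumed cut-offs: top 80% of revenue = A, next 15% = B, remainder = C
skus['abc_class'] = np.select(
    [skus['cum_share'] <= 0.80, skus['cum_share'] <= 0.95],
    ['A', 'B'],
    default='C',
)

# Service level targets per class (B target is an assumption)
SERVICE_TARGETS = {'A': 0.99, 'B': 0.95, 'C': 0.90}
skus['target_service_level'] = skus['abc_class'].map(SERVICE_TARGETS)

print(skus)
```

Substitutability and criticality would enter as overrides on top of this table, for example by forcing destination items into the top tier regardless of revenue.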
Substitution Effects
When Item A stocks out, some customers buy Item B (a substitute). This creates a demand coupling between items that most inventory models ignore. In categories with strong substitution (private label vs national brand, different pack sizes of the same product), ignoring substitution leads to:
- Over-ordering of national brand (because you see substitution demand as real demand)
- Under-ordering of private label (because you see demand decrease when national brand is available)
Model substitution using a Markov chain: transition probabilities between items when the preferred item is out of stock. These probabilities can be estimated from historical data by looking at purchase patterns in periods when specific items were out of stock.
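A one-step version of this idea is sketched below: a substitution matrix redirects the demand of out-of-stock items to in-stock alternatives, and whatever is not redirected is lost. The item names, probabilities, and demand figures are all hypothetical, and a full Markov chain would also allow second-choice substitutes to be out of stock in turn.

```python
import numpy as np

items = ['national_brand', 'private_label', 'premium']

# substitution[i, j] = P(customer who wanted item i buys item j when i is out of stock).
# Rows sum to less than 1; the remainder is a lost sale. Values are assumed.
substitution = np.array([
    [0.0, 0.6, 0.1],
    [0.5, 0.0, 0.1],
    [0.3, 0.1, 0.0],
])

primary_demand = np.array([100.0, 60.0, 20.0])   # what customers came in wanting
in_stock = np.array([False, True, True])         # national brand is out of stock


def observed_sales(primary_demand, substitution, in_stock):
    """Return (sales per item, lost sales) after one round of substitution."""
    unmet = primary_demand * ~in_stock                 # demand that hits an empty shelf
    spill = (unmet @ substitution) * in_stock          # redirected to in-stock substitutes
    sales = primary_demand * in_stock + spill
    lost = unmet.sum() - spill.sum()
    return sales, lost


sales, lost = observed_sales(primary_demand, substitution, in_stock)
for name, demand, sold in zip(items, primary_demand, sales):
    print(f"{name:15s} primary demand {demand:5.0f}  observed sales {sold:5.0f}")
print(f"lost sales: {lost:.0f}")
```

In this example the private label records double its true primary demand while the national brand records zero, which is the distortion a forecast trained on raw sales would inherit.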
Perishables and Freshness
Perishable inventory (fresh food, dairy, flowers, pharmaceuticals with expiry dates) has an additional cost dimension: age-based disposal. The overage cost increases as inventory ages. This transforms the static newsvendor into a dynamic problem.
Dynamic Newsvendor for Perishables: At each decision point, consider current inventory age distribution, not just quantity. Items near expiry have higher effective overage cost. The optimal policy orders less than the static newsvendor would suggest, because old inventory that you carry forward is not fungible with fresh inventory.
Common Mistakes
:::danger Ignoring Lead Time Variability The most common inventory formula mistake: using only demand uncertainty in safety stock calculation while assuming lead time is deterministic. If your supplier's lead time has a standard deviation of 5 days and mean of 20 days, and you ignore this, you will have systematic stockouts on the 30-40% of orders that arrive late. Lead time variability often dominates demand variability for stable, high-volume items. Always include $\sigma_L$ (the standard deviation of lead time) in your safety stock formula. :::
:::danger Fixed Service Level for All SKUs Setting a uniform 95% service level across the entire catalog wastes capital. For an item with very low demand (10 units/month) and low margin, a 95% service level requires far more safety stock relative to its sales than it does for a high-velocity item, because low-volume demand is proportionally more variable. Segment by velocity, margin, and substitutability. The investment in high service levels for low-velocity items has very low ROI compared to investing that capital in high-velocity items. :::
:::warning Demand Signal from Sales, Not True Demand Your demand history is actually sales history - you can only sell units you have. When you stock out, the customer's demand is not recorded; you see 0 sales during a stockout period. If you use sales as your demand signal, you systematically underestimate demand for items that frequently stock out, leading to even lower order quantities and more stockouts. This is a reinforcing loop. Use demand sensing to estimate true demand: during known stockout periods, impute demand using trend and seasonality from non-stockout periods. :::
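One simple way to impute demand during stockouts is sketched below: replace recorded sales on stockout days with the average for the same weekday on non-stockout days. This is a minimal stand-in for the trend-and-seasonality imputation described above, and the column names and sales figures are hypothetical.

```python
import numpy as np
import pandas as pd

# Four weeks of daily sales with a weekly pattern (assumed data); 2024-01-01 is a Monday
weekly_pattern = [10, 12, 11, 13, 20, 30, 25]
units_sold = np.tile(weekly_pattern, 4).astype(float)
stockout = np.zeros(28, dtype=bool)
stockout[[11, 12]] = True          # Friday and Saturday of week 2: shelf was empty
units_sold[stockout] = 0.0         # no stock, so no recorded sales

sales = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=28, freq='D'),
    'units_sold': units_sold,
    'stockout': stockout,
})
sales['weekday'] = sales['date'].dt.dayofweek

# Same-weekday average, computed only from days with stock on the shelf
weekday_mean = sales[~sales['stockout']].groupby('weekday')['units_sold'].mean()

# Keep recorded sales where stock was available, impute elsewhere
sales['demand_estimate'] = sales['units_sold'].where(
    ~sales['stockout'], sales['weekday'].map(weekday_mean)
)

naive_mean = sales['units_sold'].mean()
unconstrained_mean = sales['demand_estimate'].mean()
print(f"mean daily sales: {naive_mean:.2f}, mean estimated demand: {unconstrained_mean:.2f}")
```

The gap between the two means is the demand a sales-only forecast would never see.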
:::warning Bullwhip Effect from Independent Optimization If each echelon in your supply chain independently optimizes its own inventory without sharing state, you will observe the bullwhip effect: small demand fluctuations at retail amplify into large oscillations at upstream levels. Store planners over-order when they see a demand spike, causing a larger spike signal to the regional DC, which causes an even larger spike to the national DC, which causes the supplier to see wildly volatile orders. Share downstream demand signals directly with upstream planners. Do not just share orders placed - share actual end-consumer demand. :::
Interview Questions and Answers
Q1: Explain the Newsvendor Problem and how it applies to a fashion retailer deciding how many units of a seasonal item to order.
A: The Newsvendor Problem models a single-period inventory decision under demand uncertainty. For a fashion retailer, the setup is a close fit: you must place a pre-season order before knowing actual demand. The optimal order quantity maximizes expected profit (or minimizes expected cost) and is given by the Critical Ratio: order to the quantile of the demand distribution equal to $c_u / (c_u + c_o)$, where $c_u$ is the underage cost (lost margin per stockout unit) and $c_o$ is the overage cost (markdown + disposal cost per excess unit). For a fashion item with a 50% margin and a markdown loss of 20% of retail price on each excess unit, $c_u = 0.5$ (lost margin), $c_o = 0.2$ (markdown loss), giving CR = 0.5/(0.5+0.2) = 0.71. Order to the 71st percentile of demand. The implication: a retailer with high margins and small markdown losses should stock more aggressively (higher CR) than one with low margins and large markdown losses (lower CR). This framework quantifies the intuition most fashion buyers apply qualitatively.
Q2: What is the bullwhip effect and how do you prevent it?
A: The bullwhip effect is the amplification of demand variability as you move upstream in the supply chain. Small end-consumer demand swings translate to large order quantity swings at the supplier level. Causes: (1) demand signal processing - each level adds safety stock based on observed order variability, not end consumer demand; (2) order batching - ordering weekly instead of daily converts continuous demand into lumpy signals; (3) shortage gaming - when suppliers allocate scarce goods by order quantity, buyers inflate orders to get their fair share. Prevention: (1) share POS data directly with all supply chain partners - everyone sees actual end consumer demand, not orders from the next level; (2) move to continuous replenishment (Vendor Managed Inventory - VMI) where the supplier sees your inventory levels directly and replenishes autonomously; (3) reduce order batch size through automation (smaller, more frequent orders are more informative than large batched orders). Walmart's supplier portal, which gives all suppliers direct access to real-time store inventory data, is the canonical implementation of bullwhip prevention at scale.
Q3: How would you set safety stock levels for 100,000 SKUs across 500 stores when you cannot manually tune each one?
A: The approach is to build a parametric safety stock policy that takes item and location attributes as inputs and outputs the safety stock. Start by computing three quantities for each SKU-store combination from historical data: demand mean and standard deviation, lead time mean and standard deviation, and the cost ratio (holding cost vs. stockout penalty, which varies by category). Apply the combined safety stock formula: $SS = z \cdot \sqrt{L \cdot \sigma_d^2 + \bar{d}^2 \cdot \sigma_L^2}$, where $\bar{d}$ and $\sigma_d$ are the mean and standard deviation of daily demand, $L$ and $\sigma_L$ are the mean and standard deviation of lead time, and $z$ is the z-score of the target service level. The service level is not uniform - it is a function of item attributes. Build a service level model: a regression that predicts the optimal service level for an item given its velocity (units per week), gross margin percentage, number of substitutes, and category type. Train this model by optimizing the service level for a sample of items where you have sufficient history to measure the true cost of different service levels. The regression generalizes this to all items. The result is a data-driven safety stock value for every SKU-store combination (up to 50 million of them), each calibrated to the specific cost structure and uncertainty characteristics of that item. Re-run quarterly as demand patterns and supplier lead times evolve.
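At this scale the formula is applied as a vectorized column operation rather than a per-SKU loop. The sketch below assumes a table with one row per SKU-store combination and a `service_level` column already filled in (in the approach above it would come from the service level model). All column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# One row per SKU-store combination (illustrative values)
sku_store = pd.DataFrame({
    'sku': ['A1', 'A1', 'B7'],
    'store': ['S01', 'S02', 'S01'],
    'demand_mean': [50.0, 8.0, 2.0],     # units per day
    'demand_std': [15.0, 4.0, 1.5],
    'lt_mean': [20.0, 20.0, 7.0],        # days
    'lt_std': [5.0, 5.0, 1.0],
    'service_level': [0.99, 0.95, 0.90], # assumed output of a service level model
})

# z-score per row, then the combined safety stock formula on whole columns
z = stats.norm.ppf(sku_store['service_level'])
sku_store['safety_stock'] = z * np.sqrt(
    sku_store['lt_mean'] * sku_store['demand_std'] ** 2
    + sku_store['demand_mean'] ** 2 * sku_store['lt_std'] ** 2
)
sku_store['reorder_point'] = (
    sku_store['demand_mean'] * sku_store['lt_mean'] + sku_store['safety_stock']
)

print(sku_store[['sku', 'store', 'safety_stock', 'reorder_point']].round(1))
```

The same three lines of arithmetic run unchanged on a table with tens of millions of rows, so the quarterly refresh is a batch job rather than a planning exercise.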
Q4: Describe a reinforcement learning approach to inventory replenishment. What are the state, action, and reward?
A: State: the information available to the agent at each decision point. At minimum: current inventory level, quantity of orders in transit (and expected arrival days), current demand forecast (mean and uncertainty) for the next lead time + review period, and seasonal/calendar features. More advanced: competitor stock levels for substitutable items, promotional calendar, supplier reliability score. Action: order quantity to place today (continuous or discretized integer). The action space can be simplified by parameterizing as multiples of a base order quantity. Reward: negative of total daily cost, combining holding cost (units held times per-unit-per-day cost) plus stockout cost (units short times penalty per unit). The agent learns to balance these. Key implementation detail: because a stockout today was caused by an order decision made lead-time-days ago, the credit assignment problem is significant. Use a reward that discounts over the full lead time horizon, not just same-day cost. Training: simulate historical demand to generate many episode trajectories. Off-policy methods (DQN, SAC) train on replay buffers, which is efficient. Deploy: start in shadow mode computing recommendations alongside existing rules, measure cost in simulation before live deployment.
Q5: A product goes viral on TikTok and demand spikes 10x overnight. How does your inventory system respond?
A: This requires a layered response because inventory systems operate on different time horizons. Layer 1 (immediate, hours): The demand signal hits the forecasting system within hours via social media monitoring or a sudden spike in actual sales/search. The forecasting model detects an anomaly (sales 5-sigma above rolling average) and triggers a "viral event" flag. This flag overrides the normal forecast and sets a demand multiplier. Layer 2 (same day): Replenishment logic detects that projected inventory falls below reorder point under the new demand scenario. Emergency replenishment orders are generated with expedited shipping. Existing inventory may be reallocated from lower-demand stores to higher-demand stores. Layer 3 (1-3 days): Communication to suppliers requesting prioritization of this SKU. The challenge is supplier lead time - you cannot manufacture new inventory overnight. What you can do is pull forward already-planned production and reallocate finished goods from non-viral distribution to viral stores. Layer 4 (3-10 days): Model the demand decay curve from past viral events. Most TikTok-driven spikes decay within 2-4 weeks. Calibrate the order quantity to avoid being stuck with massive overstock after the viral demand normalizes. The critical engineering piece: a "viral event" classifier that fires early (within 2-4 hours of the spike starting), triggering the emergency response before stockout occurs rather than after.
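The early-warning piece can be sketched as a rolling z-score on hourly sales: flag any hour that sits more than five standard deviations above the trailing average. The synthetic series, the 48-hour window, and the variable names are assumptions. A production classifier would combine this with social and search signals.

```python
import numpy as np
import pandas as pd

# Synthetic hourly sales: a smooth daily cycle around 20 units/hour (assumed)
hours = np.arange(72)
hourly_sales = pd.Series(20 + 5 * np.sin(2 * np.pi * hours / 24))
hourly_sales.iloc[60:] *= 10     # demand jumps 10x from hour 60 onward

WINDOW = 48        # trailing hours used as the baseline
THRESHOLD = 5.0    # flag when sales exceed the baseline by this many sigmas

# shift(1) keeps the current hour out of its own baseline
baseline = hourly_sales.shift(1).rolling(WINDOW)
z_score = (hourly_sales - baseline.mean()) / baseline.std()
viral_flag = z_score > THRESHOLD

first_alert = int(viral_flag.idxmax()) if viral_flag.any() else None
print(f"first viral alert at hour: {first_alert}")
```

Excluding the current hour from the baseline matters: otherwise the spike inflates its own mean and standard deviation and delays the alert.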
