Radiology AI in Production
The 2 AM Chest CT Queue
It is 2:17 AM. The overnight radiologist at a large academic medical center has 47 studies in her queue. Three chest CTs from the emergency department, six head MRIs from the neurology floor, fourteen plain films from the overnight admits, and a growing stack of portable chest X-rays from the ICU. She has been on shift for six hours. She will be on for three more.
Somewhere in that queue is a patient with a pulmonary embolism that will kill him if it goes unread for another four hours. He came in for a different reason. The ED ordered a chest CT with contrast almost as an afterthought. His study is sitting at position 31 in the queue because it was ordered at 1:53 AM - not urgent enough to be flagged, not routine enough to wait until morning. The radiologist will get to it when she gets to it.
This is the exact scenario that drove the first wave of FDA-cleared radiology AI products. Not the promise of replacing radiologists. Not autonomous diagnosis. The immediate, unglamorous, genuinely lifesaving problem of: which study should she read next? The AI does not need to be a better radiologist than her. It needs to be good enough at one narrow task - flagging a chest CT that probably contains a PE - to move that study from position 31 to position 1 before the patient deteriorates.
That narrow framing is what separates radiology AI that actually ships from radiology AI that stays in research papers. The products that got FDA clearance first were not the most ambitious. They were the most precisely scoped. Aidoc's intracranial hemorrhage detection got cleared in 2018 because it solved exactly one problem: find the bleeds, move them to the top, let the radiologist read them first. The workflow stayed the same. The radiologist was still in the loop. The AI just changed the order.
By 2024, over 700 AI/ML-enabled medical devices had received FDA authorization, the majority of them in radiology. The technical infrastructure for deploying these systems - PACS integration, DICOM pipelines, HL7/FHIR interoperability, regulatory change control - has become a distinct engineering discipline. This lesson covers how that infrastructure works, why each piece exists, and what happens when it breaks in production.
Why This Exists - The Reporting Gap
Radiology in the United States processes roughly 900 million imaging studies per year. The number of radiologists grew slowly through the 2010s while imaging volumes grew at 5-7% per year. The math does not work. Studies that should be read within four hours are waiting eight. Emergency findings that should interrupt the queue are getting missed in the noise.
Beyond volume, there is the detection problem. Multiple studies have shown that radiologists miss roughly 30% of malignant pulmonary nodules on chest X-ray - not because they are incompetent but because the chest X-ray is genuinely hard: the nodule is real but subtle, overlaid on ribs and soft tissue, and the radiologist has seen 200 other studies that day. This is not a personal failure. It is a signal-to-noise problem at scale.
AI was not introduced to radiology because it is smarter than radiologists. It was introduced because it is tireless, it does not have shift fatigue, it processes every pixel every time with the same attention, and it can be trained specifically on the patterns humans most consistently miss. The combination of radiologist judgment and AI consistency is measurably better than either alone.
The technical challenge is that hospitals did not buy PACS systems to be AI platforms. They bought them to store and retrieve images. Getting AI into the radiology workflow means integrating with infrastructure that was designed in the 1990s, runs on aging servers, speaks a protocol (DICOM) that predates the modern web, and must maintain 99.9% uptime because if the PACS goes down the hospital cannot read images.
Historical Context - From Filmless to Intelligent
Radiology moved from film to digital through the 1990s and 2000s. The Picture Archiving and Communication System (PACS) emerged as the central infrastructure: a server that stores images, a viewer that displays them, and the DICOM standard that made images from different scanners readable in different viewers.
DICOM (Digital Imaging and Communications in Medicine) was standardized in 1993 by NEMA (National Electrical Manufacturers Association) and ACR (American College of Radiology). Before DICOM, every scanner vendor had proprietary formats. A Siemens CT could not be read by a GE workstation. DICOM solved this by defining both a file format and a network protocol for transmitting medical images. It is still the foundation of medical imaging infrastructure thirty years later.
The first serious CAD (Computer-Aided Detection) systems appeared in the late 1990s. Early CAD for mammography was cleared by FDA in 1998. These systems used hand-engineered features - Hough transforms for detecting masses, morphological operations for finding microcalcifications - and produced outputs that radiologists almost immediately learned to ignore. The false positive rate was high enough that CAD annotations were often more noise than signal.
Deep learning changed the equation. The 2017 Stanford CheXNet paper showed a CNN trained on 112,120 chest X-rays outperforming radiologists on pneumonia detection. Google's 2016 JAMA paper on diabetic retinopathy screening showed an AI system matching ophthalmologist performance on a specific detection task. These results were not uniform across all conditions or all deployment contexts - but they demonstrated that deep learning on medical images was fundamentally different from earlier CAD.
The FDA responded with a proposed regulatory framework for AI/ML-based Software as a Medical Device (SaMD), published as a discussion paper in 2019. The 510(k) clearance pathway - used for devices substantially equivalent to previously cleared devices - became the primary route for radiology AI. As of 2024, the majority of cleared radiology AI products went through 510(k) rather than the more demanding Premarket Approval (PMA) process.
Core Concepts
DICOM - The Foundation
DICOM is simultaneously a file format and a network protocol. Understanding both halves is essential for anyone building radiology AI.
DICOM as a file format: A DICOM file consists of a header of attribute-value pairs (called tags) followed by pixel data. Tags are identified by a four-hex-digit group number and a four-hex-digit element number, written as (GGGG,EEEE). For example, (0010,0010) is PatientName, (0008,0060) is Modality (CT, MR, DX, etc.), (0028,0030) is PixelSpacing.
The header contains everything you need to interpret the image: patient demographics, acquisition parameters, the physical meaning of pixel values, the coordinate system, the relationship of this slice to adjacent slices in a 3D volume. A CT image without its DICOM header is just a grid of numbers. The header tells you what those numbers mean in Hounsfield Units, where the image sits in 3D space, and how thick each slice is.
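To make that concrete, here is a minimal pydicom sketch of reading a header and applying the rescale to get Hounsfield Units. The filename is a placeholder:

```python
import pydicom

ds = pydicom.dcmread("slice_001.dcm")  # hypothetical file

# Tags are accessible by keyword or by (group, element) number
print(ds.Modality)               # same attribute as...
print(ds[0x0008, 0x0060].value)  # ...tag (0008,0060)

# The header tells you how to interpret the pixel data
hu = ds.pixel_array * ds.RescaleSlope + ds.RescaleIntercept
print(ds.PixelSpacing, ds.SliceThickness)  # physical geometry in mm
```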
DICOM as a network protocol: DICOM defines services for sending and receiving images over a network. The key services are:
- C-STORE: push an image from one system to another
- C-FIND: query a PACS for studies matching certain criteria
- C-MOVE: request that images be sent from one location to another
- C-GET: fetch images directly (less common)
These services use a concept called DICOM Association - a negotiated connection between two DICOM Application Entities (AEs) that must agree on which SOP Classes (types of DICOM objects) they will exchange. If a scanner's AE title is "SCANNER01" and the PACS's AE title is "PACS_MAIN", the scanner must be configured with the PACS's address and vice versa. This manual configuration is why DICOM networking is still managed by specialized teams.
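A minimal pynetdicom sketch of establishing an association and verifying connectivity with C-ECHO - the "hello world" of DICOM networking. Hostnames, ports, and AE titles are placeholders, and the `Verification` import assumes pynetdicom 2.x:

```python
from pynetdicom import AE
from pynetdicom.sop_class import Verification  # pynetdicom >= 2.0

ae = AE(ae_title="AI_TRIAGE")  # our AE title (placeholder)
ae.add_requested_context(Verification)

# Both sides must already know each other's AE title, address, and port
assoc = ae.associate("pacs.example.org", 104, ae_title="PACS_MAIN")
if assoc.is_established:
    status = assoc.send_c_echo()  # the DICOM equivalent of a ping
    print(f"C-ECHO status: 0x{status.Status:04X}")
    assoc.release()
```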
Anonymization: DICOM tags contain PHI (Protected Health Information) - patient name, date of birth, medical record number, and hundreds of other identifying fields. Before using DICOM data for training AI models, you must strip or replace all PHI. The standard approach uses the DICOM PS3.15 Annex E "Basic Application Level Confidentiality Profile", which specifies which tags to remove, replace, or keep. Scripted tag replacement with pydicom, or dedicated tools like DicomEdit and CTP (Clinical Trials Processor), automate this.
Safe Harbor de-identification under HIPAA requires removing 18 categories of identifiers including: patient names, geographic data smaller than state, dates (except year), ages over 89, phone numbers, email addresses, SSNs, device identifiers, URLs, and any other unique identifier. After removing these tags, you must also check pixel data for burned-in annotations (some modalities embed patient name and MRN directly in the pixel data - this is common in ultrasound and fluoroscopy).
HL7/FHIR - The Clinical Workflow Layer
DICOM handles images. HL7 (Health Level 7) handles everything else: orders, results, patient demographics, clinical notes, medication lists, lab values. The two standards coexist in every hospital but have very different architectures.
HL7 v2 (the version still dominant in most hospitals) is a pipe-delimited message format from the late 1980s. An order message looks like:
```
MSH|^~\&|HIS|HOSPITAL|RIS|RADIOLOGY|20240315143022||ORM^O01|12345|P|2.3
PID|1||MRN12345^^^HOSPITAL^MR||DOE^JOHN^A||19750315|M
ORC|NW|ORD789|||||^^^20240315143000^^STAT
OBR|1|ORD789||71046^XR CHEST 2 VIEWS AP&LAT^CPT|||20240315143000
```
It is ugly but it works and every hospital system speaks it. HL7 messages tell the radiology information system (RIS) what orders exist, which patients are scheduled, and what clinical context (clinical history, indication, relevant labs) should accompany the imaging request.
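Parsing by hand shows how simple the format is. The sketch below pulls the MRN, priority, and procedure out of the sample message above - illustrative only; a production system should use a real HL7 library (such as python-hl7) that handles escape sequences, repetitions, and the MSH field-separator quirk:

```python
def parse_hl7(message: str) -> dict:
    """Minimal HL7 v2 parse: segments by line, fields by pipe.
    Illustrative only - not a substitute for a real HL7 parser."""
    segments = {}
    for line in message.strip().splitlines():
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments

msg = parse_hl7(open("order.hl7").read())  # hypothetical file
mrn = msg["PID"][3].split("^")[0]          # "MRN12345"
priority = msg["ORC"][7].split("^")[5]     # "STAT" vs routine
procedure = msg["OBR"][4].split("^")[1]    # "XR CHEST 2 VIEWS AP&LAT"
```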
FHIR (Fast Healthcare Interoperability Resources) is the modern successor, introduced by HL7 in 2011 and gaining traction since CMS made FHIR APIs mandatory for Medicare/Medicaid payers in 2020. FHIR represents health data as REST resources - a Patient is a JSON object at /Patient/{id}, an ImagingStudy is at /ImagingStudy/{id}. FHIR makes health data dramatically easier to work with from a software perspective but adoption in radiology departments is still uneven.
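As a sketch of what that looks like in practice - assuming a hypothetical FHIR R4 endpoint - querying a patient's CT studies is a plain HTTP GET:

```python
import requests

FHIR_BASE = "https://fhir.example-hospital.org/R4"  # hypothetical endpoint

# A FHIR ImagingStudy links the clinical order to DICOM Study Instance UIDs
resp = requests.get(
    f"{FHIR_BASE}/ImagingStudy",
    params={"patient": "Patient/12345", "modality": "CT"},
    headers={"Accept": "application/fhir+json"},
)
bundle = resp.json()
for entry in bundle.get("entry", []):
    study = entry["resource"]
    print(study["id"], study.get("description"), study.get("started"))
```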
For radiology AI, HL7/FHIR integration matters because:
- Clinical context improves AI performance - knowing the indication ("rule out PE", "evaluate for pneumonia") lets AI systems apply the right model and prioritize appropriately
- Results need to be returned to clinical systems - an AI finding a pulmonary embolism needs to create an alert that appears in the ordering physician's EMR, not just in the PACS
- Workflow routing - STAT orders should trigger different AI handling than routine orders
Deployment Modes
Radiology AI is deployed in three primary modes, and the choice has enormous implications for clinical workflow, regulatory strategy, and liability.
Worklist Prioritization: The AI does not change the report, does not add annotations to the image, does not communicate directly with clinicians. It only re-orders the queue. Studies likely to contain critical findings move to the top. This is the safest mode from a regulatory and liability standpoint because the AI is not providing a diagnosis - it is providing a priority score. Aidoc, RapidAI, and Viz.ai built their initial products in this mode.
Second Reader: The AI analyzes every study and provides annotations - bounding boxes on the image, probability scores, structured findings. The radiologist sees both the raw images and the AI annotations. The radiologist is still required to produce the final report; the AI is advisory. This mode requires more careful UI design (how do you present AI findings without anchoring bias?) and a more detailed FDA submission.
Decision Support in Reporting: The AI integrates with the radiology reporting workflow and pre-populates structured fields in the report template. "AI detected: 3 pulmonary nodules, largest 8mm right upper lobe. Fleischner criteria: recommend CT follow-up at 3-6 months." The radiologist reviews and either accepts or modifies. This is the highest-value mode from a workflow efficiency standpoint but also the most complex to clear through FDA.
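A sketch of the worklist prioritization mode makes the division of labor concrete: the AI emits a score, and a mapping turns that score into a queue position. The thresholds below are hypothetical illustrations, not clinically validated values:

```python
def worklist_priority(ai_probability: float, order_priority: str) -> int:
    """Map an AI triage score to a worklist rank (lower = read sooner).
    Hypothetical scheme: the AI never writes a diagnosis, it only
    reorders the queue."""
    if order_priority == "STAT":
        return 0  # clinician-flagged urgency always wins
    if ai_probability >= 0.8:
        return 1  # suspected critical finding
    if ai_probability >= 0.5:
        return 2  # elevated suspicion
    return 3      # routine

# The 1:53 AM chest CT moves from position 31 to the top of the queue
print(worklist_priority(0.91, "ROUTINE"))  # -> 1
```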
FDA 510(k) Clearance
The FDA regulates AI radiology tools as Software as a Medical Device (SaMD). The classification depends on the intended use and the risk level. Most radiology AI products aim for Class II (moderate risk) and clear through the 510(k) pathway, which requires demonstrating substantial equivalence to a predicate device (a previously cleared device with similar intended use and technological characteristics).
The key elements of a radiology AI 510(k) submission:
Intended Use Statement: This is the most critical document. It defines exactly what patient population the device is intended for, what imaging modality and acquisition parameters it is designed to process, what output it produces, and how that output is meant to be used. Example: "The XYZ AI Triage System is intended to identify findings suspected of intracranial hemorrhage in non-contrast CT images of the head in adult patients, for use by trained radiologists to prioritize worklist ordering. It is not intended to replace radiologist interpretation."
Performance Testing: FDA expects sensitivity/specificity on a test dataset that is demographically representative, statistically powered (typically 200+ positive cases), and collected from multiple sites. The performance must be validated on data the model has never seen. For detection tasks, FDA typically wants ROC curves with confidence intervals, performance broken down by subgroup (age, sex, race/ethnicity), and comparison against a reference standard (expert consensus reads).
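The arithmetic behind those performance claims is straightforward; here is a sketch with toy data (a real submission uses hundreds of cases, multiple sites, and subgroup breakdowns):

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# Hypothetical results against an expert consensus reference standard
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])  # consensus labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])  # AI outputs

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))

sens = tp / (tp + fn)
spec = tn / (tn + fp)
# Wilson score intervals - submissions report point estimates with CIs
sens_ci = proportion_confint(tp, tp + fn, alpha=0.05, method="wilson")
spec_ci = proportion_confint(tn, tn + fp, alpha=0.05, method="wilson")
print(f"Sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"Specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```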
Clinical Validation: Beyond technical performance, FDA increasingly asks for clinical evidence. Does using this AI actually change clinical outcomes? Does worklist prioritization reduce time-to-read for critical findings? Does the AI reduce miss rates? This evidence is typically gathered through observational studies or reader studies with and without AI.
Predetermined Change Control Plan (PCCP): Introduced in FDA's 2021 AI/ML Action Plan and detailed in subsequent draft guidance, a PCCP allows manufacturers to specify in advance what kinds of model updates they can make without requiring a new 510(k) submission. Updates that fall within the PCCP (e.g., retraining on new data without changing the architecture) can be deployed faster. Updates outside the PCCP require a new submission.
Code Examples
DICOM Processing Pipeline
```python
import pydicom
import numpy as np
from pathlib import Path
from typing import Optional, Dict, Any
import hashlib


def load_dicom_volume(dicom_dir: str) -> tuple[np.ndarray, Dict[str, Any]]:
    """
    Load a series of DICOM files into a 3D numpy array.
    Returns the volume in Hounsfield Units and the metadata.
    """
    dicom_path = Path(dicom_dir)
    dicom_files = sorted(dicom_path.glob("*.dcm"))

    slices = []
    for f in dicom_files:
        ds = pydicom.dcmread(str(f))
        # Only include actual image slices, not scout/localizer
        if hasattr(ds, "ImagePositionPatient"):
            slices.append(ds)

    # Sort slices by z-position (ImagePositionPatient[2])
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))

    if not slices:
        raise ValueError(f"No valid DICOM slices found in {dicom_dir}")

    # Extract metadata from first slice
    ref = slices[0]
    metadata = {
        "PatientID": getattr(ref, "PatientID", "UNKNOWN"),
        "Modality": getattr(ref, "Modality", "UNKNOWN"),
        "PixelSpacing": [float(x) for x in ref.PixelSpacing] if hasattr(ref, "PixelSpacing") else [1.0, 1.0],
        "SliceThickness": float(getattr(ref, "SliceThickness", 1.0)),
        "Rows": int(ref.Rows),
        "Columns": int(ref.Columns),
        "NumSlices": len(slices),
        "SeriesInstanceUID": str(getattr(ref, "SeriesInstanceUID", "")),
    }

    # Build 3D volume in HU
    volume = np.zeros((len(slices), int(ref.Rows), int(ref.Columns)), dtype=np.int16)
    for i, ds in enumerate(slices):
        # Apply rescale slope and intercept to convert to Hounsfield Units
        pixel_array = ds.pixel_array.astype(np.float32)
        slope = float(getattr(ds, "RescaleSlope", 1.0))
        intercept = float(getattr(ds, "RescaleIntercept", 0.0))
        volume[i] = (pixel_array * slope + intercept).astype(np.int16)

    return volume, metadata
def anonymize_dicom(input_path: str, output_path: str, pseudonym: Optional[str] = None) -> None:
    """
    Anonymize a DICOM file by removing or replacing PHI tags.
    Follows DICOM PS3.15 Annex E Basic Application Level Confidentiality Profile.
    """
    ds = pydicom.dcmread(input_path)

    # Tags to remove entirely. PatientBirthDate is intentionally NOT in this
    # list - it is generalized to year-only below (Safe Harbor permits year).
    TAGS_TO_REMOVE = [
        (0x0008, 0x0014),  # Instance Creator UID
        (0x0008, 0x0081),  # Institution Address
        (0x0008, 0x0092),  # Referring Physician Address
        (0x0008, 0x0096),  # Referring Physician Identification
        (0x0010, 0x0040),  # Patient Sex
        (0x0010, 0x1000),  # Other Patient IDs
        (0x0010, 0x1001),  # Other Patient Names
        (0x0010, 0x1010),  # Patient Age
        (0x0010, 0x1020),  # Patient Size
        (0x0010, 0x1030),  # Patient Weight
        (0x0010, 0x2160),  # Ethnic Group
        (0x0010, 0x21B0),  # Additional Patient History
        (0x0038, 0x0010),  # Admission ID
        (0x0040, 0xA124),  # UID (in SR)
    ]
    for tag in TAGS_TO_REMOVE:
        if tag in ds:
            del ds[tag]

    # Replace identifying fields with pseudonyms
    if pseudonym is None:
        # Generate deterministic pseudonym from original ID for traceability
        original_id = str(getattr(ds, "PatientID", "UNKNOWN"))
        pseudonym = "ANON_" + hashlib.sha256(original_id.encode()).hexdigest()[:8].upper()
    ds.PatientName = pseudonym
    ds.PatientID = pseudonym

    # Generalize PatientBirthDate to year-only (keep year for age calculation)
    if hasattr(ds, "PatientBirthDate") and ds.PatientBirthDate:
        birth_year = ds.PatientBirthDate[:4] if len(ds.PatientBirthDate) >= 4 else "0000"
        ds.PatientBirthDate = birth_year + "0101"  # Keep year, zero month/day

    # Check for burned-in annotations (common in ultrasound, fluoroscopy)
    burned_in = getattr(ds, "BurnedInAnnotation", "NO")
    if burned_in == "YES":
        # Flag for manual review - cannot automatically remove burned-in PHI
        raise ValueError(f"File {input_path} has burned-in annotations - requires manual de-identification")

    ds.save_as(output_path)
def apply_windowing(volume: np.ndarray, window_center: int, window_width: int) -> np.ndarray:
    """
    Apply CT windowing to convert HU values to display range.
    Common windows: lung (center=-600, width=1500), PE/soft tissue (center=40, width=400)
    """
    lower = window_center - window_width // 2
    upper = window_center + window_width // 2
    windowed = np.clip(volume, lower, upper)
    # Normalize to [0, 1]
    windowed = (windowed - lower) / (upper - lower)
    return windowed.astype(np.float32)


# Example: PE detection preprocessing
def preprocess_ct_for_pe_detection(dicom_dir: str) -> np.ndarray:
    """
    Preprocess a chest CT for pulmonary embolism detection.
    Uses PE window (center=100 HU, width=700 HU) to optimize vessel visibility.
    """
    volume, metadata = load_dicom_volume(dicom_dir)

    # PE protocol: soft tissue + vascular window
    volume_windowed = apply_windowing(volume, window_center=100, window_width=700)

    # Resample to isotropic 1mm spacing if needed
    spacing = metadata["PixelSpacing"] + [metadata["SliceThickness"]]
    if not all(abs(s - 1.0) < 0.1 for s in spacing):
        # In production, use scipy.ndimage.zoom or SimpleITK for resampling
        pass  # Resampling implementation depends on your stack
    return volume_windowed
```
PACS Integration - DICOM Listener
```python
from pynetdicom import AE, evt
from pynetdicom.sop_class import CTImageStorage, DigitalXRayImageStorageForPresentation
import logging
from pathlib import Path
import queue


class DICOMListener:
    """
    DICOM C-STORE SCP (Service Class Provider) that receives images from PACS
    and routes them to the AI inference pipeline.
    """

    def __init__(self, port: int = 11112, ae_title: str = "AI_TRIAGE", storage_dir: str = "/tmp/dicom_spool"):
        self.port = port
        self.ae_title = ae_title
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(parents=True, exist_ok=True)
        self.study_queue: queue.Queue = queue.Queue()
        self.logger = logging.getLogger(__name__)

    def handle_store(self, event) -> int:
        """Handle an incoming C-STORE request."""
        ds = event.dataset
        ds.file_meta = event.file_meta
        study_uid = str(getattr(ds, "StudyInstanceUID", "UNKNOWN"))
        series_uid = str(getattr(ds, "SeriesInstanceUID", "UNKNOWN"))
        sop_uid = str(getattr(ds, "SOPInstanceUID", "UNKNOWN"))

        # Organize storage by study/series
        study_dir = self.storage_dir / study_uid / series_uid
        study_dir.mkdir(parents=True, exist_ok=True)
        save_path = study_dir / f"{sop_uid}.dcm"

        try:
            ds.save_as(str(save_path), write_like_original=False)
            self.logger.info(f"Stored: {save_path}")
            # Queue study for processing when series appears complete
            # In production: check for series completion via C-FIND before queuing
            self.study_queue.put({
                "study_uid": study_uid,
                "series_uid": series_uid,
                "modality": str(getattr(ds, "Modality", "UNKNOWN")),
                "study_dir": str(study_dir),
            })
        except Exception as e:
            self.logger.error(f"Failed to save DICOM: {e}")
            return 0xA700  # Out of resources error

        return 0x0000  # Success

    def start(self):
        """Start the DICOM SCP listener."""
        ae = AE(ae_title=self.ae_title)
        # Accept the storage SOP classes we care about
        ae.add_supported_context(CTImageStorage)
        ae.add_supported_context(DigitalXRayImageStorageForPresentation)
        handlers = [(evt.EVT_C_STORE, self.handle_store)]
        self.logger.info(f"Starting DICOM listener on port {self.port}")
        ae.start_server(("0.0.0.0", self.port), evt_handlers=handlers, block=True)
def query_pacs_for_study(pacs_host: str, pacs_port: int, pacs_ae: str, study_uid: str) -> list:
    """
    Query PACS for all series in a study using C-FIND.
    Returns list of series metadata.
    """
    from pynetdicom import AE
    from pynetdicom.sop_class import StudyRootQueryRetrieveInformationModelFind
    import pydicom

    ae = AE()
    ae.add_requested_context(StudyRootQueryRetrieveInformationModelFind)
    series_list = []

    assoc = ae.associate(pacs_host, pacs_port, ae_title=pacs_ae)
    if assoc.is_established:
        # Build the query identifier - empty values are "return keys"
        ds = pydicom.Dataset()
        ds.QueryRetrieveLevel = "SERIES"
        ds.StudyInstanceUID = study_uid
        ds.SeriesInstanceUID = ""
        ds.Modality = ""
        ds.NumberOfSeriesRelatedInstances = ""

        responses = assoc.send_c_find(ds, StudyRootQueryRetrieveInformationModelFind)
        for (status, identifier) in responses:
            # 0xFF00/0xFF01 are "pending" statuses that carry a match
            if status and status.Status in (0xFF00, 0xFF01) and identifier:
                series_list.append({
                    "series_uid": str(identifier.SeriesInstanceUID),
                    "modality": str(getattr(identifier, "Modality", "")),
                    "num_instances": int(getattr(identifier, "NumberOfSeriesRelatedInstances", 0) or 0),
                })
        assoc.release()

    return series_list
```
Distribution Shift Detection
```python
import numpy as np
from scipy import stats
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ScannerProfile:
    """Statistical profile of images from a specific scanner/protocol."""
    scanner_id: str
    mean_hu: float
    std_hu: float
    slice_thickness_mm: float
    pixel_spacing_mm: float
    n_samples: int
    created_at: str


class DistributionShiftMonitor:
    """
    Monitor for distribution shift in production radiology AI.
    Tracks pixel statistics, scanner metadata, and model confidence distributions.
    """

    def __init__(self, reference_profile: ScannerProfile, alert_threshold_p: float = 0.01):
        self.reference = reference_profile
        self.alert_threshold = alert_threshold_p
        self.recent_stats: List[dict] = []
        self.window_size = 100  # Compare against last 100 studies

    def record_study(self, volume: np.ndarray, metadata: dict, model_confidence: float):
        """Record statistics from a processed study."""
        lung_mask = (volume > -1000) & (volume < -300)  # Approximate lung voxels for CT
        if lung_mask.sum() > 1000:
            sample_hu = volume[lung_mask]
        else:
            sample_hu = volume.flatten()

        self.recent_stats.append({
            "timestamp": datetime.utcnow().isoformat(),
            "mean_hu": float(np.mean(sample_hu)),
            "std_hu": float(np.std(sample_hu)),
            "slice_thickness": metadata.get("SliceThickness", 0),
            "pixel_spacing": metadata.get("PixelSpacing", [1.0])[0],
            "model_confidence": model_confidence,
            "scanner_id": metadata.get("ManufacturerModelName", "UNKNOWN"),
        })
        # Keep only recent window
        if len(self.recent_stats) > self.window_size:
            self.recent_stats = self.recent_stats[-self.window_size:]

    def check_for_shift(self) -> dict:
        """
        Test whether recent distribution has shifted from reference.
        Returns dict with shift indicators and p-values.
        """
        if len(self.recent_stats) < 30:
            return {"status": "insufficient_data", "n_samples": len(self.recent_stats)}

        recent_mean_hu = [s["mean_hu"] for s in self.recent_stats]
        recent_confidence = [s["model_confidence"] for s in self.recent_stats]

        # Test if mean HU has shifted (one-sample t-test against reference mean)
        t_stat, p_val_hu = stats.ttest_1samp(recent_mean_hu, self.reference.mean_hu)

        # A KS test against a stored confidence baseline would be stronger;
        # lacking historical data here, flag a significant drop in mean confidence
        mean_confidence = np.mean(recent_confidence)

        # Check for new scanner models
        recent_scanners = set(s["scanner_id"] for s in self.recent_stats)
        new_scanners = recent_scanners - {self.reference.scanner_id}

        alerts = []
        if p_val_hu < self.alert_threshold:
            direction = "higher" if t_stat > 0 else "lower"
            alerts.append({
                "type": "pixel_distribution_shift",
                "message": f"Mean HU is significantly {direction} than reference (p={p_val_hu:.4f})",
                "severity": "warning" if p_val_hu > 0.001 else "critical",
            })
        if mean_confidence < 0.4:
            alerts.append({
                "type": "low_model_confidence",
                "message": f"Mean model confidence dropped to {mean_confidence:.2f} - possible OOD data",
                "severity": "warning",
            })
        if new_scanners:
            alerts.append({
                "type": "new_scanner_detected",
                "message": f"New scanner models detected: {new_scanners}",
                "severity": "info",
            })

        return {
            "status": "shift_detected" if alerts else "nominal",
            "n_samples": len(self.recent_stats),
            "p_value_hu": float(p_val_hu),
            "mean_confidence": float(mean_confidence),
            "alerts": alerts,
        }
```
Production Engineering Notes
PACS Vendor Complexity: Every PACS vendor (Sectra, Philips IntelliSpace, Fujifilm Synapse, GE Centricity, Agfa IMPAX) implements DICOM slightly differently. What works with one PACS will fail silently with another. Budget 2-4 weeks per PACS integration for testing and quirk-handling. Maintain integration test suites with DICOM datasets from every supported PACS.
Latency Requirements by Use Case: Worklist prioritization for suspected stroke requires results within 5 minutes of study completion - if the AI takes longer, the patient may have been wheeled to CT already and the prioritization window has closed. PE triage is more forgiving (15-20 minute window). Routine mammography screening can tolerate overnight batch processing. Define your SLA before designing your inference architecture.
GPU Sizing for Production: A 3D chest CT at standard protocol is 300-500 slices at 512x512 pixels. Running a 3D CNN on this volume requires 4-8 GB GPU memory for inference alone. If you are running 50 chest CTs per hour (typical for a large academic center), you need a sizing calculation: (50 studies/hour) x (inference time per study) must fit within your latency SLA with headroom for queue spikes at shift change.
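A worked version of that sizing calculation, with every number an assumption to replace with your own measurements:

```python
import math

# Back-of-envelope GPU sizing - all inputs are assumptions
studies_per_hour = 50    # peak chest CT volume
inference_sec = 45       # measured per-study 3D CNN inference time
sla_sec = 300            # 5-minute time-to-result target
headroom = 2.0           # multiplier for shift-change queue spikes

gpu_seconds_per_hour = studies_per_hour * inference_sec * headroom  # 4500
gpus_needed = math.ceil(gpu_seconds_per_hour / 3600)                # 2

# Queueing sanity check: if 10 studies land at once on 2 GPUs,
# the last one waits ceil(10 / 2) * 45s = 225s, inside the 300s SLA
worst_case_wait = math.ceil(10 / gpus_needed) * inference_sec
assert worst_case_wait <= sla_sec, "SLA violated at spike load - add GPUs"
print(f"{gpus_needed} GPUs, worst-case spike wait {worst_case_wait}s")
```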
DICOM SR for Structured Outputs: When returning AI results back to the PACS, use DICOM Structured Reports (SR) and/or DICOM Presentation States (PR) rather than creating new images with burned-in annotations. SR keeps results structured and queryable. PR preserves annotations separately from the original images so the radiologist always has access to the clean original.
Version Tracking Every Inference: Every AI inference result stored in the PACS must include the model version that produced it. Hospitals retain studies for 7-10 years. You need to be able to reconstruct exactly which model version analyzed a study from 3 years ago if a litigation question arises.
Common Mistakes
:::danger Training and Deployment Windowing Mismatch
CT images can be windowed many different ways. If you trained your model on images windowed for soft tissue (center=40, width=400) but the preprocessing in your inference pipeline applies a lung window (center=-600, width=1500), your model will produce garbage outputs on studies that look normal. Always validate that your preprocessing pipeline reproduces exactly the same pixel value distribution as your training data. Log the statistics at inference time and compare to training distribution statistics.
:::
:::danger Missing DICOM Tag Causes Silent Failure
Production DICOM data from older scanners or non-standard protocols frequently has missing tags that your code assumes are present. ds.PixelSpacing will throw AttributeError on a study where PixelSpacing is absent. ds.ImagePositionPatient may be None. ds.SliceThickness may be 0 or absent. Always use getattr(ds, 'TagName', default_value) and validate that critical tags have sensible values before processing. A study that silently fails to process is worse than one that throws a visible error.
:::
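One defensive pattern against the failure mode above, sketched below: validate critical tags up front and fail loudly before inference. The required-tag list and checks are illustrative, not exhaustive:

```python
import pydicom

REQUIRED_TAGS = {
    "PixelSpacing": lambda v: v is not None and all(float(x) > 0 for x in v),
    "SliceThickness": lambda v: v is not None and float(v) > 0,
    "ImagePositionPatient": lambda v: v is not None and len(v) == 3,
    "RescaleIntercept": lambda v: v is not None,
}

def validate_slice(ds: pydicom.Dataset) -> list[str]:
    """Return a list of validation failures - loudly, before inference."""
    failures = []
    for tag, ok in REQUIRED_TAGS.items():
        value = getattr(ds, tag, None)
        try:
            if not ok(value):
                failures.append(f"{tag}={value!r} failed validation")
        except (TypeError, ValueError):
            failures.append(f"{tag}={value!r} is malformed")
    return failures
```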
:::warning Anchoring Bias in AI Annotation Display
Research on radiologist-AI interaction shows a consistent anchoring effect: when AI marks a region as suspicious, radiologists are less likely to report a different finding than if they had read without AI. This is not a bug in the AI - it is a fundamental cognitive bias. If your AI has a high false positive rate for one condition (say, 30% FPR for aortic stenosis), the radiologist's attention will be repeatedly drawn to false positives and away from other findings. Design AI display to minimize anchoring: consider showing AI results after the radiologist has completed an initial review pass, or use a summary panel rather than overlaying annotations directly on the image.
:::
:::warning Distribution Shift from Protocol Changes
A hospital updating its CT scanner firmware can change image noise characteristics enough to degrade model performance by 5-10 percentage points. A change from 5mm to 2.5mm slice thickness will change the appearance of every nodule relative to your training data. These changes happen routinely and without notice to AI vendors. Build monitoring that detects metadata changes (slice thickness, kernel, kVp, mAs) and flags studies with acquisition parameters outside the training distribution before they are processed by the model.
:::
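A sketch of that acquisition-parameter gate - the bounds and kernel names are hypothetical stand-ins for whatever your training set actually contained:

```python
# Hypothetical bounds derived from the training dataset's acquisition metadata
TRAINING_BOUNDS = {
    "SliceThickness": (0.5, 2.5),   # mm
    "KVP": (100, 140),              # tube voltage
    "ConvolutionKernel": {"B30f", "B31f", "STANDARD"},  # kernels seen in training
}

def acquisition_in_distribution(ds) -> list[str]:
    """Flag studies whose acquisition parameters fall outside training bounds."""
    flags = []
    st = float(getattr(ds, "SliceThickness", 0) or 0)
    lo, hi = TRAINING_BOUNDS["SliceThickness"]
    if not (lo <= st <= hi):
        flags.append(f"SliceThickness {st}mm outside [{lo}, {hi}]")
    kvp = float(getattr(ds, "KVP", 0) or 0)
    lo, hi = TRAINING_BOUNDS["KVP"]
    if not (lo <= kvp <= hi):
        flags.append(f"KVP {kvp} outside [{lo}, {hi}]")
    kernel = str(getattr(ds, "ConvolutionKernel", ""))
    if kernel not in TRAINING_BOUNDS["ConvolutionKernel"]:
        flags.append(f"Unseen reconstruction kernel: {kernel!r}")
    return flags
```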
Interview Q&A
Q: What is the difference between C-STORE, C-FIND, and C-MOVE in DICOM, and when would you use each in an AI pipeline?
A: These are three of the five DIMSE-C services (the other two are C-GET and C-ECHO). C-STORE is a push operation: the initiating system (SCU) sends an image to a receiving system (SCP) - this is how scanners push images to PACS after acquisition. C-FIND is a query operation: the SCU sends a query template to the SCP and receives back matching records - this is how you find all CT studies from the last 24 hours that haven't been AI-processed. C-MOVE is a directed pull: the SCU asks the SCP to push images to a third system - this is how you retrieve a specific series from PACS to your AI server without the PACS needing to know your server's address at query time. In practice, an AI routing system typically uses C-STORE (passively receiving images as they arrive), C-FIND (querying for missed or backlogged studies), and C-MOVE (retrieving specific studies on demand).
Q: A hospital deploys your PE detection AI and after 3 months you observe that sensitivity has dropped from 92% to 78%. What are the first five things you investigate?
A: First, check if any scanner hardware was upgraded or firmware updated - manufacturers often change reconstruction kernels in firmware updates which changes image appearance. Second, check if CT angiography protocol changed - different contrast timing or dose will change how pulmonary vessels appear. Third, check the demographics of the patient population - did the hospital start receiving referrals from a different catchment area, or did a seasonal pattern affect who presents with PE? Fourth, check the labeling methodology for your monitoring cohort - if performance is being estimated from radiologist reports, check whether the reporting radiologists changed. Fifth, compare pixel statistics from recent studies against your training distribution - mean HU, standard deviation, and signal-to-noise ratio are fast proxies for whether the images have changed.
Q: Explain the purpose of a Predetermined Change Control Plan (PCCP) and how it affects your model development workflow.
A: A PCCP (introduced in FDA's 2021 AI/ML Action Plan and detailed in subsequent draft guidance) allows an AI/ML SaMD manufacturer to describe in advance what types of modifications they anticipate making to their device and get FDA agreement that those modifications can be made without a new 510(k) submission. Without a PCCP, any change to the AI algorithm technically requires a new submission, which takes 6-12 months. With a PCCP, updates that fall within the pre-agreed scope - for example, retraining on additional data without architectural changes, or expanding the scanner compatibility list to include new scanner models - can be deployed after internal validation without waiting for FDA review. This changes the model development workflow significantly: you design your update protocol, data collection pipeline, and validation methodology to fit within PCCP boundaries upfront, rather than treating every model update as a regulatory event.
Q: How would you design the presentation of AI findings to minimize anchoring bias while still providing useful information to the radiologist?
A: The key insight is that the order and prominence of information presentation shapes radiologist cognition, and there is no neutral design choice. Several evidence-based strategies: (1) Show AI findings after the radiologist completes an initial read pass - this preserves independent judgment while still catching misses; (2) Present AI findings as a "second check" summary in a separate panel rather than overlaying annotations on the primary reading viewport - this reduces the perceptual salience of AI findings during primary interpretation; (3) Use calibrated uncertainty in AI outputs - "high confidence finding" versus "possible finding for review" helps radiologists apply appropriate scrutiny rather than treating all AI outputs equally; (4) Track agreement rates and provide feedback loops so radiologists can calibrate their trust in the AI over time; (5) Include explainability cues (attention maps, counterfactual reasoning) so the radiologist can evaluate the AI's reasoning rather than just its conclusion.
Q: What is the difference between Safe Harbor de-identification and Expert Determination under HIPAA, and which would you use for training a radiology AI model?
A: Safe Harbor (45 CFR 164.514(b)(2)) requires removing 18 specific categories of identifiers including patient names, geographic subdivisions smaller than state, dates other than year, and 15 other categories. If all 18 categories are removed or generalized, the data is considered de-identified without requiring any statistical analysis. Expert Determination (45 CFR 164.514(b)(1)) requires a statistical or scientific expert to apply generally accepted principles to determine that the risk of re-identification is very small, and to document their methodology. For training a radiology AI model, Safe Harbor is almost always preferred because it is procedurally clear, does not require expert statistical analysis per dataset, and creates a bright-line compliance standard. The downside of Safe Harbor is that it can reduce dataset utility - for example, removing exact dates makes temporal analysis impossible. Expert Determination provides more flexibility but requires engaging a qualified statistician and documenting the analysis, which is expensive and slower. At minimum, Safe Harbor is the floor; Expert Determination is used when specific data elements needed for training (like exact age for age-stratified models) cannot be preserved under Safe Harbor.
Q: Describe the DICOM Structured Report (SR) standard and why it matters for returning AI results to a clinical system.
A: DICOM SR is a standard for encoding structured clinical findings in DICOM format, allowing results to be stored in PACS alongside images and retrieved programmatically. Unlike returning results as burned-in pixel annotations or as proprietary formats, SR uses a controlled vocabulary (SNOMED CT, RadLex, LOINC) to express findings in a machine-readable way. For AI, SR allows you to store: the bounding coordinates of a detected lesion, the probability score, the model version, the anatomical location using standard ontology terms, and a link back to the specific DICOM image and slice where the finding appears. This matters for production because: clinical systems can query for specific AI findings across a patient's history, results are stored in the same infrastructure as images so there is no separate results database to maintain, the structured format allows downstream clinical decision support systems to act on AI findings programmatically, and the linkage back to the specific image facilitates audit and review. The complexity is that SR authoring requires understanding DICOM SR templates, which are non-trivial - use a library like highdicom (Python) rather than building SR encoding from scratch.
