Traceprop: All Benchmarks

Preprint: Zenodo DOI 10.5281/zenodo.20036000 - May 2026

- 1.007x — lineage overhead at 1M elements (macOS op mode)
- 0.622 — LDS on tabular Adult Income (logistic regression, CPU)
- 266x — faster than TRAK on CIFAR-2 (CPU vs. GPU)
- >100% — gap closed in approximate unlearning vs. the gold standard
Lineage Overhead Ratio (tracked / baseline)
Lower is better. Ratios below 1.05x are acceptable for production; the sub-unity ratios on Linux come from a cache-locality gain.
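The overhead ratio is simply tracked runtime divided by baseline runtime. A minimal measurement sketch in pure Python; the op-mode bookkeeping below is an invented stand-in, not Traceprop's actual implementation:

```python
import timeit

def baseline_op(data):
    # Plain elementwise transform: no lineage bookkeeping.
    return [x * 2.0 for x in data]

def tracked_op(data, lineage):
    # Hypothetical tracked variant: op mode records one lineage entry
    # per operation, not one per element, so the overhead stays tiny.
    out = [x * 2.0 for x in data]
    lineage.append(("mul", len(data)))
    return out

data = list(range(100_000))
lineage = []
t_base = timeit.timeit(lambda: baseline_op(data), number=20)
t_tracked = timeit.timeit(lambda: tracked_op(data, lineage), number=20)
ratio = t_tracked / t_base  # values near 1.0 mean negligible lineage cost
print(f"overhead ratio: {ratio:.3f}x")
```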
Attribution Quality: LDS (Spearman rho)
Higher is better. On tabular data, Traceprop-LL matches TRAK's quality at 266x lower cost.
Unlearning Effectiveness
Forget-set cross-entropy loss (higher = better unlearning). The accompanying test-accuracy delta is a 0.5 pp drop, an acceptable cost.
Attribution Quality Table
CIFAR-2/ResNet-9 (500 subsets). Traceprop-LL on CPU is 266x faster than TRAK on GPU.
| Method | LDS | Time | Hardware |
| --- | --- | --- | --- |
| TRAK (5 ckpts) | 0.0290 | 691 s | GPU T4 |
| Traceprop-LL | 0.0168 | 2.6 s | CPU |
| Traceprop-BM | 0.0033 | 14.2 s | CPU |
| Random | 0.0205 | <0.001 s | - |
| Tabular: TP-LL | 0.622 | 0.22 s | CPU |
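LDS here is the Spearman rank correlation between predicted subset influence (the sum of per-sample attributions) and the measured loss change when retraining on that subset. A self-contained sketch of the correlation itself, on made-up numbers:

```python
def rank(xs):
    # Average ranks (1-based) with tie handling, as Spearman requires.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(a, b):
    # Spearman rho = Pearson correlation of the rank vectors.
    ra, rb = rank(a), rank(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Toy LDS-style check: predicted attribution sums vs. measured loss changes
predicted = [0.9, 0.1, 0.4, 0.7, 0.2]   # per held-out subset
measured  = [1.2, 0.3, 0.5, 1.0, 0.2]
rho = spearman_rho(predicted, measured)  # ≈ 0.9: strong rank agreement
print(rho)
```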
Query Latency (macOS, single thread)
Sub-millisecond for all practical pipeline depths.
| Query | Depth | Latency |
| --- | --- | --- |
| sources() | 1 | <0.001 ms |
| ops() | 1 | <0.001 ms |
| ancestors() | 10 | 0.004 ms |
| ancestors() | 100 | 0.041 ms |
| ancestors() | 1000 | 0.420 ms |
| trace_to_file() | multi-source | 2.36 ms |
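The depth scaling in the latencies above is what you would expect from a breadth-first walk up a lineage DAG. A toy illustration; the graph and the `ancestors()` signature here are invented for exposition, not Traceprop's API:

```python
from collections import deque

# Toy lineage DAG: node -> immediate parents (source nodes have none).
parents = {
    "pred": ["feat_a", "feat_b"],
    "feat_a": ["raw_1"],
    "feat_b": ["raw_1", "raw_2"],
    "raw_1": [],
    "raw_2": [],
}

def ancestors(node, max_depth=None):
    # BFS up the DAG; cost grows with depth, matching the table above.
    seen, frontier, depth = set(), deque([node]), 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = deque()
        for n in frontier:
            for p in parents.get(n, []):
                if p not in seen:
                    seen.add(p)
                    next_frontier.append(p)
        frontier, depth = next_frontier, depth + 1
    return seen

print(sorted(ancestors("pred")))  # ['feat_a', 'feat_b', 'raw_1', 'raw_2']
```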
Multi-Source Case Study: 3-Table Credit Risk Pipeline
20,000 applicants, 180,000 total source rows across 3 tables. One-time ETL overhead enables sub-millisecond query-time provenance.
| Metric | Value | Notes |
| --- | --- | --- |
| Source tables | 3 | application, bureau, previous_application |
| Total source rows | 180,000 | across all 3 tables |
| Avg source rows / training sample | 8.5 | after join aggregation |
| ETL baseline time | 0.100 s | raw pandas |
| Traceprop ETL time | 0.293 s | 2.93x overhead, paid once at ingestion |
| ancestors() latency | 0.003 ms | query time |
| trace_to_file() latency | 2.36 ms | full attribution + source resolution |
| Attribution contribution | 0.424 / 0.426 / 0.434 | per table, comparably distributed |
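The per-sample source-row lineage in a pipeline like this can be pictured as carrying source row ids through the join/aggregation step. A hedged pandas sketch with invented table and column names:

```python
import pandas as pd

# Toy 2-of-3-table ETL mirroring the case study; names are illustrative.
application = pd.DataFrame({"app_id": [1, 2], "income": [50, 80]})
bureau = pd.DataFrame({"app_id": [1, 1, 2], "debt": [5, 3, 9]})
bureau["bureau_row"] = bureau.index  # remember source row before aggregation

# Aggregate bureau per applicant, keeping contributing row ids as lineage.
agg = bureau.groupby("app_id").agg(
    total_debt=("debt", "sum"),
    bureau_rows=("bureau_row", lambda s: [int(v) for v in s]),
).reset_index()

train = application.merge(agg, on="app_id", how="left")
print(train.loc[train.app_id == 1, "bureau_rows"].iloc[0])  # [0, 1]
```

The lineage column is what makes a later `trace_to_file()`-style lookup a dictionary read instead of a re-join.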
Tabular Models: Traceprop is the Right Choice
Traceprop-LL reaches LDS 0.622 in 0.22 s on CPU, with full source-file traceability. Tabular models dominate regulated industries (credit, insurance, HR); for these use cases, Traceprop-LL provides TRAK-quality attribution without GPU infrastructure or source-file blindness.
Deep Vision: Use TRAK When GPU is Available
On ResNet-9 with BatchNorm, TRAK achieves LDS 0.0290 (GPU) vs Traceprop-LL 0.0168 (CPU). BatchNorm encodes batch statistics in last-layer features, degrading per-sample gradient signal. For image models, Traceprop provides lineage and unlearning; use TRAK for attribution quality.
Provenance-Guided Unlearning Outperforms Random by 6x
Random unlearning closes 14% of the gap; provenance-guided unlearning closes >100%. The difference is attribution: knowing which samples are highest-influence and targeting them, rather than randomly sampling a forget set. The gradient correction is a first-order approximation and carries no formal differential-privacy guarantee.
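As a rough picture of a first-order gradient correction: removing a sample approximately undoes its descent step, i.e. one gradient-ascent step on the forget samples' loss. A toy logistic-regression sketch; the names and step rule are assumptions for illustration, not the paper's exact update:

```python
import math

def grad(w, x, y):
    # Cross-entropy gradient for one sample (y in {0, 1}).
    p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    return [(p - y) * xi for xi in x]

def forget_loss(w, samples):
    # Cross-entropy on the forget set; higher = better unlearning.
    total = 0.0
    for x, y in samples:
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        total += -math.log(p if y == 1 else 1.0 - p)
    return total

def unlearn_first_order(w, forget, lr=0.1):
    # w' = w + lr * grad(forget sample): ascend the forget-set loss,
    # approximately cancelling the descent steps those samples caused.
    for x, y in forget:
        g = grad(w, x, y)
        w = [wi + lr * gi for wi, gi in zip(w, g)]
    return w

w = [0.5, -0.2]
forget = [([1.0, 2.0], 1), ([0.5, -1.0], 0)]
before = forget_loss(w, forget)
w_after = unlearn_first_order(w, forget)
after = forget_loss(w_after, forget)  # should rise after the correction
```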
GradientStore Memory: Practical at Scale
k=4096, 1M samples: 15.3 GB, which fits on a standard cloud instance. For larger datasets, numpy.memmap enables disk-backed storage. At k=512, memory drops to 1.91 GB at some cost in attribution quality. The JL distortion bound (epsilon ~= 0.18 at k=4096) is proven, not empirical.
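The memory figures follow directly from n_samples x k float32 entries at 4 bytes each; a quick arithmetic check:

```python
def store_gib(n_samples, k, dtype_bytes=4):
    # GradientStore footprint: one k-dim float32 projection per sample.
    return n_samples * k * dtype_bytes / 2**30

print(f"{store_gib(1_000_000, 4096):.2f} GiB")  # 15.26 GiB at k=4096
print(f"{store_gib(1_000_000, 512):.2f} GiB")   # 1.91 GiB at k=512
# Disk-backed alternative for larger datasets:
# np.memmap(path, dtype="float32", mode="w+", shape=(n_samples, k))
```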
www.engineersofai.com - AI Letters #33 - Traceprop preprint: DOI 10.5281/zenodo.20036000