1.007x
lineage overhead at 1M elements (macOS op mode)
0.622
LDS on tabular/Adult Income (logistic regression, CPU)
266x
faster than TRAK on CIFAR-2 (CPU vs GPU)
>100%
gap closed in approximate unlearning vs. gold standard
Lineage Overhead Ratio (tracked / baseline)
Lower is better. Ratios below 1.05x are acceptable for production. The sub-unity ratio on Linux reflects a cache-locality gain from tracking.
Attribution Quality: LDS (Spearman rho)
Higher is better. Tabular: Traceprop-LL matches TRAK quality at 266x lower cost.
Attribution Quality Table
CIFAR-2/ResNet-9 (500 subsets). Traceprop-LL is 266x faster on CPU vs TRAK on GPU.
| Method | LDS | Time | HW |
|---|---|---|---|
| TRAK (5 ckpts) | 0.0290 | 691 s | GPU T4 |
| Traceprop-LL | 0.0168 | 2.6 s | CPU |
| Traceprop-BM | 0.0033 | 14.2 s | CPU |
| Random | 0.0205 | <0.001 s | - |
| Tabular: TP-LL | 0.622 | 0.22 s | CPU |
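LDS (the Linear Datamodeling Score) is a Spearman rank correlation between attribution-predicted and actually observed model outputs across retrained subsets, averaged over test points. A minimal numpy sketch on synthetic margins (the `lds` helper and the subsets-by-test-points matrix shapes are illustrative assumptions, not Traceprop's API):

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman rho = Pearson correlation of rank vectors (no tie handling;
    # fine for continuous-valued margins)
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def lds(pred, actual):
    # Mean Spearman rho over test points; rows index retrained subsets
    return float(np.mean([spearman_rho(pred[:, j], actual[:, j])
                          for j in range(pred.shape[1])]))

rng = np.random.default_rng(0)
pred = rng.normal(size=(500, 10))                  # attribution-predicted margins
actual = 0.3 * pred + rng.normal(size=(500, 10))   # observed margins, weakly correlated
score = lds(pred, actual)
```

With 500 subsets, even a weak true correlation yields a stable positive score, which is why small LDS values like 0.0290 vs 0.0168 are still meaningfully ranked.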
Query Latency (macOS, single thread)
Sub-millisecond for all practical pipeline depths.
| Query | Depth | Latency |
|---|---|---|
| sources() | 1 | <0.001 ms |
| ops() | 1 | <0.001 ms |
| ancestors() | 10 | 0.004 ms |
| ancestors() | 100 | 0.041 ms |
| ancestors() | 1000 | 0.420 ms |
| trace_to_file() | multi-source | 2.36 ms |
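The roughly linear growth of `ancestors()` latency with depth (0.004 ms at 10, 0.420 ms at 1000) is what you would expect from a visited-set graph walk over the lineage DAG. A sketch of that traversal, assuming a simple dict-of-parents representation (Traceprop's internal structures are not shown here):

```python
from collections import deque

def ancestors(graph, node):
    """Collect all transitive ancestors of `node` in a lineage DAG.
    `graph` maps each node id to the list of ids it was derived from.
    BFS with a visited set: cost is linear in the ancestor count."""
    seen, queue = set(), deque(graph.get(node, []))
    while queue:
        cur = queue.popleft()
        if cur in seen:
            continue
        seen.add(cur)
        queue.extend(graph.get(cur, []))
    return seen

# linear chain of depth 1000: op_i derives from op_{i-1}
chain = {f"op{i}": [f"op{i-1}"] for i in range(1, 1001)}
result = ancestors(chain, "op1000")
```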
Multi-Source Case Study: 3-Table Credit Risk Pipeline
20,000 applicants, 180,000 total source rows across 3 tables. One-time ETL overhead enables sub-millisecond query-time provenance.
| Metric | Value | Notes |
|---|---|---|
| Source tables | 3 | application, bureau, previous_application |
| Total source rows | 180,000 | across all 3 tables |
| Avg source rows / training sample | 8.5 | after join aggregation |
| ETL baseline time | 0.100 s | raw pandas |
| Traceprop ETL time | 0.293 s | 2.93x overhead - paid once at ingestion |
| ancestors() latency | 0.003 ms | query time |
| trace_to_file() latency | 2.36 ms | full attribution + source resolution |
| Attribution contribution | 0.424 / 0.426 / 0.434 | per table - comparably distributed |
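The reason query-time resolution stays in the millisecond range is that join lineage is materialized at ingestion: each training sample keeps the (table, row) pairs it was aggregated from, so tracing back to files is a lookup, not a recomputation. A hypothetical sketch (the `lineage` mapping, file names, and `trace_to_file` helper are illustrative assumptions):

```python
# Each training sample records the source rows it was joined/aggregated from.
lineage = {
    0: [("application", 17), ("bureau", 901),
        ("bureau", 902), ("previous_application", 55)],
}
table_files = {
    "application": "application.csv",
    "bureau": "bureau.csv",
    "previous_application": "previous_application.csv",
}

def trace_to_file(sample_id):
    """Resolve a training sample to its contributing source files and rows."""
    out = {}
    for table, row in lineage.get(sample_id, []):
        out.setdefault(table_files[table], []).append(row)
    return out

files = trace_to_file(0)
```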
Tabular Models: Traceprop is the Right Choice
LDS 0.622 at 0.22s on CPU with full source-file traceability. Tabular models dominate regulated industries (credit, insurance, HR). For these use cases, Traceprop-LL provides TRAK-quality attribution without GPU infrastructure or source-file blindness.
Deep Vision: Use TRAK When GPU is Available
On ResNet-9 with BatchNorm, TRAK achieves LDS 0.0290 (GPU) vs Traceprop-LL 0.0168 (CPU). BatchNorm encodes batch statistics in last-layer features, degrading per-sample gradient signal. For image models, Traceprop provides lineage and unlearning; use TRAK for attribution quality.
Provenance-Guided Unlearning Outperforms Random by 6x
Random unlearning closes 14% of the gap to the retrained gold standard; provenance-guided unlearning closes more than 100%. The difference is attribution: knowing which samples carry the highest influence and targeting them, rather than randomly sampling a forget set. The gradient correction is first-order approximate and carries no formal differential-privacy guarantee.
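The selection step behind the 6x difference can be sketched in a few lines: given per-sample influence scores, take the top-k rather than a uniform sample. This is a minimal illustration with synthetic scores, not Traceprop's unlearning code:

```python
import numpy as np

def select_forget_set(influence, k):
    """Provenance-guided selection: the k highest-influence samples,
    instead of a forget set sampled uniformly at random."""
    return np.argsort(influence)[-k:][::-1]   # indices, highest first

rng = np.random.default_rng(1)
influence = rng.exponential(size=1000)        # assumed per-sample influence scores
targeted = select_forget_set(influence, 50)
random_set = rng.choice(1000, size=50, replace=False)

# the targeted set captures far more total influence than a random one
targeted_mass = float(influence[targeted].sum())
random_mass = float(influence[random_set].sum())
```

Because influence is typically heavy-tailed, the top-k set concentrates most of the removable influence, which is why targeting beats random sampling by a wide margin.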
GradientStore Memory: Practical at Scale
k=4096, 1M samples: 15.3 GB, which fits on a standard cloud instance. For larger datasets, numpy.memmap enables disk-backed storage. At k=512, memory drops to 1.91 GB at some cost in attribution quality. The JL distortion bound (ε ≈ 0.18 at k=4096) is proven, not merely empirical.
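The 15.3 GB figure is just the arithmetic of a float32 samples-by-k matrix, and the memmap fallback is standard numpy. A sketch (the file path and 1000-row demo shape are placeholders):

```python
import os
import tempfile
import numpy as np

def gradient_store_bytes(n_samples, k, dtype_bytes=4):
    # float32 projected gradients stored as an (n_samples, k) matrix
    return n_samples * k * dtype_bytes

gb = gradient_store_bytes(1_000_000, 4096) / 2**30      # ~15.26 GiB
gb_small = gradient_store_bytes(1_000_000, 512) / 2**30  # ~1.91 GiB

# disk-backed variant for datasets that exceed RAM (small demo shape)
path = os.path.join(tempfile.gettempdir(), "grads_demo.f32")
store = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 4096))
store[0] = 0.5   # rows are written in place, backed by the file
store.flush()
```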
www.engineersofai.com - AI Letters #33 - Traceprop preprint: DOI 10.5281/zenodo.20036000