How Traceprop Works

Three-layer architecture connecting source files to predictions to unlearning certificates

The sections below cover each layer's architecture, API, and benchmark numbers.
Raw Source Files
CSV, Parquet, SQL - any tabular source. Row-level annotations added at load time.
    ↓
Layer 1: ProvenanceTensor (Traceprop-Core)
Wraps arrays. Records a LineageGraph DAG of operations. Sub-1% overhead at 1M elements.
    ↓
Layer 2: GradientStore (Traceprop-Attribution)
JL-compressed per-sample gradients (k=4096). 15.3 GB for 1M samples. Memory-mapped.
    ↓
Layer 3: Unlearn + Compliance (Traceprop-Unlearn)
Provenance-guided gradient correction. Exceeds retrain-from-scratch. Exports Article 26 JSON.
    ↓
Audit Answer
"credit_scores.csv row 4821 - influence score 0.921 - preprocessing: normalize, row_filter"

Layer 1: Lineage

The ProvenanceTensor wraps any NumPy or PyTorch array. Every operation appends a node to the LineageGraph DAG; leaf nodes carry the source-file path and row indices. Three granularity modes trade lineage resolution for overhead.
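Concretely, such a lineage DAG can be sketched as a handful of linked nodes. The node fields and traversal helpers below are illustrative assumptions about what a LineageGraph could look like, not Traceprop's actual internals:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical node layout: leaves carry a source file and row indices,
# interior nodes record one operation each.
@dataclass
class LineageNode:
    op: str                                   # e.g. "normalize", "row_filter"
    parents: List["LineageNode"] = field(default_factory=list)
    source: Optional[str] = None              # set on leaf nodes only
    rows: Optional[List[int]] = None          # source row indices (leaves only)

def sources(node):
    """Collect (file, rows) pairs from every reachable leaf."""
    if not node.parents:
        return [(node.source, node.rows)]
    found = []
    for parent in node.parents:
        found.extend(sources(parent))
    return found

def ops(node):
    """Operations applied between the leaves and this node, in order."""
    if not node.parents:
        return []
    return ops(node.parents[0]) + [node.op]

# Each tensor operation appends one node to the DAG.
leaf = LineageNode("load_csv", source="credit_scores.csv",
                   rows=list(range(5000)))
norm = LineageNode("normalize", parents=[leaf])
filt = LineageNode("row_filter", parents=[norm])

print(ops(filt))            # ['normalize', 'row_filter']
print(sources(filt)[0][0])  # credit_scores.csv
```

Because queries are plain DAG walks over small Python objects, sources() and ops() staying under a microsecond is plausible at this structure size.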

1.007x - overhead at 1M elements (macOS)
0.979x - overhead at 1M elements (Linux)
<0.001 ms - sources() and ops() query latency
0.42 ms - ancestors() at depth 1000
```python
import traceprop as tp

# Load with row annotation
X = tp.load_csv("credit_scores.csv")

# Preprocessing - auto-tracked
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
X_filt = X_norm[X_norm[:, 3] > 0]

# Query lineage instantly
X_filt.sources()    # {credit_scores.csv: [rows 0-4998]}
X_filt.ops()        # [normalize, row_filter]
X_filt.ancestors()  # full DAG, depth-unlimited
```
The sub-unity overhead on Linux (0.979x) is not a measurement error: Traceprop's batch-aware memory layout improves cache locality for large arrays, so lineage-tracked operations actually run faster than raw NumPy at 1M+ elements.

Layer 2: Attribution

The GradientStore compresses per-sample gradients with a sparse Johnson-Lindenstrauss projection (Achlioptas 2003). At k=4096 the JL distortion bound is epsilon ≈ 0.18, so the top-k attribution set is correct with high probability. Two variants: Traceprop-LL (per-sample last-layer gradients, best for tabular data) and Traceprop-BM (batch-mean gradients, lower overhead, lower attribution quality).
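The sparse projection of Achlioptas (2003) draws matrix entries from {+1, 0, -1} with probabilities 1/6, 2/3, 1/6 (scaled by sqrt(3/k)), so two thirds of the matrix is zero yet norms and inner products are approximately preserved. A minimal NumPy sketch with scaled-down, made-up dimensions (k=1024 rather than 4096, synthetic gradients) to keep it light:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5_000, 1_024  # full gradient dim, sketch dim (scaled down)

# Sparse JL matrix (Achlioptas 2003): entries sqrt(3/k) * {+1, 0, -1}
# with probabilities 1/6, 2/3, 1/6.
P = np.sqrt(3.0 / k) * rng.choice(
    [1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])

g = rng.standard_normal(d)  # stand-in per-sample training gradient
h = rng.standard_normal(d)  # stand-in test gradient

# Norms and inner products survive the projection up to small distortion,
# which is why top-k attribution rankings computed on sketches hold up.
norm_ratio = np.linalg.norm(P @ g) / np.linalg.norm(g)
exact, approx = g @ h, (P @ g) @ (P @ h)
print(round(norm_ratio, 2))  # close to 1.0
```

At the article's actual k=4096 the distortion shrinks further, consistent with the quoted epsilon ≈ 0.18 bound.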

0.622 - LDS on Adult Income tabular (logistic regression)
0.22 s - attribution time on CPU (tabular)
266x - faster than TRAK on CIFAR-2 (CPU vs GPU)
15.3 GB - GradientStore at k=4096, 1M samples
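The 15.3 GB figure is consistent with simple arithmetic for one float32 sketch per sample (4 bytes per value is an assumption; the article doesn't state the dtype):

```python
n_samples, k, bytes_per_value = 1_000_000, 4096, 4  # float32 sketches
total_bytes = n_samples * k * bytes_per_value
print(round(total_bytes / 2**30, 2))  # 15.26 GiB, matching the quoted 15.3 GB
```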
```python
from traceprop.attribution import (
    TrainingContext, GradientStore, compute_influence_scores
)

store = GradientStore(k=4096, path="./grad_store/")

with TrainingContext(model, store) as ctx:
    for epoch in range(num_epochs):
        for batch_idx, (X_b, y_b) in enumerate(loader):
            loss = criterion(model(X_b), y_b)
            ctx.backward(loss, batch_idx=batch_idx)
            optimizer.step()

# Trace prediction to source row
scores = compute_influence_scores(
    model, store, test_input, top_k=20)
prov = store.get_provenance(scores[0][0])
prov.trace_to_file()  # -> credit_scores.csv, row 4821
```
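In spirit, influence scoring over compressed gradients reduces to similarity between the test prediction's sketched gradient and each stored training sketch. A synthetic stand-in (random arrays replace the real memory-mapped GradientStore; this is not Traceprop's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5_000, 4096

# Stand-in for the store of per-sample gradient sketches.
train_sketches = rng.standard_normal((n, k)).astype(np.float32)

# Make the test gradient resemble training sample 4821's gradient,
# so that row should dominate the influence ranking.
test_sketch = train_sketches[4821] + \
    0.1 * rng.standard_normal(k).astype(np.float32)

# Influence score of each training sample = dot product with the test sketch.
scores = train_sketches @ test_sketch
top_20 = np.argsort(scores)[::-1][:20]
print(top_20[0])  # 4821
```

A single matrix-vector product over n sketches is what makes sub-second CPU attribution credible for tabular-scale stores.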

Layer 3: Unlearning

Provenance-guided unlearning is what separates Traceprop from random unlearning. Because Traceprop knows which source rows produced which training tensors, a GDPR erasure request maps automatically to a precise forget set; gradient correction then reverses training on exactly those samples. The compliance report exports directly to the Article 26 JSON schema.
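First-order gradient correction can be sketched as a few ascent steps on the forget-set loss, undoing those samples' contribution to training. Everything below (the toy logistic model, eta, step counts) is an illustrative assumption, not approximate_unlearn's actual update rule:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1_000, 5
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

def grad(w, Xb, yb):
    # Gradient of the mean logistic loss.
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return Xb.T @ (p - yb) / len(yb)

def loss(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))

# Train on everything with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    w -= 0.5 * grad(w, X, y)

# Provenance maps the erasure request to a precise forget set; gradient
# correction then *ascends* the forget-set loss for a few small steps.
forget = np.arange(50)
w_prime = w.copy()
for _ in range(10):
    w_prime += 0.01 * grad(w_prime, X[forget], y[forget])

# The forget-set loss rises after correction (higher = better unlearning).
print(loss(w_prime, X[forget], y[forget]) > loss(w, X[forget], y[forget]))
```

This is the sense in which the forget-set loss numbers below are read: a corrected model should score the forgotten rows worse than the trained model did.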

0.425 - Traceprop forget-set loss (higher = better unlearning)
0.401 - gold standard (retrain from scratch)
>100% - gap closed by Traceprop (exceeds the gold standard)
14% - gap closed by random unlearning
```python
from traceprop.unlearn import (
    approximate_unlearn, export_compliance
)

# GDPR erasure request maps
# automatically to tensor indices
forget_set = store.samples_from_source(
    "credit_scores.csv", rows=[4821, 7203, 9100]
)

# Gradient correction
theta_prime = approximate_unlearn(
    model, forget_set, eta=0.01, steps=10
)

# Article 26 compliance export
report = export_compliance(
    model_before=model,
    model_after=theta_prime,
    forget_set=forget_set,
    store=store,
    regulation="EU_AI_ACT_ART26",
)
report.save("certificate.json")
```
The forget set is identified via attribution (the highest-influence samples), not sampled at random. That targeting is what closes >100% of the gap to retrain-from-scratch, while random unlearning closes only 14%.

Full End-to-End

A regulator invokes Article 26. A single Traceprop query answers the complete audit question end-to-end - something that previously required manually stitching together three separate tools' outputs with no consistency guarantees.

```python
import traceprop as tp
from traceprop.attribution import (
    TrainingContext, GradientStore, compute_influence_scores
)
from traceprop.unlearn import (
    approximate_unlearn, export_compliance
)

# ----- INGESTION -----
X = tp.load_csv("credit_scores.csv")  # row-annotated
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
X_filt = X_norm[X_norm[:, 3] > 0]

# ----- TRAINING -----
store = GradientStore(k=4096, path="./grad/")
with TrainingContext(model, store) as ctx:
    for epoch in range(epochs):
        for i, (Xb, yb) in enumerate(loader):
            ctx.backward(criterion(model(Xb), yb), i)
            optimizer.step()

# ----- AUDIT QUERY -----
# "Which rows drove this declined application?"
scores = compute_influence_scores(
    model, store, declined_application, top_k=5)
for idx, score in scores:
    prov = store.get_provenance(idx)
    print(prov.trace_to_file(), score)
# -> credit_scores.csv row 4821, score=0.921

# ----- GDPR ERASURE -----
forget_set = store.samples_from_source(
    "credit_scores.csv", rows=[4821])
theta_prime = approximate_unlearn(model, forget_set)
export_compliance(...).save("article26_cert.json")
```
www.engineersofai.com - AI Letters #33