Three-layer architecture connecting source files to predictions to unlearning certificates
The ProvenanceTensor wraps any NumPy or PyTorch array. Every operation appends a node to the LineageGraph DAG. Leaf nodes carry source-file path and row indices. Three granularity modes trade resolution for overhead.
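The wrap-and-record pattern above can be sketched in a few lines. Everything here is an illustrative toy, not Traceprop's actual API: the class and method names beyond ProvenanceTensor and LineageGraph, the shared-graph simplification, and the node layout are all assumptions.

```python
import numpy as np

class LineageGraph:
    """Toy lineage DAG; leaf nodes carry (source_file, row indices)."""
    def __init__(self):
        self.nodes = []  # each node: {"id", "op", "parents", "meta"}

    def add_node(self, op, parents, meta=None):
        node_id = len(self.nodes)
        self.nodes.append({"id": node_id, "op": op,
                           "parents": parents, "meta": meta or {}})
        return node_id

class ProvenanceTensor:
    # single shared graph for the sketch; the real system presumably
    # scopes this differently
    graph = LineageGraph()

    def __init__(self, data, node_id):
        self.data = np.asarray(data)
        self.node_id = node_id

    @classmethod
    def from_source(cls, data, source_file, rows):
        # leaf node records where the raw values came from
        nid = cls.graph.add_node("leaf", [],
                                 {"source_file": source_file, "rows": rows})
        return cls(data, nid)

    def __add__(self, other):
        # every operation appends a node pointing at its parents
        nid = self.graph.add_node("add", [self.node_id, other.node_id])
        return ProvenanceTensor(self.data + other.data, nid)

a = ProvenanceTensor.from_source([1.0, 2.0], "train.csv", rows=[0, 1])
b = ProvenanceTensor.from_source([3.0, 4.0], "train.csv", rows=[2, 3])
c = a + b  # c's lineage node has parents [a, b], each a leaf with rows
```

Walking the DAG upward from any tensor's node reaches the leaf nodes, which is what makes the source-row mapping in the unlearning path possible.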
The GradientStore compresses per-sample gradients using sparse JL projection (Achlioptas 2003). At k=4096, the JL distortion bound is epsilon ~= 0.18, small enough that the top-k attribution set is recovered correctly with high probability. Two variants: Traceprop-LL (per-sample last-layer gradients, best quality for tabular data) and Traceprop-BM (batch-mean gradients, lower overhead but lower attribution quality).
Provenance-guided unlearning is the key distinction from unlearning on an arbitrary forget set. Because Traceprop knows which source rows correspond to which training tensors, a GDPR erasure request maps automatically to a precise forget set. Gradient correction then reverses training on exactly those samples. The compliance report exports directly to an Article 26 JSON schema.
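The two steps, lineage lookup and gradient correction, can be sketched as follows. The lineage table, the stored per-sample gradients, and the single-step correction rule (adding back the learning-rate-scaled gradients, a first-order approximation of reversing SGD) are all illustrative assumptions, not Traceprop's actual mechanism.

```python
import numpy as np

# Assumed lineage record: sample_id -> (source_file, source_row),
# as would be recovered from LineageGraph leaf nodes.
lineage = {
    0: ("users.csv", 10),
    1: ("users.csv", 11),
    2: ("items.csv", 3),
}
# Assumed per-sample gradients (in practice, decompressed sketches).
per_sample_grads = {0: np.array([0.1, -0.2]),
                    1: np.array([0.3, 0.0]),
                    2: np.array([-0.1, 0.4])}

def forget_set(erase_file, erase_rows):
    """Map an erasure request (file + rows) to training sample ids."""
    return [sid for sid, (f, r) in lineage.items()
            if f == erase_file and r in erase_rows]

def unlearn(weights, fs, lr=0.01):
    """First-order correction: undo one SGD pass over the forget set
    by adding back lr * gradient for each forgotten sample."""
    w = weights.copy()
    for sid in fs:
        w += lr * per_sample_grads[sid]
    return w

w = np.array([1.0, 1.0])
fs = forget_set("users.csv", {10, 11})
w_new = unlearn(w, fs)
```

The point of the sketch is the pipeline shape: the erasure request never names training samples directly; lineage resolves it to sample ids, and only then does the correction run.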
A regulator invokes Article 26. One Traceprop query answers the complete audit question end-to-end, something that previously required manually stitching together the outputs of three separate tools with no consistency guarantees.