Notable interpretability

XtrAIn: Training-Guided Occlusion for Feature Attribution

Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou

Published: Jun 9, 2026 — 13:52 UTC

Problem
This paper addresses the limitations of existing occlusion-based feature attribution methods, which suffer from biases introduced by externally selected baselines, out-of-distribution samples, and unstable explanations. The authors highlight a phenomenon termed “attribution shift,” where the occlusion of certain features can alter the contribution of non-occluded features in nonlinear models. This work is particularly relevant as it is presented as a preprint and has not yet undergone peer review.

Method
The core contribution of this work is the XtrAIn method, which innovatively transfers the occlusion operation from the input space to the parameter space. Instead of using hand-crafted baselines to replace input values, XtrAIn tracks the model’s training trajectory and assesses how updates to feature-associated parameters influence output logits. The authors also propose Xstep, a computationally efficient approximation to reduce the cost of the method, and XtrAIn+, a variant that focuses on parameter updates aligned with the target class. The training process is leveraged to provide a more stable and interpretable attribution framework.

Results
The authors conducted experiments on controlled image datasets and the PAM50 breast-cancer subtype classification task. They report that XtrAIn produces cleaner and more interpretable attribution patterns compared to standard attribution baselines, although specific quantitative results (e.g., effect sizes or accuracy improvements) are not detailed in the abstract. The improvements in interpretability suggest a significant enhancement over traditional methods, which often yield noisy or biased attributions.

Limitations
The authors acknowledge that their approach may still be sensitive to the choice of model architecture and the specific training dynamics, which could affect the generalizability of the results. Additionally, while XtrAIn aims to mitigate biases, it does not eliminate them entirely, and the reliance on the training process may introduce its own set of challenges. The paper does not address the scalability of the method to larger datasets or more complex models, which could be a potential limitation for practical applications.

Why it matters
XtrAIn provides a novel perspective on feature attribution by integrating training dynamics into the attribution process, potentially leading to more reliable and interpretable models. This work has significant implications for the development of diagnostic tools in machine learning, particularly in fields requiring high-stakes decision-making, such as healthcare. The insights gained from XtrAIn could inform future research on improving model interpretability and robustness, as discussed in related works on feature attribution methods, as published in arXiv.

By Callan Zhang · Jun 9, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG