Toward Faithful Neural Network Intrinsic Interpretation With Shapley Additive Self-Attribution

Ying Sun, Hengshu Zhu, Hui Xiong

Research output: Contribution to journal › Article › peer-review

Abstract

Self-interpreting neural networks have attracted significant attention from the research community. Along this line, extensive works inherently share the intuitive principle of linear contribution aggregation from diversified perspectives, while often: 1) lacking a solid theoretical foundation ensuring genuine interpretability and 2) compromising model expressiveness. In response, we propose a generic additive self-attribution (ASA) framework to encapsulate the characteristics of various works in this field and underscore the absence of the Shapley value attribution. To fill this gap, we propose a novel Shapley additive self-attributing neural network (SASANet). SASANet models meaningful outputs for arbitrary-numbered observable features, naturally leading to an unapproximated value function for the Shapley value. By designing an intermediate sequential schema based on marginal contributions (MCs) and an internal distillation procedure, we theoretically prove that the intermediate self-attribution value converges to the output’s Shapley values. Finally, we conduct extensive experiments on multiple public datasets. The experimental results clearly demonstrate that SASANet, while highly interpretable, outperforms existing self-attributing models in performance and is comparable with commonly adopted closed-box models. In addition, compared with adopting post hoc interpretation methods, SASANet’s self-attribution provides a more accurate and efficient interpretation for its own predictions. To the best of the authors’ knowledge, this is the first self-interpreting neural network structure that achieves modelwise Shapley attribution.
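As background for the terms used in the abstract, the following is a minimal sketch of the textbook Shapley value: each feature's attribution is its marginal contribution averaged over all feature orderings. It is not SASANet or the authors' implementation. The names `shapley_values` and `toy_value`, and the toy value function itself, are assumptions made purely for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(n_features, value_fn):
    """Exact Shapley values: average each feature's marginal
    contribution over every ordering of the features."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        observed = set()
        prev = value_fn(frozenset(observed))
        for i in order:
            observed.add(i)
            cur = value_fn(frozenset(observed))
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]


def toy_value(subset):
    """Hypothetical value function (an assumption, not from the paper):
    additive weights plus an interaction between features 0 and 1."""
    weights = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(weights[i] for i in subset)
    if 0 in subset and 1 in subset:
        total += 4.0
    return total


phi = shapley_values(3, toy_value)
print(phi)
```

In this toy case the interaction bonus is split evenly between features 0 and 1, and the attributions sum to the difference between the value of the full feature set and the value of the empty set. Exact enumeration scales factorially with the number of features, which is why practical methods rely on approximation or, as the abstract describes, on a model whose own attributions converge to the Shapley values.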

Original language: English (US)
Pages (from-to): 16294-16308
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 9
DOIs
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
