On the limits of neural network explainability via descrambling

IF 2.6, Zone 2 (Mathematics), JCR Q1 (MATHEMATICS, APPLIED)

Applied and Computational Harmonic Analysis. Pub Date: 2025-07-07. DOI: 10.1016/j.acha.2025.101793

Shashank Sule, Richard G. Spencer, Wojciech Czaja

Volume 79, Article 101793. https://www.sciencedirect.com/science/article/pii/S1063520325000478

Citations: 0

Abstract

We characterize the exact solutions to neural network descrambling, a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem as the minimization of the Brockett function, which arises in graph matching and complexity theory, we show that the principal components of the hidden layer preactivations can be characterized as the optimal “explainers”, or descramblers, for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms, including (1) matching the largest principal components with the lowest-frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data, now understood as the descramblers, can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in operator learning or physics-informed NNs, where the input/output data has limited human readability.
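The central claim, that principal components of the hidden layer preactivations act as descramblers for the layer weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pca_descrambler`, the random weights and inputs, and the choice to use the uncentered SVD of the preactivation matrix are all assumptions made for illustration only.

```python
import numpy as np

def pca_descrambler(W, X):
    """Illustrative sketch (hypothetical helper, not from the paper).

    W : (hidden, inputs) weight matrix of a fully connected layer.
    X : (inputs, samples) matrix of layer inputs.
    Returns an orthogonal matrix P built from the left singular vectors
    of the preactivations Z = W X, the transformed weights P W, and the
    singular values of Z.
    """
    Z = W @ X                                    # hidden layer preactivations
    U, s, _ = np.linalg.svd(Z, full_matrices=True)
    P = U.T                                      # rows are principal directions of Z
    return P, P @ W, s

# Toy data (assumed): 8 hidden neurons, 5 inputs, 200 samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))
X = rng.standard_normal((5, 200))

P, W_desc, s = pca_descrambler(W, X)

# After applying P, the transformed preactivations have mutually
# orthogonal rows, so their Gram matrix is diagonal.
G = (P @ W @ X) @ (P @ W @ X).T
```

In this sketch `P` only rotates the hidden neuron coordinates, so `P @ W` describes the same layer in a basis ordered by the variance of the preactivations. Whether this coincides with the paper's optimal descrambler depends on the precise objective, which the abstract does not state.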




Source journal

Applied and Computational Harmonic Analysis (Physics: Mathematical Physics)

CiteScore: 5.40
Self-citation rate: 4.00%
Number of articles published:
Review time: 22.9 weeks

About the journal: Applied and Computational Harmonic Analysis (ACHA) is an interdisciplinary journal that publishes high-quality papers in all areas of mathematical sciences related to the applied and computational aspects of harmonic analysis, with special emphasis on innovative theoretical development, methods, and algorithms, for information processing, manipulation, understanding, and so forth. The objectives of the journal are to chronicle the important publications in the rapidly growing field of data representation and analysis, to stimulate research in relevant interdisciplinary areas, and to provide a common link among mathematical, physical, and life scientists, as well as engineers.