Filter differentiation: An effective approach to interpret convolutional neural networks

IF 8.1 · Zone 1, Computer Science · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-04-28 DOI:10.1016/j.ins.2025.122253

Yongkai Fan, Hongxue Bao, Xia Lei

Information Sciences, Volume 716, Article 122253. https://www.sciencedirect.com/science/article/pii/S0020025525003858

Citations: 0

Abstract

The lack of interpretability in deep learning poses a major challenge for AI security, as it hinders the detection and prevention of potential vulnerabilities. Understanding black-box models, such as Convolutional Neural Networks (CNNs), is crucial for establishing trust in them. Currently, filter disentanglement is a mainstream approach for interpreting CNNs, but existing efforts still face the problem of reducing filter entanglement without compromising model accuracy. Inspired by bionic theory, we propose a filter differentiation method that disentangles filters while improving model accuracy by simulating the process of pluripotent to unipotent cell differentiation. Specifically, by using a differentiation matrix based on attention mechanisms and an activation matrix based on mutual information between filters and classes, the convolutional weights of filters can be adjusted, allowing general filters in CNNs to be differentiated into specialized filters that respond only to specific classes. Experiments on benchmark datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, show that our method achieves consistent improvements in model performance. It improves accuracy by 0.5% to 2% across various architectures, including ResNet18 and MobileNetV2, while enhancing filter interpretability as measured by Mutual Information Scores (MIS). These results demonstrate that our method achieves an effective balance between interpretability and accuracy.
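The abstract measures filter specialization through mutual information between filters and classes but does not give the estimator. As a rough illustration only, the sketch below computes a plug-in estimate of that quantity. It is not the authors' MIS definition or implementation. The function name `filter_class_mutual_information`, the pooled-activation input, and the mean-threshold binarization are all assumptions made for this example.

```python
import numpy as np


def filter_class_mutual_information(activations, labels, num_classes):
    """Plug-in estimate of I(F; C) in bits for each filter.

    activations: (num_samples, num_filters) array, e.g. globally
        average-pooled feature maps (an assumption of this sketch).
    labels: (num_samples,) integer class ids in [0, num_classes).
    Each filter is binarized at its own mean activation (also an
    assumption; the paper may discretize differently).
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    n, num_filters = activations.shape

    # 1 where the filter responds above its mean, else 0.
    fired = (activations > activations.mean(axis=0)).astype(int)
    onehot = np.eye(num_classes)[labels]          # (n, num_classes)
    p_c = onehot.mean(axis=0, keepdims=True)      # p(C = c)

    mi = np.zeros(num_filters)
    for state in (0, 1):
        mask = (fired == state).astype(float)     # (n, num_filters)
        joint = mask.T @ onehot / n               # p(F = state, C = c)
        p_f = joint.sum(axis=1, keepdims=True)    # p(F = state)
        # Ratio p(f, c) / (p(f) p(c)); left at 1 where joint == 0 so
        # that those cells contribute 0 * log2(1) = 0.
        ratio = np.ones_like(joint)
        np.divide(joint, p_f * p_c, out=ratio, where=joint > 0)
        mi += (joint * np.log2(ratio)).sum(axis=1)
    return mi
```

Under these assumptions, a filter that fires for exactly one of two balanced classes scores 1 bit, while a filter whose response is independent of the class, or constant, scores 0. A higher score therefore indicates a more class-specific filter.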




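Source journal

Information Sciences (Engineering and Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Publication volume: 1322

Review time: 10.4 months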

About the journal: Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.