Wenyi Zhang, Haoran Zhang, Xisheng Zhang, Xiaohua Shen, Lejun Zou
{"title":"IP-CAM: Class activation mapping based on importance weights and principal-component weights for better and simpler visual explanations","authors":"Wenyi Zhang, Haoran Zhang, Xisheng Zhang, Xiaohua Shen, Lejun Zou","doi":"10.1016/j.cviu.2025.104523","DOIUrl":null,"url":null,"abstract":"<div><div>Visual explanations of deep neural networks (DNNs) have gained considerable importance in deep learning due to the lack of interpretability, which constrains human trust in DNNs. This paper proposes a new gradient-free class activation map (CAM) architecture called importance principal-component CAM (IP-CAM). The architecture not only improves the prediction accuracy of networks but also provides simpler and more reliable visual explanations. It adds importance weight layers before the classifier and assigns an importance weight to each activation map. After fine-tuning, it selects images with the highest prediction score for each class, performs principal component analysis (PCA) on activation maps of all channels, and regards the eigenvector of the first principal component as principal-component weights for that class. The final saliency map is obtained by linearly combining the activation maps, importance weights and principal-component weights. IP-CAM is evaluated on the ILSVRC 2012 dataset and RSD46-WHU dataset, whose results show that IP-CAM performs better than most previous CAM variants in recognition and localization tasks. 
Finally, the method is applied as a tool for interpretability, and the results illustrate that IP-CAM effectively unveils the decision-making process of DNNs through saliency maps.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"261 ","pages":"Article 104523"},"PeriodicalIF":3.5000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225002462","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
Visual explanations of deep neural networks (DNNs) have gained considerable importance in deep learning because DNNs' lack of interpretability constrains human trust in them. This paper proposes a new gradient-free class activation map (CAM) architecture called importance principal-component CAM (IP-CAM). The architecture not only improves the prediction accuracy of networks but also provides simpler and more reliable visual explanations. It adds importance weight layers before the classifier, assigning an importance weight to each activation map. After fine-tuning, it selects the images with the highest prediction score for each class, performs principal component analysis (PCA) on the activation maps of all channels, and takes the eigenvector of the first principal component as the principal-component weights for that class. The final saliency map is obtained by linearly combining the activation maps, the importance weights, and the principal-component weights. IP-CAM is evaluated on the ILSVRC 2012 and RSD46-WHU datasets; the results show that it outperforms most previous CAM variants on recognition and localization tasks. Finally, the method is applied as an interpretability tool, and the results illustrate that IP-CAM effectively unveils the decision-making process of DNNs through its saliency maps.
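The weighting pipeline described in the abstract (PCA over per-channel activation maps, with the first principal component's eigenvector serving as per-channel weights) can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the paper's reference implementation; the function name, the sign convention for the eigenvector, and the final ReLU are assumptions.

```python
import numpy as np

def ip_cam_saliency(activations, importance_weights):
    """Illustrative sketch of an IP-CAM-style saliency map.

    activations: (C, H, W) activation maps from the last conv layer.
    importance_weights: (C,) learned importance weight per channel
    (the abstract's "importance weight layers" before the classifier).
    """
    C, H, W = activations.shape
    # Treat spatial positions as samples and channels as features:
    # each channel's map becomes a row vector of length H*W.
    X = activations.reshape(C, -1)
    # Centre each channel over its spatial positions, then form the
    # (C, C) channel covariance matrix for PCA.
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / (Xc.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    # Eigenvector of the largest eigenvalue = principal-component
    # weights, one weight per channel.
    pc_weights = eigvecs[:, -1]
    # Sign of an eigenvector is arbitrary; fix it so the weights are
    # predominantly positive (an assumed convention).
    if pc_weights.sum() < 0:
        pc_weights = -pc_weights
    # Linear combination of activation maps, importance weights and
    # principal-component weights, as the abstract describes.
    combined = np.einsum('c,c,chw->hw',
                         importance_weights, pc_weights, activations)
    # ReLU, as is conventional for CAM-style saliency maps (assumed).
    return np.maximum(combined, 0.0)
```

The choice of spatial positions as PCA samples follows from the abstract: the first principal component's eigenvector must have one entry per channel to act as a channel weight.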
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems