Enhancing Interpretability of NesT Model Using NesT-Shapley and Feature-Weight-Augmentation Method

IF 1.3; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IET Computer Vision Pub Date : 2025-09-06 DOI:10.1049/cvi2.70039

Li Xu, Lei Li, Xiaohong Cong, Huijie Song

Volume 19, Issue 1. Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.70039

Citations: 0

Abstract

Transformers perform impressively in natural language processing and computer vision, but interpretability remains crucial in domain-specific applications. The NesT model, with its pyramidal structure, achieves high accuracy and faster training. Unlike other models, NesT does not use a [CLS] token, which makes it difficult to apply interpretability methods that rely on the model's internal structure. Instead, NesT divides the image into 16 blocks and processes them with 16 independent vision transformers. We propose the NesT-Shapley method, which exploits this structure by combining the Shapley value method (a self-interpretable approach) with the independently operating vision transformers within NesT, significantly reducing computational complexity. We also introduce the feature weight augmentation (FWA) method, which addresses the difficulty of adjusting weights in the final results produced by interpretability methods that lack a [CLS] token. FWA markedly improves the performance of these methods and gives a clearer picture of the information flow during prediction in the NesT model. We conducted perturbation experiments on the NesT model using the ImageNet and CIFAR-100 datasets and segmentation experiments on the ImageNet-Segmentation dataset, achieving impressive experimental results.
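To make the complexity argument concrete, the following is a minimal sketch of block-wise Shapley attribution. It is not the authors' implementation: the names `shapley_values`, `blockwise_shapley`, `toy_block_value`, and `coalition_counts`, the toy value function, and the patch counts are all assumptions chosen for illustration. It only shows why computing exact Shapley values independently inside each block needs far fewer coalition evaluations than treating all patches as one game.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for `players` under the coalition function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def blockwise_shapley(blocks, block_value):
    """Run exact Shapley separately inside each block (hypothetical helper).

    blocks: dict mapping block id -> list of patch ids
    block_value: function (block_id, frozenset of patch ids) -> float
    """
    scores = {}
    for block_id, patches in blocks.items():
        scores.update(
            shapley_values(patches, lambda s, b=block_id: block_value(b, s))
        )
    return scores


def coalition_counts(num_patches, num_blocks):
    """Coalitions to enumerate: one joint game vs. independent per-block games."""
    joint = 2 ** num_patches
    per_block = num_blocks * 2 ** (num_patches // num_blocks)
    return joint, per_block


# Toy stand-in for a model score (an assumption, not NesT itself).
WEIGHTS = {"a0": 1.0, "a1": 3.0, "b0": 2.0, "b1": 2.0}


def toy_block_value(block_id, present):
    score = sum(WEIGHTS[p] for p in present)
    if block_id == "A" and len(present) == 2:
        score += 2.0  # interaction bonus when both patches of block A are kept
    return score


blocks = {"A": ["a0", "a1"], "B": ["b0", "b1"]}
scores = blockwise_shapley(blocks, toy_block_value)
print(scores)  # {'a0': 2.0, 'a1': 4.0, 'b0': 2.0, 'b1': 2.0}
```

With a hypothetical 64 patches split evenly over 16 blocks, `coalition_counts(64, 16)` gives 2^64 coalitions for the joint game against 16 × 2^4 = 256 for the per-block games. The per-block scores still sum to each block's full value (6.0 for block A in the toy example), which is the efficiency property of Shapley values.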




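Source journal

IET Computer Vision (Engineering and Technology; Engineering: Electrical and Electronic)

CiteScore: 3.30

Self-citation rate: 11.80%

Publication volume:

Review time: 3.4 months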

Journal description: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, without forgetting works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision.

IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation; Representation, analysis and matching of 2D and 3D shape; Shape-from-X; Object recognition; Image understanding; Learning with visual inputs; Motion analysis and object tracking; Multiview scene analysis; Cognitive approaches in low, mid and high level vision; Control in visual systems; Colour, reflectance and light; Statistical and probabilistic models; Face and gesture; Surveillance; Biometrics and security; Robotics; Vehicle guidance; Automatic model acquisition; Medical image analysis and understanding; Aerial scene analysis and remote sensing; Deep learning models in computer vision. Both methodological and applications orientated papers are welcome.

Manuscripts submitted are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the original proposed research and its methodology, a thorough experimental evaluation, and a comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review.

Special Issues, Current Call for Papers: Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf; Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf