GVI: Guideable Visual Interpretation on medical tomographic images to improve the performance of deep network
Hui Liu, Fan Wei, Lixin Yan, Sushan Wang, Chongfu Jia, Lina Zhang, Jiansheng Peng, Yi Xu
Pattern Recognition Letters, Volume 196, Pages 162-168, published 2025-06-11
DOI: 10.1016/j.patrec.2025.05.019
https://www.sciencedirect.com/science/article/pii/S0167865525002132
Abstract
In medical image analysis, the demand for interpretable deep neural networks is rapidly growing. However, a major challenge is that most existing interpretative methods are applied after training, leading to a lack of integration with the model’s learning process. As a result, these methods often fail to highlight regions within complex medical images critical for decision-making, such as abnormal tissues or lesions, which are essential for accurate diagnoses and treatment planning. This paper introduces Guided Visual Interpretation (GVI), a framework designed to enhance both the performance and interpretability of deep networks. Building on a deep network model with image-level labels, GVI incorporates a small amount of pixel-level annotations combined with attention mechanisms. These mechanisms facilitate visual interpretation through forward propagation, directing the model’s focus to the most relevant regions. By aligning the network’s decision-making with human cognitive processes, GVI improves interpretability. In our study, an attention layer was added after the convolutional layers of a pre-trained classification network. GVI is trained using a mixed supervision approach that integrates pixel-level annotations with a large amount of image-level data. Experimental results on both private and public datasets show that GVI generates visual explanations consistent with human decision-making principles and achieves superior classification accuracy compared to traditional methods. These findings highlight GVI’s potential to improve interpretability and diagnostic performance in critical fields like medical imaging.
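To make the architecture described in the abstract more concrete, the sketch below shows one plausible way to add a spatial attention layer after the convolutional layers of a pre-trained classifier and train it with mixed supervision (image-level labels for all samples, pixel-level masks for a small annotated subset). This is not the authors' implementation; the class name GVIStyleNet, the loss function mixed_supervision_loss, the ResNet-18 backbone, and the weighting parameter lam are all illustrative assumptions.

```python
# Hypothetical sketch of a GVI-style model (not the paper's official code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class GVIStyleNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional feature extractor (everything before pooling/fc).
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # B x 512 x h x w
        # Attention layer appended after the convolutional layers.
        self.attention = nn.Conv2d(512, 1, kernel_size=1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        feat = self.features(x)                      # B x 512 x h x w
        attn = torch.sigmoid(self.attention(feat))   # B x 1 x h x w attention map
        pooled = (feat * attn).mean(dim=(2, 3))      # attention-weighted pooling
        return self.classifier(pooled), attn


def mixed_supervision_loss(logits, attn, labels, masks=None, lam=0.5):
    """Image-level cross-entropy for all samples; pixel-level BCE on the
    attention map only for samples that come with ground-truth masks."""
    loss = F.cross_entropy(logits, labels)
    if masks is not None:
        masks = F.interpolate(masks, size=attn.shape[-2:], mode="nearest")
        loss = loss + lam * F.binary_cross_entropy(attn, masks)
    return loss
```

In this reading, the attention map produced by the forward pass serves directly as the visual explanation, while the optional mask term nudges it toward the annotated lesion regions; the actual attention design and loss weighting in the paper may differ.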
Journal Introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.