GVI: Guideable Visual Interpretation on medical tomographic images to improve the performance of deep network
Hui Liu, Fan Wei, Lixin Yan, Sushan Wang, Chongfu Jia, Lina Zhang, Jiansheng Peng, Yi Xu
Pattern Recognition Letters, Volume 196, Pages 162-168, published 2025-06-11
DOI: 10.1016/j.patrec.2025.05.019
https://www.sciencedirect.com/science/article/pii/S0167865525002132
Abstract
In medical image analysis, the demand for interpretable deep neural networks is rapidly growing. However, a major challenge is that most existing interpretative methods are applied after training, leading to a lack of integration with the model’s learning process. As a result, these methods often fail to highlight regions within complex medical images critical for decision-making, such as abnormal tissues or lesions, which are essential for accurate diagnoses and treatment planning. This paper introduces Guided Visual Interpretation (GVI), a framework designed to enhance both the performance and interpretability of deep networks. Building on a deep network model with image-level labels, GVI incorporates a small amount of pixel-level annotations combined with attention mechanisms. These mechanisms facilitate visual interpretation through forward propagation, directing the model’s focus to the most relevant regions. By aligning the network’s decision-making with human cognitive processes, GVI improves interpretability. In our study, an attention layer was added after the convolutional layers of a pre-trained classification network. GVI is trained using a mixed supervision approach that integrates pixel-level annotations with a large amount of image-level data. Experimental results on both private and public datasets show that GVI generates visual explanations consistent with human decision-making principles and achieves superior classification accuracy compared to traditional methods. These findings highlight GVI’s potential to improve interpretability and diagnostic performance in critical fields like medical imaging.
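To make the architecture described in the abstract more concrete, the sketch below shows one plausible way to add a spatial attention layer after the convolutional layers of a pre-trained classifier and train it with mixed supervision (image-level labels for all samples, pixel-level masks for a small annotated subset). This is not the authors' implementation; the class name GVIStyleNet, the loss function mixed_supervision_loss, the ResNet-18 backbone, and the weighting parameter lam are all illustrative assumptions.

```python
# Hypothetical sketch of a GVI-style model (not the paper's official code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class GVIStyleNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional feature extractor (everything before pooling/fc).
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # B x 512 x h x w
        # Attention layer appended after the convolutional layers.
        self.attention = nn.Conv2d(512, 1, kernel_size=1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        feat = self.features(x)                      # B x 512 x h x w
        attn = torch.sigmoid(self.attention(feat))   # B x 1 x h x w attention map
        pooled = (feat * attn).mean(dim=(2, 3))      # attention-weighted pooling
        return self.classifier(pooled), attn


def mixed_supervision_loss(logits, attn, labels, masks=None, lam=0.5):
    """Image-level cross-entropy for all samples; pixel-level BCE on the
    attention map only for samples that come with ground-truth masks."""
    loss = F.cross_entropy(logits, labels)
    if masks is not None:
        masks = F.interpolate(masks, size=attn.shape[-2:], mode="nearest")
        loss = loss + lam * F.binary_cross_entropy(attn, masks)
    return loss
```

In this reading, the attention map produced by the forward pass serves directly as the visual explanation, while the optional mask term nudges it toward the annotated lesion regions; the actual attention design and loss weighting in the paper may differ.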
Journal Introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.