Radiology report generation via visual-semantic ambivalence-aware network and focal self-critical sequence training

IF 6.3 · CAS Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks · Pub Date: 2025-09-11 · DOI: 10.1016/j.neunet.2025.108102

Xiulong Yi, You Fu, Enxu Bi, Jianguo Liang, Hao Zhang, Jianzhi Yu, Qianqian Li, Rong Hua, Rui Wang

Neural Networks, Volume 194, Article 108102. Publisher page: https://www.sciencedirect.com/science/article/pii/S0893608025009827

Citations: 0

Abstract

Radiology report generation, which aims to describe both normal and abnormal regions accurately, has attracted growing research attention. Despite considerable recent progress, data-driven deep-learning-based models still struggle to capture and describe abnormalities because of the data bias problem. To address this problem, we propose generating radiology reports with the Visual-Semantic Ambivalence-Aware Network (VSANet) and Focal Self-Critical Sequence Training (FSCST). VSANet follows the encoder-decoder framework. In the encoder, we first deploy a multi-grained abnormality extractor and a visual extractor to capture semantic and visual features from the given images, and then introduce a Parameter Shared Dual-way Encoder (PSDwE) to model the inter- and intra-relationships among these features. In the decoder, we propose the Visual-Semantic Ambivalence-Aware (VSA) module, which generates abnormality-aware visual features to mitigate the data bias problem. The VSA module comprises three sub-modules: Dual-way Attention (DwA), which generates word-related visual and semantic features; Dual-way Attention on Attention (DwAoA), which suppresses redundant information; and Score-based Feature Fusion (SFF), which fuses the visual and semantic features in an ambivalence-aware manner. We further introduce FSCST to improve the overall performance of VSANet by allocating more attention to difficult samples. Experimental results demonstrate that our proposal achieves superior performance on various evaluation metrics. The source code has been released at https://github.com/SKD-HPC/VSANet.
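The abstract does not give the FSCST objective, so the following is only a minimal sketch of the general idea it names: a self-critical sequence training loss (sampled-report reward minus greedy-baseline reward, times the sampled report's log-probability) multiplied by a focal-style weight that grows for difficult samples. The function name `focal_scst_loss`, the difficulty weight `(1 - greedy_reward) ** gamma`, and the `gamma` parameter are all assumptions made for illustration; the authors' actual formulation is in their paper and repository and may differ.

```python
from typing import Sequence


def focal_scst_loss(
    sample_logprobs: Sequence[float],  # summed token log-probs of each sampled report
    sample_rewards: Sequence[float],   # metric score of each sampled report, assumed in [0, 1]
    greedy_rewards: Sequence[float],   # metric score of each greedy-decoded baseline report
    gamma: float = 2.0,                # hypothetical focusing exponent (not from the paper)
) -> float:
    """Batch-mean self-critical loss with an assumed focal-style difficulty weight."""
    if not (len(sample_logprobs) == len(sample_rewards) == len(greedy_rewards)):
        raise ValueError("all inputs must have the same length")
    if len(sample_logprobs) == 0:
        raise ValueError("batch must not be empty")

    total = 0.0
    for logp, r_sample, r_greedy in zip(sample_logprobs, sample_rewards, greedy_rewards):
        # Standard self-critical advantage: sampled reward minus greedy baseline.
        advantage = r_sample - r_greedy
        # Assumed difficulty weight: a low greedy reward marks a hard sample.
        clipped = min(max(r_greedy, 0.0), 1.0)
        difficulty = (1.0 - clipped) ** gamma
        # REINFORCE-style term; minimizing it raises the log-probability of
        # sampled reports that beat the greedy baseline.
        total += -difficulty * advantage * logp
    return total / len(sample_logprobs)


if __name__ == "__main__":
    # Two toy samples; with gamma=0 the weight is 1 and the loss is plain SCST.
    logps, r_s, r_g = [-2.0, -4.0], [0.5, 0.2], [0.3, 0.4]
    print(focal_scst_loss(logps, r_s, r_g, gamma=0.0))
    print(focal_scst_loss(logps, r_s, r_g, gamma=2.0))
```

Under this assumed weighting, setting `gamma=0` recovers an unweighted self-critical loss, while larger values shrink the contribution of samples whose greedy report already scores well. In a real training loop the log-probabilities would be differentiable tensors from the decoder and the rewards would come from a report metric such as CIDEr.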


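Source journal: Neural Networks (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days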

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.