VICCA: Visual interpretation and comprehension of chest X-ray anomalies in generated report without human feedback

Impact Factor: 4.9
Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier
{"title":"VICCA: Visual interpretation and comprehension of chest X-ray anomalies in generated report without human feedback","authors":"Sayeh Gholipour Picha,&nbsp;Dawood Al Chanti,&nbsp;Alice Caplier","doi":"10.1016/j.mlwa.2025.100684","DOIUrl":null,"url":null,"abstract":"<div><div>As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance the semantic alignment between text and image context and the localization accuracy of pathologies within images and reports for AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies localization accuracy, while the other evaluates semantic consistency between text and image features. Our approach significantly outperforms existing methods in pathology localization, achieving an 8% improvement in Intersection over Union score. It also surpasses state-of-the-art methods in CXR text-to-image generation, with a 1% gain in similarity metrics. Additionally, the integration of phrase grounding with diffusion models, coupled with the dual-scoring evaluation system, provides a robust mechanism for validating report quality, paving the way for more reliable and transparent AI in medical imaging.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"21 ","pages":"Article 100684"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025000672","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance both the semantic alignment between textual and visual content and the localization accuracy of pathologies in AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies localization accuracy, while the other evaluates semantic consistency between text and image features. Our approach significantly outperforms existing methods in pathology localization, achieving an 8% improvement in Intersection over Union (IoU) score. It also surpasses state-of-the-art methods in CXR text-to-image generation, with a 1% gain in similarity metrics. Additionally, the integration of phrase grounding with diffusion models, coupled with the dual-scoring evaluation system, provides a robust mechanism for validating report quality, paving the way for more reliable and transparent AI in medical imaging.
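For intuition, the dual-scoring idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: ground, generate, and encode are hypothetical placeholders standing in for the paper's Phrase Grounding Model, Text-to-Image Diffusion Module, and an image feature encoder, and the exact scoring formulas used in the paper may differ.

    import numpy as np

    def iou(box_a, box_b):
        # Intersection over Union of two boxes given as (x1, y1, x2, y2).
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def cosine(u, v):
        # Cosine similarity between two feature vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    def dual_score(phrase, cxr, ground, generate, encode):
        # Score one report phrase against the original CXR image.
        # ground(phrase, image) -> bounding box     (hypothetical grounding model)
        # generate(phrase)      -> synthetic image  (hypothetical diffusion module)
        # encode(image)         -> feature vector   (hypothetical image encoder)
        synthetic = generate(phrase)
        box_orig = ground(phrase, cxr)
        box_syn = ground(phrase, synthetic)
        localization = iou(box_orig, box_syn)                  # spatial agreement
        consistency = cosine(encode(cxr), encode(synthetic))   # semantic agreement
        return localization, consistency

Under these assumptions, a high localization score means the grounded pathology occupies the same region in the original and regenerated images, while a high consistency score means the generated image, and hence the report text that produced it, agrees semantically with the original scan.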


Source journal

Machine Learning with Applications
Subject areas: Management Science and Operations Research; Artificial Intelligence; Computer Science Applications
Self-citation rate: 0.00%
Average review time: 98 days