VICCA: Visual interpretation and comprehension of chest X-ray anomalies in generated report without human feedback

IF 4.9

Machine learning with applications. Pub Date: 2025-06-18. DOI: 10.1016/j.mlwa.2025.100684

Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier

Machine learning with applications, volume 21, Article 100684

https://www.sciencedirect.com/science/article/pii/S2666827025000672

Citations: 0

Abstract

As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance the semantic alignment between text and image context and the localization accuracy of pathologies within images and reports for AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies localization accuracy, while the other evaluates semantic consistency between text and image features. Our approach significantly outperforms existing methods in pathology localization, achieving an 8% improvement in Intersection over Union score. It also surpasses state-of-the-art methods in CXR text-to-image generation, with a 1% gain in similarity metrics. Additionally, the integration of phrase grounding with diffusion models, coupled with the dual-scoring evaluation system, provides a robust mechanism for validating report quality, paving the way for more reliable and transparent AI in medical imaging.
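The abstract names Intersection over Union and feature similarity as the quantities behind the two scores but does not say how either is computed. The following is only a minimal sketch of those two generic measures, assuming axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)` and cosine similarity between feature vectors; the function names, the box format, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors of equal length.

    Used here as a stand-in for a semantic-consistency measure; the
    paper's actual similarity metric may differ.
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)   # hypothetical grounded box
    reference = (1.0, 1.0, 3.0, 3.0)   # hypothetical reference box
    print("IoU:", iou(predicted, reference))
    print("cosine:", cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

In a setup like the one described, the first function would compare a box grounded on the original image with one grounded on the generated image, and the second would compare feature vectors from the two modalities; how the paper combines or thresholds these values is not stated in the abstract.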


Source journal

Machine learning with applications (subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications)

Self-citation rate

0.00%

Number of articles published

Review time

98 days