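Comparison between multimodal foundation models and radiologists for the diagnosis of challenging neuroradiology cases with text and images.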

IF 4.9 | Zone 2 (Medicine) | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Bastien Le Guellec, Cyril Bruge, Najib Chalhoub, Victor Chaton, Edouard De Sousa, Yann Gaillandre, Riyad Hanafi, Matthieu Masy, Quentin Vannod-Michel, Aghiles Hamroun, Grégory Kuchcinski
DOI: 10.1016/j.diii.2025.04.006 (https://doi.org/10.1016/j.diii.2025.04.006). Published 2025-05-09 in Diagnostic and Interventional Imaging. Publication type: Journal Article.
Citations: 0

Abstract

Comparison between multimodal foundation models and radiologists for the diagnosis of challenging neuroradiology cases with text and images.

Purpose: The purpose of this study was to compare the ability of two multimodal models (GPT-4o and Gemini 1.5 Pro) with that of radiologists to generate differential diagnoses from textual context alone, key images alone, or a combination of both using complex neuroradiology cases.

Materials and methods: This retrospective study included neuroradiology cases from the "Diagnosis Please" series published in the journal Radiology between January 2008 and September 2024. The two multimodal models were asked to provide three differential diagnoses from textual context alone, key images alone, or the complete case. Six board-certified neuroradiologists solved the cases in the same setting, randomly assigned to two groups: context alone first and images alone first. Three radiologists solved the cases without, and then with, the assistance of Gemini 1.5 Pro. An independent radiologist evaluated the quality of the image descriptions provided by GPT-4o and Gemini 1.5 Pro for each case. Differences in correct answers between multimodal models and radiologists were analyzed using the McNemar test.
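The paired comparison named above can be illustrated with a minimal sketch of an exact (binomial) McNemar test. This is an assumption-laden illustration, not the authors' analysis code: the abstract does not say whether the exact or the chi-square variant was used, and the function names and the per-case correctness lists below are hypothetical.

```python
from math import comb


def discordant_counts(model_correct, reader_correct):
    """Count cases where exactly one of the two paired raters was correct.

    Both arguments are equal-length sequences of booleans, one entry per case.
    Returns (b, c): b = model right / reader wrong, c = reader right / model wrong.
    """
    b = sum(m and not r for m, r in zip(model_correct, reader_correct))
    c = sum(r and not m for m, r in zip(model_correct, reader_correct))
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    Under the null hypothesis, each discordant pair is equally likely to
    favor either rater, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-case results (not data from the study).
model = [True, True, False, False, True, False]
reader = [True, False, True, True, True, True]
b, c = discordant_counts(model, reader)
p_value = mcnemar_exact(b, c)
```

Only the discordant cases influence the result; cases that both raters solved or both missed carry no information about which rater is better.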

Results: GPT-4o and Gemini 1.5 Pro outperformed radiologists using clinical context alone (mean accuracy, 34.0 % [18/53] and 44.7 % [23.7/53] vs. 16.4 % [8.7/53]; both P < 0.01). Radiologists outperformed GPT-4o and Gemini 1.5 Pro using images alone (mean accuracy, 42.0 % [22.3/53] vs. 3.8 % [2/53], and 7.5 % [4/53]; both P < 0.01) and the complete cases (48.0 % [25.6/53] vs. 34.0 % [18/53], and 38.7 % [20.3/53]; both P < 0.001). While radiologists improved their accuracy when combining multimodal information (from 42.1 % [22.3/53] for images alone to 50.3 % [26.7/53] for complete cases; P < 0.01), GPT-4o and Gemini 1.5 Pro did not benefit from the multimodal context (from 34.0 % [18/53] for text alone to 35.2 % [18.7/53] for complete cases for GPT-4o; P = 0.48, and from 44.7 % [23.7/53] to 42.8 % [22.7/53] for Gemini 1.5 Pro; P = 0.54). Radiologists benefited significantly from the suggestions of Gemini 1.5 Pro, increasing their accuracy from 47.2 % [25/53] to 56.0 % [27/53] (P < 0.01). GPT-4o and Gemini 1.5 Pro correctly identified the imaging modality in 53/53 (100 %) and 51/53 (96.2 %) cases, respectively, but frequently failed to identify key imaging findings (incorrect identification in 43/53 cases [81.1 %] for GPT-4o and 50/53 cases [94.3 %] for Gemini 1.5 Pro).
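As a small worked check of how the bracketed counts relate to the reported percentages, the sketch below converts a mean number of correct cases out of 53 into a one-decimal percentage. The helper name is hypothetical, and reading the fractional numerators (such as 23.7) as averages over readers or repeated runs is an assumption; the abstract only states that these are mean accuracies.

```python
N_CASES = 53  # number of cases in the bracketed denominators of the abstract


def accuracy_percent(mean_correct, n_cases=N_CASES):
    """Format a mean count of correct cases as a percentage with one decimal."""
    return f"{100 * mean_correct / n_cases:.1f}"


# Context-alone figures quoted in the Results paragraph.
gpt4o_context = accuracy_percent(18)        # reported as 34.0 %
gemini_context = accuracy_percent(23.7)     # reported as 44.7 %
radiologist_context = accuracy_percent(8.7) # reported as 16.4 %
```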

Conclusion: Radiologists show a specific ability to benefit from the integration of textual and visual information, whereas multimodal models mostly rely on the clinical context to suggest diagnoses.

Source journal
Diagnostic and Interventional Imaging
Subject area: Medicine - Radiology, Nuclear Medicine and Imaging
CiteScore: 8.50
Self-citation rate: 29.10%
Articles published: 126
Review time: 11 days
About the journal: Diagnostic and Interventional Imaging accepts publications originating from any part of the world based only on their scientific merit. The Journal focuses on illustrated articles with great iconographic topics and aims to help sharpen clinical decision-making skills as well as to follow high research topics. All articles are published in English. Diagnostic and Interventional Imaging publishes editorials, technical notes, letters, original and review articles on abdominal, breast, cancer, cardiac, emergency, forensic medicine, head and neck, musculoskeletal, gastrointestinal, genitourinary, interventional, obstetric, pediatric, thoracic and vascular imaging, neuroradiology, nuclear medicine, as well as contrast material, computer developments, health policies and practice, and medical physics relevant to imaging.