Harnessing large multimodal models in pulmonary CT: the generative AI edge in lung cancer diagnostics

IF 7.6 | Zone 1, Medicine | Q1, Health Care Sciences & Services
Lihaoyun Huang, Junyi Shen, Anqi Lin, Jian Zhang, Peng Luo, Ting Wei
The Lancet Regional Health: Western Pacific, volume 55, Article 101336. Published 2025-02-01. DOI: 10.1016/j.lanwpc.2024.101336
https://www.sciencedirect.com/science/article/pii/S2666606524003304
Citations: 0

Abstract

Background

Generative Artificial Intelligence (Gen-AI) has rapidly advanced in multimodal information processing, particularly in medical applications such as the refinement of instruments and interpretation of medical images. However, limited evidence exists on the diagnostic performance of Gen-AI models in tumor recognition, particularly using computed tomography (CT) images. This study aimed to evaluate the diagnostic capabilities of several prevalent Gen-AI models (GPT-4-turbo, Gemini-pro-vision, Claude-3-opus) in the context of lung CT image analysis.

Methods

This retrospective study analyzed chest CT scans from 404 patients with lung conditions, comprising lung neoplasms (n=184) and non-malignant conditions (n=210). After standardizing the CT images, the diagnostic performance and reliability of three Gen-AI models (GPT-4-turbo, Gemini-pro-vision, and Claude-3-opus) were assessed using chi-square tests and Receiver Operating Characteristic (ROC) curves across various clinical scenarios. Likert scale scoring and response rate analysis were employed to evaluate internal diagnostic tendencies, while regression analyses were conducted for model optimization.
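As a rough illustration of the two statistics named above, the following sketch computes a Pearson chi-square statistic for a 2x2 table and an ROC AUC from ordinal scores. It is not the authors' analysis code: the function names, the example counts, and the Likert-style scores are all hypothetical, and the abstract does not say which software or test variants were used.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = malignant, 0 = non-malignant,
# scores are made-up 1-5 Likert ratings of suspected malignancy.
example_labels = [1, 1, 0, 0]
example_scores = [4, 2, 3, 1]
example_auc = roc_auc(example_labels, example_scores)

# Hypothetical counts: correct/incorrect answers for two models.
example_chi2 = chi_square_2x2(10, 20, 30, 40)
```

The pairwise definition of AUC used here is equivalent to the area under the empirical ROC curve and handles ordinal scores with ties, which is why it suits Likert-type outputs.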

Findings

In a cueing environment limited to a single CT image, Gemini demonstrated the highest diagnostic accuracy (92.21%), followed by Claude (91.49%), while GPT exhibited the lowest performance (65.22%). As the complexity of the cueing environment increased, all models experienced a decline in diagnostic accuracy. Claude showed a marginal decrease, whereas Gemini's accuracy fluctuated significantly. Under simplified cueing conditions, the performance of all models improved notably (Gemini AUC = 0.76, Claude AUC = 0.69, GPT AUC = 0.73). Feature identification analysis revealed that Claude and GPT excelled in recognizing key features, particularly prioritizing “Morphology/Margins” when diagnosing primary malignancies, with “spiculated” and “irregular” serving as critical indicators. However, in cases of misdiagnosis or missed diagnosis, Gen-AI exhibited significant deviations across multiple feature dimensions, and some outputs completely contradicted the actual findings. Following optimization through Lasso and stepwise regression, the diagnostic performance of the models was significantly enhanced (AUC = 0.896 and AUC = 0.894, respectively).
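To make the Lasso step more concrete, here is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is an assumption-laden illustration, not the authors' pipeline: the abstract does not state which predictors, penalty strength, or software were used, and the two centered feature scores, the labels, and every function name below are hypothetical. Stepwise regression is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Shrink v toward zero by t; values in [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Logistic regression with an L1 (Lasso) penalty of strength lam,
    fitted by proximal gradient descent. The intercept is not penalized.
    Returns (intercept, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(d)]
        # Plain gradient step for the intercept.
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding for the weights.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return b, w


# Hypothetical data: two centered feature scores per case
# (for instance a margin score and one other feature), 1 = malignant.
X = [[2, 1], [1, 1], [-1, 1], [-2, 1],
     [2, -1], [1, -1], [-1, -1], [-2, -1]]
y = [1, 1, 0, 1, 1, 0, 0, 0]

for lam in (0.0, 0.3, 10.0):
    intercept, weights = fit_l1_logistic(X, y, lam)
    print(lam, [round(v, 3) for v in weights])
```

On this toy data, no penalty leaves both weights positive, a penalty of 0.3 keeps the first weight and sets the second to exactly zero, and a very large penalty removes both. This is the feature-selection behavior that motivates using Lasso before refitting a diagnostic model.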

Interpretation

Gen-AI models show promising potential in pulmonary CT imaging, particularly in simplified diagnostic settings. However, their limitations in processing complex multimodal information pose significant challenges for clinical integration. Ongoing efforts to improve the robustness and reliability of these models are crucial for their successful adoption in healthcare.
Source journal
The Lancet Regional Health: Western Pacific (Medicine: Pediatrics, Perinatology and Child Health)
CiteScore: 8.80
Self-citation rate: 2.80%
Articles published: 305
Review time: 11 weeks
About the journal: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.