Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.

IF 22.9 · CAS Zone 2 (Medicine) · JCR Q1 · MEDICINE, GENERAL & INTERNAL
Ming-Jun Shi, Zhi-Xiang Wang, Shuang-Kun Wang, Xuan-Hao Li, Yan-Lin Zhang, Ying Yan, Ran An, Li-Ning Dong, Lei Qiu, Tian Tian, Jia-Xin Liu, Hong-Chen Song, Ya-Fan Wang, Che Deng, Zi-Bing Cao, Hong-Yin Wang, Zheng Wang, Wei Wei, Jian Song, Jian Lu, Xuan Wei, Zhen-Chang Wang
Military Medical Research, vol. 12, no. 1, p. 33. Published 2025-07-07. Publication type: Journal Article.
DOI: https://doi.org/10.1186/s40779-025-00621-3
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232764/pdf/
Citations: 0

Abstract

Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.

Background: Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet the decision to perform an invasive biopsy in patients with moderate Prostate Imaging Reporting and Data System (PI-RADS) scores remains ambiguous.

Methods: To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy from 2018 to 2023 at 3 large medical centers, all with mpMRI available before biopsy and documented clinical-histopathological records. GPT-4 generated structured reports from given prompts. The performance of GPT-4 was quantified using confusion matrices, from which sensitivity, specificity, and area under the curve were calculated. Multiple manual evaluation procedures were also conducted. Wilcoxon's rank sum test, Fisher's exact test, and the Kruskal-Wallis test were used for comparisons.
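As an illustration of the metrics named in the Methods, the sketch below derives sensitivity and specificity from a 2x2 confusion matrix. The class name, field names, and all counts are hypothetical and chosen only for the arithmetic; they are not data from the study, and the abstract does not describe how the authors implemented this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 table for a biopsy recommendation versus histopathology (hypothetical layout)."""
    tp: int  # biopsy recommended, clinically significant PCa found
    fp: int  # biopsy recommended, no clinically significant PCa
    fn: int  # biopsy not recommended, clinically significant PCa found
    tn: int  # biopsy not recommended, no clinically significant PCa

    def sensitivity(self) -> float:
        # Share of true cancers for which a biopsy was recommended.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-cancers for which a biopsy was correctly withheld.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, NOT taken from the paper.
cm = ConfusionMatrix(tp=80, fp=30, fn=20, tn=70)
print(f"sensitivity={cm.sensitivity():.3f} specificity={cm.specificity():.3f}")
```

In this framing, a higher specificity means more unnecessary biopsies avoided, while sensitivity tracks how many clinically significant cancers would still be sent to biopsy.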

Results: In this cohort, the largest in the Chinese population, patients with moderate PI-RADS scores (3 and 4) accounted for 39.7% (912/2299) and were defined as the subset of interest (SOI). The detection rates of clinically significant PCa for PI-RADS scores 2, 3, 4, and 5 were 9.4%, 27.3%, 49.2%, and 80.1%, respectively. Nearly half of the SOI patients (47.5%, 433/912) were histopathologically proven to have undergone unnecessary prostate biopsies. With the assistance of GPT-4, 20.8% (190/912) of the SOI population could avoid unnecessary biopsies, and GPT-4 performed even better [28.8% (118/410)] in the most heterogeneous subgroup, PI-RADS score 3. More than 90.0% of GPT-4-generated reports were rated comprehensive and easy to understand, but satisfaction with their accuracy was lower (82.8%). GPT-4 also demonstrated cognitive potential for handling complex problems. Additionally, the Chain of Thought method enabled us to better understand the decision-making logic behind GPT-4. Finally, we developed the ProstAIGuide platform to facilitate accessibility for both doctors and patients.
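The percentages in the Results can be checked against the counts given alongside them. The following is a minimal sketch that recomputes each one; the dictionary labels and the helper name `percent` are illustrative, while the numerators, denominators, and stated percentages are copied from the abstract.

```python
# (numerator, denominator, percentage stated in the abstract)
reported = {
    "SOI share of cohort": (912, 2299, 39.7),
    "unnecessary biopsies in SOI": (433, 912, 47.5),
    "biopsies avoidable with GPT-4 (SOI)": (190, 912, 20.8),
    "biopsies avoidable with GPT-4 (PI-RADS 3)": (118, 410, 28.8),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Recompute every percentage and collect any that disagree with the stated value.
recomputed = {label: percent(n, d) for label, (n, d, _) in reported.items()}
mismatches = [
    label for label, (_, _, stated) in reported.items()
    if recomputed[label] != stated
]

for label, value in recomputed.items():
    print(f"{label}: {value}%")
print("mismatches:", mismatches)
```

All four stated percentages agree with their counts to one decimal place.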

Conclusions: This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of the latest artificial intelligence implementation in various medical scenarios.

Source journal
Military Medical Research (Medicine: General Medicine)
CiteScore: 38.40
Self-citation rate: 2.80%
Articles published: 485
Review time: 8 weeks
About the journal: Military Medical Research is an open-access, peer-reviewed journal that aims to share the most up-to-date evidence and innovative discoveries in a wide range of fields, including basic and clinical sciences, translational research, precision medicine, emerging interdisciplinary subjects, and advanced technologies. Its primary focus is modern military medicine, but it also encourages submissions from related areas. These include, but are not limited to, basic medical research with the potential for translation into practice, and clinical research that could affect medical care both in wartime and during peacetime military operations.