Ming-Jun Shi, Zhi-Xiang Wang, Shuang-Kun Wang, Xuan-Hao Li, Yan-Lin Zhang, Ying Yan, Ran An, Li-Ning Dong, Lei Qiu, Tian Tian, Jia-Xin Liu, Hong-Chen Song, Ya-Fan Wang, Che Deng, Zi-Bing Cao, Hong-Yin Wang, Zheng Wang, Wei Wei, Jian Song, Jian Lu, Xuan Wei, Zhen-Chang Wang
Military Medical Research 12(1):33, published 2025-07-07. DOI: 10.1186/s40779-025-00621-3. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232764/pdf/
Performance of GPT-4 for automated prostate biopsy decision-making based on mpMRI: a multi-center evidence study.
Background: Multiparametric magnetic resonance imaging (mpMRI) has significantly advanced prostate cancer (PCa) detection, yet the decision to perform an invasive biopsy in patients with moderate Prostate Imaging Reporting and Data System (PI-RADS) scores remains ambiguous.
Methods: To explore the decision-making capacity of Generative Pretrained Transformer-4 (GPT-4) for automated prostate biopsy recommendations, we included 2299 individuals who underwent prostate biopsy between 2018 and 2023 at 3 large medical centers and who had pre-biopsy mpMRI and documented clinical and histopathological records. GPT-4 generated structured reports from predefined prompts. Its performance was quantified with confusion matrices, from which sensitivity, specificity, and the area under the curve (AUC) were calculated. Multiple manual (human) evaluation procedures were also conducted. The Wilcoxon rank-sum test, Fisher's exact test, and the Kruskal-Wallis test were used for comparisons.
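The metrics named in the Methods can be illustrated with a minimal Python sketch. It assumes a binary framing that the abstract does not spell out: a positive case is one with histopathologically confirmed clinically significant PCa, and GPT-4's output is reduced to a yes/no biopsy recommendation (or a numeric score when computing AUC). The helper names (`ConfusionMatrix`, `confusion_matrix`, `auc`) and the toy data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # biopsy recommended, csPCa confirmed
    fp: int  # biopsy recommended, no csPCa (unnecessary biopsy)
    fn: int  # biopsy not recommended, csPCa confirmed (missed cancer)
    tn: int  # biopsy not recommended, no csPCa (biopsy avoided)

    @property
    def sensitivity(self) -> float:
        # Share of csPCa cases for which a biopsy was recommended
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-csPCa cases for which biopsy was (correctly) not recommended
        return self.tn / (self.tn + self.fp)


def confusion_matrix(y_true, y_pred) -> ConfusionMatrix:
    """Tally binary ground truth (1 = csPCa) against binary recommendations (1 = biopsy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return ConfusionMatrix(tp, fp, fn, tn)


def auc(y_true, scores) -> float:
    """Rank-based AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 1 = csPCa confirmed / biopsy recommended
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm.sensitivity, cm.specificity, auc(y_true, y_pred))  # 0.75 0.75 0.75
```

With a single yes/no prediction, the rank-based AUC reduces to (sensitivity + specificity) / 2; graded scores, such as a model confidence, would trace a fuller ROC curve.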
Results: In the largest sample to date from a Chinese population, patients with moderate PI-RADS scores (3 and 4), defined as the subset-of-interest (SOI), accounted for 39.7% (912/2299). The detection rates of clinically significant PCa for PI-RADS scores 2, 3, 4, and 5 were 9.4%, 27.3%, 49.2%, and 80.1%, respectively. Histopathology showed that nearly 47.5% (433/912) of SOI patients had undergone unnecessary prostate biopsies. With GPT-4 assistance, 20.8% (190/912) of the SOI population could have avoided unnecessary biopsies, and GPT-4 performed even better [28.8% (118/410)] in the most heterogeneous subgroup, PI-RADS score 3. More than 90.0% of GPT-4-generated reports were rated comprehensive and easy to understand, whereas satisfaction with their accuracy was lower (82.8%). GPT-4 also showed cognitive potential for handling complex problems. In addition, the Chain-of-Thought method helped us better understand the decision-making logic behind GPT-4. Finally, we developed the ProstAIGuide platform to make this tool accessible to both doctors and patients.
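The proportions reported in the Results can be rechecked from the raw counts with a short Python sketch. It recomputes each percentage; the final line derives a PI-RADS 4 figure that the abstract does not report, assuming that the SOI consists exactly of PI-RADS 3 and 4 patients and that the 118 avoidable PI-RADS 3 biopsies are counted within the 190. Treat that derived number as illustrative arithmetic, not a study result.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts stated in the abstract
TOTAL = 2299            # biopsied patients across the 3 centers
SOI = 912               # subset-of-interest: PI-RADS score 3 or 4
SOI_UNNECESSARY = 433   # SOI patients whose biopsy proved unnecessary
SOI_AVOIDED = 190       # SOI biopsies avoidable with GPT-4 assistance
PIRADS3 = 410           # PI-RADS 3 patients within the SOI
PIRADS3_AVOIDED = 118   # PI-RADS 3 biopsies avoidable with GPT-4 assistance

reported = {
    "SOI share of cohort": pct(SOI, TOTAL),
    "unnecessary biopsies in SOI": pct(SOI_UNNECESSARY, SOI),
    "avoidable with GPT-4 (SOI)": pct(SOI_AVOIDED, SOI),
    "avoidable with GPT-4 (PI-RADS 3)": pct(PIRADS3_AVOIDED, PIRADS3),
}

# Derived (assumption): SOI = PI-RADS 3 + PI-RADS 4, and the 118 are counted within the 190
pirads4 = SOI - PIRADS3                          # 502 patients
pirads4_avoided = SOI_AVOIDED - PIRADS3_AVOIDED  # 72 biopsies

for label, value in reported.items():
    print(f"{label}: {value}%")
print(f"avoidable with GPT-4 (PI-RADS 4, derived): {pct(pirads4_avoided, pirads4)}%")
```

The recomputed values match the abstract's 39.7%, 47.5%, 20.8%, and 28.8%.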
Conclusions: This multi-center study highlights the clinical utility of GPT-4 for prostate biopsy decision-making and advances our understanding of how the latest artificial intelligence tools can be implemented across medical scenarios.
About the journal:
Military Medical Research is an open-access, peer-reviewed journal that aims to share the most up-to-date evidence and innovative discoveries in a wide range of fields, including basic and clinical sciences, translational research, precision medicine, emerging interdisciplinary subjects, and advanced technologies. Our primary focus is on modern military medicine; however, we also encourage submissions from other related areas. This includes, but is not limited to, basic medical research with the potential for translation into practice, as well as clinical research that could impact medical care both in times of warfare and during peacetime military operations.