Deniz Esin Tekcan Sanli, Ahmet Necati Sanli, Yildiz Buyukdereli Atadag, Atakan Kurt, Emel Esmerer
Journal of Ultrasound in Medicine. Published 2025-06-23. DOI: 10.1002/jum.16749 (https://doi.org/10.1002/jum.16749)
GPT-4o and Specialized AI in Breast Ultrasound Imaging: A Comparative Study on Accuracy, Agreement, Limitations, and Diagnostic Potential
Objectives: This study aimed to evaluate the ability of ChatGPT and Breast Ultrasound Helper, a special ChatGPT-based subprogram trained on ultrasound image analysis, to analyze and differentiate benign and malignant breast lesions on ultrasound images.
Methods: Ultrasound images of patients with histopathologically confirmed breast cancer or fibroadenoma were read by GPT-4o (the latest ChatGPT version) and Breast Ultrasound Helper (BUH), a tool from the "Explore" section of ChatGPT. Both were prompted in English using ACR BI-RADS Breast Ultrasound Lexicon criteria: lesion shape, orientation, margin, internal echo pattern, echogenicity, posterior acoustic features, microcalcifications or hyperechoic foci, perilesional hyperechoic rim, edema or architectural distortion, lesion size, and BI-RADS category. Two experienced radiologists evaluated the images and the responses of the programs in consensus. The outputs, BI-RADS category agreement, and benign/malignant discrimination were statistically compared.
Results: A total of 232 ultrasound images were analyzed, of which 133 (57.3%) were malignant and 99 (42.7%) benign. In comparative analysis, BUH showed superior performance overall, with higher kappa values and statistically significant results across multiple features (P < .001). However, the overall level of agreement with the radiologists' consensus across all features was similar for BUH (κ: 0.387-0.755) and GPT-4o (κ: 0.317-0.803). BI-RADS category agreement was slightly higher for GPT-4o than for BUH (69.4% versus 65.9%), whereas BUH was slightly more successful in distinguishing benign from malignant lesions (65.9% for GPT-4o versus 67.7% for BUH).
Conclusions: Although both AI tools show moderate to good performance in ultrasound image analysis, their limited agreement with radiologists' evaluations and BI-RADS categorization suggests that their clinical application in breast ultrasound interpretation is still at an early stage and not yet reliable.
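The agreement figures in the Results are reported as kappa values and percentages. The abstract does not say which kappa variant or software the authors used, so the following is only a minimal sketch that assumes unweighted Cohen's kappa between two raters. The function names and the toy label lists are hypothetical and are not study data. Only the final check uses numbers from the abstract (133 and 99 of 232 images).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' marginal shares.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


def percent_agreement(rater_a, rater_b):
    """Raw agreement as a percentage, with no correction for chance."""
    return 100.0 * sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical toy labels for one lexicon feature (lesion shape); NOT study data.
radiologists = ["oval", "irregular", "oval", "irregular",
                "oval", "round", "irregular", "oval"]
ai_tool = ["oval", "irregular", "irregular", "irregular",
           "oval", "oval", "irregular", "oval"]

kappa = cohen_kappa(radiologists, ai_tool)
agreement = percent_agreement(radiologists, ai_tool)
print(f"kappa = {kappa:.3f}, raw agreement = {agreement:.1f}%")

# Class proportions quoted in the abstract: 133 malignant and 99 benign of 232 images.
malignant_pct = round(100 * 133 / 232, 1)
benign_pct = round(100 * 99 / 232, 1)
print(malignant_pct, benign_pct)
```

In this toy example the raw agreement is 75%, while kappa is about 0.56 because part of that agreement is expected by chance. This is why a tool can agree with radiologists on roughly two thirds of cases and still show only moderate kappa values.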
About the journal:
The Journal of Ultrasound in Medicine (JUM) is dedicated to the rapid, accurate publication of original articles dealing with all aspects of medical ultrasound, particularly its direct application to patient care but also relevant basic science, advances in instrumentation, and biological effects. The journal is an official publication of the American Institute of Ultrasound in Medicine and publishes articles in a variety of categories, including Original Research papers, Review Articles, Pictorial Essays, Technical Innovations, Case Series, Letters to the Editor, and more, from an international bevy of countries in a continual effort to showcase and promote advances in the ultrasound community.
Represented through these efforts are a wide variety of disciplines of ultrasound, including, but not limited to:
- Basic Science
- Breast Ultrasound
- Contrast-Enhanced Ultrasound
- Dermatology
- Echocardiography
- Elastography
- Emergency Medicine
- Fetal Echocardiography
- Gastrointestinal Ultrasound
- General and Abdominal Ultrasound
- Genitourinary Ultrasound
- Gynecologic Ultrasound
- Head and Neck Ultrasound
- High Frequency Clinical and Preclinical Imaging
- Interventional-Intraoperative Ultrasound
- Musculoskeletal Ultrasound
- Neurosonology
- Obstetric Ultrasound
- Ophthalmologic Ultrasound
- Pediatric Ultrasound
- Point-of-Care Ultrasound
- Public Policy
- Superficial Structures
- Therapeutic Ultrasound
- Ultrasound Education
- Ultrasound in Global Health
- Urologic Ultrasound
- Vascular Ultrasound