Beyond the Surface: Assessing GPT-4's Accuracy in Detecting Melanoma and Suspicious Skin Lesions From Dermoscopic Images
Jonah W Perlmutter, John Milkovich, Sierra Fremont, Shaishav Datta, Adam Mosa
Journal: Plastic Surgery (impact factor 0.7; JCR Q4, Surgery)
Publication date: 2025-02-18
Publication type: Journal Article
DOI: https://doi.org/10.1177/22925503251315489
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11836967/pdf/
Citations: 0
Abstract
Introduction: Self-examination for skin cancer detection is limited by low sensitivity. ChatGPT-4 has image recognition capabilities that could make it a useful adjunct for cancer screening and telehealth applications. This study investigated the efficacy of ChatGPT-4 in identifying skin lesions. Methods: Dermoscopic images were retrospectively selected from the PH² dataset, categorized by clinical diagnosis, and uploaded to ChatGPT-4 with a predesigned prompt. Responses were compared against the clinical diagnoses. Confidence intervals were calculated using the bootstrap method to assess precision, and statistical significance was assessed using McNemar's test. Analyses were performed in Python using Jupyter Notebook. Results: The GPT-4 model showed moderate performance in melanoma detection, with 68.5% accuracy, 52.5% sensitivity, and 72.5% specificity, differing significantly from the clinical standard (P = .002). For suspicious lesion detection, it performed better, with 68.0% accuracy, 78.0% precision, and a 70.0% F-measure, but still did not closely match the clinical diagnosis of atypical nevi and melanoma (P = .0169). Conclusion: The statistical difference between ChatGPT-4's diagnoses of melanoma and suspicious lesions and those of clinical diagnosis and other AI models suggests a need for improvement in ChatGPT-4's algorithms. This study's limitations include the use of a secondary care database with a higher melanoma incidence, reliance on high-quality dermoscopic images that limit generalizability, and a small sample size lacking diversity; larger datasets are needed to validate the findings in broader contexts.
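The Methods name three statistical steps: diagnostic metrics against the clinical diagnosis, bootstrap confidence intervals, and McNemar's test. The sketch below illustrates how such an analysis might look in Python. It is not the authors' code: the function names, the percentile form of the bootstrap, the exact (binomial) form of McNemar's test, and the toy labels are all assumptions made for illustration, and the toy labels are not study data.

```python
import math
import random


def confusion_counts(truth, pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = melanoma."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(truth, pred):
    """Accuracy, sensitivity, and specificity of pred against truth."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def bootstrap_ci(truth, pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample images with replacement."""
    rng = random.Random(seed)
    n = len(truth)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metrics([truth[i] for i in idx], [pred[i] for i in idx])[metric]
        if not math.isnan(value):  # skip resamples with an empty class
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


def mcnemar_exact(truth, pred):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    _, fp, fn, _ = confusion_counts(truth, pred)
    n = fp + fn  # only disagreements between model and clinician count
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(fp, fn) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy labels (made up, NOT from the study): 10 melanomas, 30 benign lesions.
truth = [1] * 10 + [0] * 30
pred = [1] * 6 + [0] * 4 + [1] * 9 + [0] * 21

scores = metrics(truth, pred)
ci_low, ci_high = bootstrap_ci(truth, pred, "accuracy")
p_value = mcnemar_exact(truth, pred)
print(scores, (ci_low, ci_high), p_value)
```

McNemar's test uses only the discordant pairs (lesions where the model and the clinical diagnosis disagree), which is why a model can have moderate accuracy and still differ significantly from the reference standard when its errors fall mostly in one direction.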
About the journal:
Plastic Surgery (Chirurgie Plastique) is the official journal of the Canadian Society of Plastic Surgeons, the Canadian Society for Aesthetic Plastic Surgery, the Group for the Advancement of Microsurgery, and the Canadian Society for Surgery of the Hand. It serves as a major venue for Canadian research, society guidelines, and continuing medical education.