Evaluating the Diagnostic and Treatment Capabilities of GPT-4 Vision in Dermatology: A Pilot Study.
Authors: Abhinav Pillai, Sharon Parappally-Joseph, Jason Kreutz, Danya Traboulsi, Maharshi Gandhi, Jori Hardin
Journal: Journal of Cutaneous Medicine and Surgery (JCR category: Dermatology, Q2; impact factor 3.1)
Publication type: Journal Article
Published: 2025-05-06
DOI: https://doi.org/10.1177/12034754251336238
Background: The integration of generative artificial intelligence within dermatology presents a new frontier for enhancing diagnostic accuracy and treatment planning.
Objective: This research evaluates Generative Pre-trained Transformer-4 Vision's (GPT-4V) performance in accurately diagnosing and generating treatment plans for common dermatological conditions, comparing its assessment of textual versus image data and its performance with multimodal inputs.
Methods: A dataset of 102 images representing 9 common dermatological conditions was compiled from dermatlas.org and dermnet.nz. Images were screened by 2 board-certified dermatologists and were excluded if they did not represent a classic presentation of the respective conditions. Fifty-four images were included in the final analysis. In addition, 9 text-based clinical scenarios corresponding to each condition were developed. GPT-4V's diagnostic capabilities were assessed across 3 setups: Image Prompt, Scenario Prompt, and Image + Scenario Prompt.
Results: In the Image Prompt setup, GPT-4V correctly identified the primary diagnosis for 54% of the images. The Scenario Prompt and the Image + Scenario Prompt setups each achieved an 89% accuracy rate in identifying the primary diagnosis. Treatment recommendations were evaluated using a modified Entrustment Scale, showing competent but not expert-level performance. A Wilcoxon signed-rank test demonstrated a statistically significant difference in treatment recommendations based on the Entrustment Score, with the model performing better in the Image + Scenario setup (P < .01).
Conclusion: GPT-4V demonstrates the potential to augment dermatological diagnosis and treatment recommendations, particularly in text-based scenarios. However, its underwhelming performance in image-based diagnosis and integration of multimodal data highlights important areas for improvement.
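The paired comparison described in the Results can be illustrated with a minimal sketch of a Wilcoxon signed-rank test. This is not the authors' analysis code. The function name `wilcoxon_signed_rank`, the score lists, and the choice of a normal approximation without tie or continuity correction are all assumptions made for illustration, and the scores below are invented placeholders rather than study data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Hypothetical helper for illustration. Zero differences are dropped and
    tied absolute differences receive average ranks. No tie or continuity
    correction is applied, so p-values are approximate for small samples.
    Returns (w_plus, w_minus, p_value).
    """
    # Paired differences, discarding pairs with no change.
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum the ranks of positive and negative differences separately.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, W is approximately normal with these moments.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Invented placeholder scores for the same ten cases under two setups.
setup_a_scores = [2, 3, 2, 3, 2, 4, 3, 2, 3, 2]
setup_b_scores = [3, 4, 3, 3, 4, 4, 4, 3, 4, 3]

w_plus, w_minus, p = wilcoxon_signed_rank(setup_a_scores, setup_b_scores)
print(f"W+ = {w_plus}, W- = {w_minus}, approximate two-sided p = {p:.4f}")
```

A signed-rank test suits this design because each case is scored under both setups and the scores are ordinal, so the paired differences are ranked rather than averaged. For real analyses, an established statistics package with exact small-sample p-values would be preferable to this approximation.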
About the journal:
Journal of Cutaneous Medicine and Surgery (JCMS) aims to reflect the state of the art in cutaneous biology and dermatology by providing original scientific writings, as well as a complete critical review of the dermatology literature for clinicians, trainees, and academicians. JCMS endeavours to bring readers cutting-edge dermatologic information in two distinct formats. Part of each issue features scholarly research and articles on issues of basic and applied science, insightful case reports, comprehensive continuing medical education, and in-depth reviews, all of which provide a theoretical framework for practitioners to make sound practical decisions. The evolving field of dermatology is highlighted through these articles. In addition, part of each issue is dedicated to making the most important developments in dermatology easily accessible to the clinician by presenting well-chosen, well-written, and highly organized information in a format that is interesting, clearly presented, and useful to patient care.