Evaluating the Diagnostic and Treatment Capabilities of GPT-4 Vision in Dermatology: A Pilot Study.

IF 3.1 · Zone 4, Medicine · Q2 DERMATOLOGY
Abhinav Pillai, Sharon Parappally-Joseph, Jason Kreutz, Danya Traboulsi, Maharshi Gandhi, Jori Hardin
Journal of Cutaneous Medicine and Surgery. Journal Article, published 2025-05-06. DOI: 10.1177/12034754251336238 (https://doi.org/10.1177/12034754251336238)
Citations: 0

Abstract

Background: The integration of generative artificial intelligence within dermatology presents a new frontier for enhancing diagnostic accuracy and treatment planning.

Objective: This research evaluates the performance of Generative Pre-trained Transformer-4 Vision (GPT-4V) in diagnosing common dermatological conditions and generating treatment plans for them, comparing how it assesses textual versus image data and how it performs with multimodal inputs.

Methods: A dataset of 102 images representing 9 common dermatological conditions was compiled from dermatlas.org and dermnet.nz. Two board-certified dermatologists screened the images and excluded any that did not show a classic presentation of the respective condition, leaving 54 images for the final analysis. In addition, 9 text-based clinical scenarios were developed, one for each condition. GPT-4V's diagnostic capabilities were assessed across 3 setups: Image Prompt, Scenario Prompt, and Image + Scenario Prompt.
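The abstract does not describe how prompts were assembled for each setup. As a minimal sketch, assuming one query per retained image (Image Prompt), one per scenario (Scenario Prompt), and each image paired with the scenario for its own condition (Image + Scenario Prompt), the three setups could be organized as below. The `Case` record and `build_setups` function are hypothetical names for illustration, not the authors' code or data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Case:
    """One hypothetical model query; fields left as None are not sent."""
    condition: str
    image_id: Optional[str] = None
    scenario_id: Optional[str] = None


def build_setups(
    images_by_condition: Dict[str, List[str]],
    scenario_by_condition: Dict[str, str],
) -> Dict[str, List[Case]]:
    """Assemble the three prompt setups under the assumptions stated above."""
    # Image Prompt: one query per image, no clinical text.
    image_prompt = [
        Case(cond, image_id=img)
        for cond, imgs in images_by_condition.items()
        for img in imgs
    ]
    # Scenario Prompt: one query per text scenario, no image.
    scenario_prompt = [
        Case(cond, scenario_id=scen)
        for cond, scen in scenario_by_condition.items()
    ]
    # Assumption: every image is paired with the scenario for its own condition.
    image_plus_scenario = [
        Case(cond, image_id=img, scenario_id=scenario_by_condition[cond])
        for cond, imgs in images_by_condition.items()
        for img in imgs
    ]
    return {
        "Image Prompt": image_prompt,
        "Scenario Prompt": scenario_prompt,
        "Image + Scenario Prompt": image_plus_scenario,
    }


if __name__ == "__main__":
    # Toy placeholders, not the study's images or scenarios.
    images = {"condition_A": ["a1", "a2"], "condition_B": ["b1"]}
    scenarios = {"condition_A": "scenario_A", "condition_B": "scenario_B"}
    for name, cases in build_setups(images, scenarios).items():
        print(f"{name}: {len(cases)} queries")
```

Under these assumptions, the study's counts (54 retained images, 9 scenarios) would yield 54, 9, and 54 queries for the three setups; the abstract itself does not state the number of queries per setup.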

Results: In the Image Prompt setup, GPT-4V correctly identified the primary diagnosis for 54% of the images. The Scenario Prompt and Image + Scenario Prompt setups each achieved 89% accuracy in identifying the primary diagnosis. Treatment recommendations, evaluated with a modified Entrustment Scale, reflected competent but not expert-level performance. A Wilcoxon signed-rank test on Entrustment Scores showed a statistically significant difference in treatment recommendations, with the model performing better in the Image + Scenario setup (P < .01).
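The abstract reports rounded percentages and a Wilcoxon signed-rank result without the underlying counts or scores. The sketch below does two things under stated assumptions. `counts_consistent_with` recovers which raw counts a rounded percentage can correspond to, assuming one graded answer per image (n = 54) or per scenario (n = 9). `wilcoxon_signed_rank` shows how a paired comparison of ordinal scores works, using the normal approximation with a tie-corrected variance and dropping zero differences. Both function names are hypothetical; the abstract does not say which test variant the authors used or which setups were paired, and in practice `scipy.stats.wilcoxon` would be the usual choice. The paired scores in the usage example are invented toy data.

```python
import math
from typing import List, Sequence, Tuple


def counts_consistent_with(percent: int, n: int) -> List[int]:
    """Return every count k in 0..n for which k/n rounds to `percent`%."""
    return [k for k in range(n + 1) if round(100 * k / n) == percent]


def wilcoxon_signed_rank(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped, tied |differences| get average ranks, and
    the p-value uses the normal approximation with a tie-corrected variance.
    Returns (W_plus, z, p).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| from smallest to largest, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the rank sum of positive differences (y greater than x).
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, z, p


if __name__ == "__main__":
    print(counts_consistent_with(54, 54))  # [29]
    print(counts_consistent_with(89, 9))   # [8]
    # Invented paired Entrustment scores for the same cases under two setups.
    scores_setup_a = [2, 3, 3, 2, 4, 3, 2, 3]
    scores_setup_b = [3, 4, 3, 3, 4, 4, 3, 4]
    w_plus, z, p = wilcoxon_signed_rank(scores_setup_a, scores_setup_b)
    print(f"W+ = {w_plus}, z = {z:.2f}, p = {p:.3f}")
```

Under the one-answer-per-item assumption, 54% of 54 images corresponds to 29 correct primary diagnoses and 89% of 9 scenarios to 8 correct. Because Entrustment scores are ordinal and heavily tied, the tie correction in the variance matters, and with samples this small an exact test may be preferable to the normal approximation.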

Conclusion: GPT-4V shows potential to augment dermatological diagnosis and treatment recommendations, particularly in text-based scenarios. However, its underwhelming performance in image-based diagnosis and in integrating multimodal data highlights important areas for improvement.

Source journal: Journal of Cutaneous Medicine and Surgery
CiteScore: 3.70
Self-citation rate: 4.30%
Articles published: 98
Review time: 6-12 weeks
About the journal: Journal of Cutaneous Medicine and Surgery (JCMS) aims to reflect the state of the art in cutaneous biology and dermatology by providing original scientific writings, as well as a complete critical review of the dermatology literature for clinicians, trainees, and academicians. JCMS endeavours to bring readers cutting-edge dermatologic information in two distinct formats. Part of each issue features scholarly research and articles on issues of basic and applied science, insightful case reports, comprehensive continuing medical education, and in-depth reviews, all of which provide a theoretical framework for practitioners to make sound practical decisions. The evolving field of dermatology is highlighted through these articles. In addition, part of each issue is dedicated to making the most important developments in dermatology easily accessible to the clinician by presenting well-chosen, well-written, and highly organized information in a format that is interesting, clearly presented, and useful to patient care.