{"title":"人工智能在皮肤科考试中的表现:ChatGPT的考试成功与限制","authors":"Neşe Göçer Gürok, Savaş Öztürk","doi":"10.1111/jocd.70244","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Artificial intelligence holds significant potential in dermatology.</p>\n </section>\n \n <section>\n \n <h3> Objectives</h3>\n \n <p>This study aimed to explore the potential and limitations of artificial intelligence applications in dermatology education by evaluating ChatGPT's performance on questions from the dermatology residency exam.</p>\n </section>\n \n <section>\n \n <h3> Method</h3>\n \n <p>In this study, the dermatology residency exam results for ChatGPT versions 3.5 and 4.0 were compared with those of resident doctors across various seniority levels. Dermatology resident doctors were categorized into four seniority levels based on their education, and a total of 100 questions—25 multiple-choice questions for each seniority level—were included in the exam. The same questions were also administered to ChatGPT versions 3.5 and 4.0, and the scores were analyzed statistically.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>ChatGPT 3.5 performed poorly, especially when compared to senior residents. Second (<i>p</i> = 0.038), third (<i>p</i> = 0.041), and fourth-year senior resident physicians (<i>p</i> = 0.020) scored significantly higher than ChatGPT 3.5. ChatGPT 4.0 showed similar performance compared to first- and third-year senior resident physicians, but performed worse in comparison to second (<i>p</i> = 0.037) and fourth-year senior resident physicians (<i>p</i> = 0.029). Both versions scored lower as seniority and exam difficulty increased. ChatGPT 3.5 passed the first and second-year exams but failed the third and fourth-year exams. ChatGPT 4.0 passed the first, second, and third-year exams but failed the fourth-year exam. These findings suggest that ChatGPT was not on par with senior resident physicians, particularly on topics requiring advanced knowledge; however, version 4.0 proved to be more effective than version 3.5.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>In the future, as ChatGPT's language support and knowledge of medicine improve, it can be used more effectively in educational processes.</p>\n </section>\n </div>","PeriodicalId":15546,"journal":{"name":"Journal of Cosmetic Dermatology","volume":"24 5","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jocd.70244","citationCount":"0","resultStr":"{\"title\":\"The Performance of AI in Dermatology Exams: The Exam Success and Limits of ChatGPT\",\"authors\":\"Neşe Göçer Gürok, Savaş Öztürk\",\"doi\":\"10.1111/jocd.70244\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Artificial intelligence holds significant potential in dermatology.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Objectives</h3>\\n \\n <p>This study aimed to explore the potential and limitations of artificial intelligence applications in dermatology education by evaluating ChatGPT's performance on questions from the dermatology residency exam.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Method</h3>\\n \\n <p>In this study, the dermatology residency exam results for ChatGPT versions 3.5 and 4.0 were compared with those of resident doctors across various seniority levels. 
Dermatology resident doctors were categorized into four seniority levels based on their education, and a total of 100 questions—25 multiple-choice questions for each seniority level—were included in the exam. The same questions were also administered to ChatGPT versions 3.5 and 4.0, and the scores were analyzed statistically.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>ChatGPT 3.5 performed poorly, especially when compared to senior residents. Second (<i>p</i> = 0.038), third (<i>p</i> = 0.041), and fourth-year senior resident physicians (<i>p</i> = 0.020) scored significantly higher than ChatGPT 3.5. ChatGPT 4.0 showed similar performance compared to first- and third-year senior resident physicians, but performed worse in comparison to second (<i>p</i> = 0.037) and fourth-year senior resident physicians (<i>p</i> = 0.029). Both versions scored lower as seniority and exam difficulty increased. ChatGPT 3.5 passed the first and second-year exams but failed the third and fourth-year exams. ChatGPT 4.0 passed the first, second, and third-year exams but failed the fourth-year exam. These findings suggest that ChatGPT was not on par with senior resident physicians, particularly on topics requiring advanced knowledge; however, version 4.0 proved to be more effective than version 3.5.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusion</h3>\\n \\n <p>In the future, as ChatGPT's language support and knowledge of medicine improve, it can be used more effectively in educational processes.</p>\\n </section>\\n </div>\",\"PeriodicalId\":15546,\"journal\":{\"name\":\"Journal of Cosmetic Dermatology\",\"volume\":\"24 5\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jocd.70244\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cosmetic Dermatology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/jocd.70244\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"DERMATOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cosmetic Dermatology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jocd.70244","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DERMATOLOGY","Score":null,"Total":0}
The Performance of AI in Dermatology Exams: The Exam Success and Limits of ChatGPT
Background
Artificial intelligence holds significant potential in dermatology.
Objectives
This study aimed to explore the potential and limitations of artificial intelligence applications in dermatology education by evaluating ChatGPT's performance on questions from the dermatology residency exam.
Method
In this study, the dermatology residency exam results for ChatGPT versions 3.5 and 4.0 were compared with those of resident doctors across various seniority levels. Dermatology resident doctors were categorized into four seniority levels based on their education, and a total of 100 questions—25 multiple-choice questions for each seniority level—were included in the exam. The same questions were also administered to ChatGPT versions 3.5 and 4.0, and the scores were analyzed statistically.
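The abstract does not state which statistical test was used to compare the scores. Purely as a hedged illustration (not the authors' actual analysis), the sketch below assumes a one-sample Wilcoxon signed-rank test comparing one seniority group's scores on its 25-question exam with ChatGPT's single score on the same questions; all numbers and variable names are hypothetical.

```python
# Illustrative sketch only: the abstract does not report the statistical method.
# We assume a one-sample Wilcoxon signed-rank test on the differences between
# each resident's score and ChatGPT's score on the identical 25-question exam.
# All values below are invented for demonstration.
import numpy as np
from scipy import stats

# Hypothetical number of correct answers (out of 25) for residents at one seniority level.
resident_scores = np.array([19, 21, 18, 22, 20, 17, 23])

# Hypothetical ChatGPT score on the same 25 questions.
chatgpt_score = 15

# Paired differences: each resident's score minus ChatGPT's score on the same exam.
differences = resident_scores - chatgpt_score

# One-sample Wilcoxon signed-rank test on those differences.
stat, p_value = stats.wilcoxon(differences)

print(f"Median resident score: {np.median(resident_scores)}/25")
print(f"ChatGPT score: {chatgpt_score}/25")
print(f"Wilcoxon signed-rank: W = {stat}, p = {p_value:.3f}")
```

Any nonparametric two-group or paired test could play the same role here; the Wilcoxon signed-rank test is chosen only because it handles a small sample of exam scores without assuming normality.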
Results
ChatGPT 3.5 performed poorly, especially when compared with senior residents. Second- (p = 0.038), third- (p = 0.041), and fourth-year residents (p = 0.020) scored significantly higher than ChatGPT 3.5. ChatGPT 4.0 performed similarly to first- and third-year residents but worse than second- (p = 0.037) and fourth-year residents (p = 0.029). Both versions scored lower as seniority and exam difficulty increased. ChatGPT 3.5 passed the first- and second-year exams but failed the third- and fourth-year exams. ChatGPT 4.0 passed the first-, second-, and third-year exams but failed the fourth-year exam. These findings suggest that ChatGPT was not on par with senior residents, particularly on topics requiring advanced knowledge; however, version 4.0 proved more effective than version 3.5.
Conclusion
In the future, as ChatGPT's language support and medical knowledge improve, it may be used more effectively in educational processes.
About the Journal
The Journal of Cosmetic Dermatology publishes high-quality, peer-reviewed articles on all aspects of cosmetic dermatology, with the aim of fostering the highest standards of patient care in cosmetic dermatology. Published quarterly, the Journal of Cosmetic Dermatology facilitates continuing professional development and provides a forum for the exchange of scientific research and innovative techniques.
The scope of coverage includes, but will not be limited to: healthy skin; skin maintenance; ageing skin; photodamage and photoprotection; rejuvenation; biochemistry, endocrinology and neuroimmunology of healthy skin; imaging; skin measurement; quality of life; skin types; sensitive skin; rosacea and acne; sebum; sweat; fat; phlebology; hair conservation, restoration and removal; nails and nail surgery; pigment; psychological and medicolegal issues; retinoids; cosmetic chemistry; dermopharmacy; cosmeceuticals; toiletries; striae; cellulite; cosmetic dermatological surgery; blepharoplasty; liposuction; surgical complications; botulinum; fillers, peels and dermabrasion; local and tumescent anaesthesia; electrosurgery; lasers, including laser physics, laser research and safety, vascular lasers, pigment lasers, hair removal lasers, tattoo removal lasers, resurfacing lasers, dermal remodelling lasers and laser complications.