Application of ChatGPT as a content generation tool in continuing medical education: acne as a test topic.

IF 1.3 Q2 DERMATOLOGY

Dermatology Reports Pub Date : 2025-05-23 Epub Date: 2024-11-28 DOI:10.4081/dr.2024.10138

Luigi Naldi, Vincenzo Bettoli, Eugenio Santoro, Maria Rosa Valetto, Anna Bolzon, Fortunato Cassalia, Simone Cazzaniga, Sergio Cima, Andrea Danese, Silvia Emendi, Monica Ponzano, Nicoletta Scarpa, Pietro Dri



Abstract

The large language model (LLM) ChatGPT can answer open-ended and complex questions, but its accuracy in providing reliable medical information requires careful assessment. As part of the AI-CHECK (Artificial Intelligence for CME Health E-learning Contents and Knowledge) study, which aims to evaluate the potential of ChatGPT in continuing medical education (CME), we compared ChatGPT-generated educational content with the recommendations of the National Institute for Health and Care Excellence (NICE) guidelines on acne vulgaris. ChatGPT version 4 was given a 23-item questionnaire developed by an experienced dermatologist. A panel of five dermatologists rated the answers positively for "quality" (87.8%), "readability" (94.8%), "accuracy" (75.7%), "thoroughness" (85.2%), and "consistency" with guidelines (76.8%). The references provided by ChatGPT received positive ratings for "pertinence" (94.6%), "relevance" (91.2%), and "update" (62.3%). Internal reproducibility was adequate for both answers (93.5%) and references (67.4%). Answers on issues of uncertainty and/or controversy in the scientific community scored the lowest. This study underscores the need for rigorous evaluation criteria for AI-generated medical content and for expert oversight to ensure accuracy and guideline adherence.


