Comparative Evaluation of ChatGPT and ChatGLM Performance in Response to Common Queries on Pediatric Atopic Dermatitis.

IF 1.2 · Zone 4 (Medicine) · JCR Q3 (Dermatology)
Zhipeng Lin, Songyi Piao, Aoxue Wang
Journal: Pediatric Dermatology. Publication date: 2025-05-28. Publication type: Journal Article. DOI: 10.1111/pde.15988 (https://doi.org/10.1111/pde.15988)

Abstract

Background: Atopic dermatitis (AD) is a prevalent chronic and recurrent skin condition in children. Developing novel and standardized management strategies to control AD is urgently needed. Artificial intelligence technology-based large language models (LLMs), especially Chat Generative Pre-trained Transformer (ChatGPT) and Chat General Language Modeling (ChatGLM), show potential in generating appropriate responses to dialogue.

Methods: This study aims to assess the performance of ChatGPT-4 omni (ChatGPT-4o) and ChatGLM-4 in answering common queries about pediatric AD in a medical context. By screening popular inquiries from the AtopicDermatitis.net forum, we identified 102 key questions from parents of children with AD. Then, each question was input into both ChatGPT-4o and ChatGLM-4 to generate responses. Five senior dermatologists independently scored the reliability and clinical applicability of the responses. Finally, we compared the score distributions and performed a consistency analysis.

Results: For both reliability and clinical applicability, ChatGPT-4o scored slightly better overall, ranging from 92.98% to 95.97% of the total maximum score, compared to ChatGLM-4, which ranged from 82.59% to 96.83%. However, there was no significant difference between them (p > 0.05). The consistency test indicated significant concordance among dermatologists (p < 0.05), with Kendall's coefficient of concordance above 0.40 in subgroups such as skin care, special manifestations, and treatment, demonstrating moderate consistency. They provide equivalent reliability and clinical applicability in answering queries about pediatric AD.

Conclusions: The quality of the two LLMs' responses matches that of dermatology professors, which demonstrates that LLMs can effectively recommend treatments, care, and management strategies for pediatric AD.
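The two summary statistics reported in the Results (score as a percentage of the total maximum, and Kendall's coefficient of concordance W across raters) can be illustrated with a minimal sketch. This is not the authors' analysis code: the abstract does not state the scoring scale or the software used, so the function names, the `max_per_item` parameter, and the toy ratings below are all hypothetical. W is computed with the standard tie-corrected formula, since rating scales typically produce many tied scores.

```python
from collections import Counter


def percent_of_max(scores, max_per_item):
    """Total score as a percentage of the maximum attainable total."""
    return 100.0 * sum(scores) / (len(scores) * max_per_item)


def average_ranks(scores):
    """Rank scores in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings[r][q] is the score that rater r gave to item q.
    Returns a value in [0, 1]; 1 means identical rankings across raters.
    """
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated items
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranked[r][q] for r in range(m)) for q in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of tied scores per rater.
    t = sum(c ** 3 - c for row in ratings for c in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * t
    if denom == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denom


# Hypothetical toy data: 3 raters scoring 4 responses on an assumed 1-5 scale.
toy_ratings = [
    [5, 4, 4, 2],
    [5, 5, 4, 3],
    [4, 4, 3, 2],
]
w = kendalls_w(toy_ratings)
pct = percent_of_max(toy_ratings[0], max_per_item=5)
```

In this sketch each rater's scores are converted to within-rater ranks before agreement is measured, so W reflects whether raters order the responses similarly rather than whether they use the same absolute scores.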

Source journal
Pediatric Dermatology (Medicine: Dermatology)
CiteScore: 3.20
Self-citation rate: 6.70%
Articles published: 269
Review time: 1 month
About the journal: Pediatric Dermatology answers the need for new ideas and strategies for today's pediatrician or dermatologist. As a teaching vehicle, the Journal is still unsurpassed and it will continue to present the latest on topics such as hemangiomas, atopic dermatitis, rare and unusual presentations of childhood diseases, neonatal medicine, and therapeutic advances. As important progress is made in any area involving infants and children, Pediatric Dermatology is there to publish the findings.