Title: Comparative Evaluation of ChatGPT and ChatGLM Performance in Response to Common Queries on Pediatric Atopic Dermatitis
Authors: Zhipeng Lin, Songyi Piao, Aoxue Wang
Journal: Pediatric Dermatology (impact factor 1.2; JCR Q3, Dermatology)
Publication date: 2025-05-28
Publication type: Journal Article
DOI: 10.1111/pde.15988 (https://doi.org/10.1111/pde.15988)
Citation count: 0
Abstract
Background: Atopic dermatitis (AD) is a prevalent, chronic, and recurrent skin condition in children. Novel, standardized management strategies for controlling AD are urgently needed. Large language models (LLMs) built on artificial intelligence technology, especially Chat Generative Pre-trained Transformer (ChatGPT) and Chat General Language Modeling (ChatGLM), show potential for generating appropriate responses in dialogue.
Methods: This study assessed the performance of ChatGPT-4 omni (ChatGPT-4o) and ChatGLM-4 in answering common queries about pediatric AD in a medical context. By screening popular inquiries on the AtopicDermatitis.net forum, we identified 102 key questions asked by parents of children with AD. Each question was then entered into both ChatGPT-4o and ChatGLM-4 to generate responses. Five senior dermatologists independently scored the reliability and clinical applicability of the responses. Finally, we compared the score distributions and performed a consistency analysis.
Results: For both reliability and clinical applicability, ChatGPT-4o scored slightly better overall, ranging from 92.98% to 95.97% of the total maximum score, compared with ChatGLM-4, which ranged from 82.59% to 96.83%. However, the difference between the two models was not statistically significant (p > 0.05). The consistency test indicated significant concordance among the dermatologists (p < 0.05), with Kendall's coefficient of concordance above 0.40 in subgroups such as skin care, special manifestations, and treatment, demonstrating moderate consistency. The two models provide equivalent reliability and clinical applicability in answering queries about pediatric AD.
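For readers unfamiliar with the consistency measure named in the Results, the following is a minimal sketch of Kendall's coefficient of concordance (W) using the textbook formula with a tie correction. The abstract does not say how the authors computed it, so the function names `average_ranks` and `kendalls_w` and the example scores are illustrative assumptions, not the study's code or data. W ranges from 0 (no agreement among raters) to 1 (identical rankings).

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W, where ratings[j][i] is the score rater j gave to item i."""
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of rated items
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranked[j][i] for j in range(m)) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of tied scores per rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denom


# Hypothetical scores: three raters who rank four responses identically.
perfect = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(kendalls_w(perfect))  # 1.0

# Two raters with exactly opposite rankings show no concordance.
opposed = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(opposed))  # 0.0
```

Values above 0.40, as reported for the skin care, special manifestations, and treatment subgroups, fall between these two extremes, which the authors describe as moderate consistency.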
Conclusions: The quality of the two LLMs' responses matches that of dermatology professors, which demonstrates that LLMs can effectively recommend treatments, care, and management strategies for pediatric AD.
About the journal:
Pediatric Dermatology answers the need for new ideas and strategies for today's pediatrician or dermatologist. As a teaching vehicle, the Journal is still unsurpassed and it will continue to present the latest on topics such as hemangiomas, atopic dermatitis, rare and unusual presentations of childhood diseases, neonatal medicine, and therapeutic advances. As important progress is made in any area involving infants and children, Pediatric Dermatology is there to publish the findings.