Esra Ekizer, Kevser Kurt Demirsoy, Süleyman Kutalmış Büyük, Semih Canpolat, Ahmet Bilirer
Cleft Palate-Craniofacial Journal. Published 2025-09-16. DOI: https://doi.org/10.1177/10556656251378591
Comparative Evaluation of ChatGPT-4o and Grok-3 on Cleft Lip and Palate and Presurgical Infant Orthopedics: A Multidisciplinary Assessment by Orthodontists, Pediatricians, and Plastic Surgeons.
Objective: This study aimed to evaluate and compare the accuracy, clarity, and clinical applicability of 2 state-of-the-art large language models (LLMs), Chat Generative Pretrained Transformer (ChatGPT)-4o and Grok-3, in generating health information related to cleft lip and palate (CLP) and presurgical infant orthopedics (PSIO). To ensure a multidisciplinary perspective, experts from orthodontics, pediatrics, and plastic surgery independently evaluated the responses.

Methods: Six structured questions addressing general and presurgical aspects of CLP were submitted to both ChatGPT-4o and Grok-3. Forty-five blinded specialists (15 from each specialty) assessed the 12 generated responses using 2 validated instruments: the DISCERN tool and the Global Quality Scale (GQS). We conducted interspecialty comparisons to explore variations in model evaluation.

Results: We observed no statistically significant differences between ChatGPT-4o and Grok-3 in DISCERN or GQS scores (P > .05). However, pediatricians consistently assigned higher ratings than orthodontists and plastic surgeons in terms of reliability, clarity, and treatment-related content. Patient-directed questions received higher overall scores than those aimed at healthcare professionals. Grok-3 performed slightly better on questions about PSIO, whereas ChatGPT-4o provided more comprehensive and structured answers.

Conclusion: Both LLMs demonstrated notable potential in producing readable, informative responses about CLP and PSIO. While they may aid in patient communication and support clinical education, professional oversight remains critical to ensure medical accuracy. The inclusion of Grok-3 in this orthodontic evaluation provides valuable insights and sets the stage for future research on artificial intelligence integration in interdisciplinary cleft care.
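The size of the rating design described in the Methods can be checked with a short sketch. It is only an illustration of the arithmetic, not the authors' analysis: the question labels `Q1`..`Q6` are placeholders (the abstract does not list the questions), the function name `design_counts` is hypothetical, and the sketch assumes that every specialist scored every response with both instruments.

```python
from itertools import product

# Figures taken from the Methods text. The labels Q1..Q6 are placeholders,
# because the abstract does not list the actual questions.
QUESTIONS = [f"Q{i}" for i in range(1, 7)]
MODELS = ["ChatGPT-4o", "Grok-3"]
SPECIALTIES = ["orthodontics", "pediatrics", "plastic surgery"]
RATERS_PER_SPECIALTY = 15
INSTRUMENTS = ["DISCERN", "GQS"]


def design_counts():
    """Return the number of responses, raters, and ratings implied by the design."""
    # One response per (question, model) pair.
    responses = list(product(QUESTIONS, MODELS))
    raters = len(SPECIALTIES) * RATERS_PER_SPECIALTY
    # Assumption: every rater scored every response with both instruments.
    ratings_per_instrument = raters * len(responses)
    return {
        "responses": len(responses),
        "raters": raters,
        "ratings_per_instrument": ratings_per_instrument,
        "total_ratings": ratings_per_instrument * len(INSTRUMENTS),
    }


if __name__ == "__main__":
    print(design_counts())
```

Under that assumption, the 6 questions and 2 models give the 12 responses reported in the abstract, and the 45 raters would produce 540 ratings per instrument.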
About the journal:
The Cleft Palate-Craniofacial Journal (CPCJ) is the premier peer-reviewed, interdisciplinary, international journal dedicated to current research on etiology, prevention, diagnosis, and treatment in all areas pertaining to craniofacial anomalies. CPCJ reports on basic science and clinical research aimed at better elucidating the pathogenesis, pathology, and optimal methods of treatment of cleft and craniofacial anomalies. The journal strives to foster communication and cooperation among professionals from all specialties.