Evaluating AI Chatbots for Preoperative and Postoperative Counseling for Mandibular Distraction Osteogenesis

Taylor Kring, Srihitha Akula, Soumil Prasad, Eric Sokhn, Seth R Thaller

Journal of Craniofacial Surgery, published online June 9, 2025. DOI: 10.1097/SCS.0000000000011543
Abstract
Mandibular distraction osteogenesis (MDO) is a craniofacial procedure frequently performed in pediatric patients with micrognathia-related airway obstruction. Preoperative and postoperative counseling for families undergoing this procedure is essential, as it involves a multistage surgical course, device management, feeding changes, and airway considerations. This study evaluates the trustworthiness and readability of artificial intelligence (AI) chatbot responses to questions about operative care for MDO. The study was conducted using ChatGPT, Google Gemini, Microsoft Copilot, and Open Evidence. Twenty common preoperative and postoperative care questions relating to MDO were developed. The authors used a modified DISCERN tool to assess quality and the SMOG (Simple Measure of Gobbledygook) test to evaluate response readability. Data underwent statistical analysis using descriptive statistics, one-way ANOVA, and Tukey HSD. Modified DISCERN analysis revealed that clear aims and relevancy scored the highest (mean=4.92, SD=0.31; mean=4.64, SD=0.62). Provision of additional sources and citation of sources had the lowest means (mean=2.19, SD=1.52; mean=2.93, SD=1.96). Microsoft Copilot scored the highest in overall quality (mean=38.10 versus ChatGPT=29.90, P<0.001). Open Evidence scored lowest in shared decision-making (mean=1.80, SD=1.10). Effect sizes were large for source-related variables, with eta-squared values >0.75. Significant differences in readability were found across all AI models (mean=17.31, SD=3.59, P<0.001), with the average response written at a graduate school reading level. Open Evidence (mean=22.24) produced higher SMOG reading scores than ChatGPT (mean=15.89), Google Gemini (mean=15.66), and Microsoft Copilot (mean=15.44) (P<0.001). These findings highlight a need for reviewing the reliability of AI chatbots in preoperative and postoperative counseling for MDO.
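The SMOG grade cited in the abstract is derived from the count of polysyllabic words (three or more syllables) normalized to 30 sentences. The sketch below shows that standard calculation; the regex-based sentence splitter and vowel-group syllable counter are simplifying assumptions for illustration, not the instrument the authors used.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    # Approximate sentence and word segmentation with simple regexes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # Standard SMOG formula (McLaughlin, 1969), scaled to 30 sentences.
    return 1.0430 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291

sample = ("Mandibular distraction osteogenesis requires multidisciplinary coordination. "
          "Families receive counseling before surgery. Device activation follows a schedule.")
print(round(smog_grade(sample), 1))
```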
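The abstract also reports one-way ANOVA with Tukey HSD post-hoc comparisons and eta-squared effect sizes across the four chatbots. A minimal sketch of that analysis using scipy and statsmodels is shown below; the score arrays are invented placeholder values for illustration only, not the study's data.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder SMOG scores per chatbot (illustrative values, not study results).
scores = {
    "ChatGPT":       np.array([15.2, 16.1, 15.9, 16.4, 15.8]),
    "Google Gemini": np.array([15.1, 15.9, 15.4, 16.0, 15.9]),
    "Copilot":       np.array([15.0, 15.7, 15.3, 15.6, 15.6]),
    "Open Evidence": np.array([21.8, 22.6, 22.1, 22.7, 22.0]),
}

# One-way ANOVA across the four groups.
f_stat, p_value = f_oneway(*scores.values())

# Eta-squared effect size: between-group sum of squares over total sum of squares.
all_values = np.concatenate(list(scores.values()))
grand_mean = all_values.mean()
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in scores.values())
ss_total = ((all_values - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

# Tukey HSD pairwise comparisons between chatbots.
labels = np.concatenate([[name] * len(v) for name, v in scores.items()])
tukey = pairwise_tukeyhsd(endog=all_values, groups=labels, alpha=0.05)

print(f"F={f_stat:.2f}, p={p_value:.4f}, eta^2={eta_squared:.2f}")
print(tukey.summary())
```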
Journal Description:
The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.