Supriya Dadi, Taylor Kring, Kyle Latz, David Cohen, Seth Thaller
Evaluating the Reliability and Readability of AI Chatbot Responses for Microtia Patient Education.
DOI: 10.1097/SCS.0000000000011988
Journal of Craniofacial Surgery (Q3, Surgery; Impact Factor 1.0)
Publication date: 2025-10-02
Citations: 0
Abstract
Introduction: Microtia is a congenital deformity of the external ear that can range from mild underdevelopment to complete absence of the ear. Often unilateral, it causes visible facial asymmetry that leads to psychosocial distress for patients and families. Caregivers report feeling guilty and anxious, while patients experience increased rates of depression and social challenges. During this difficult period, patients and their families often turn to AI chatbots for guidance before and after receiving definitive surgical care. This study evaluates the quality and readability of leading AI-based chatbots when responding to patient-centered questions about the condition.
Methods: Popular AI chatbots (ChatGPT 4o, Google Gemini, DeepSeek, and OpenEvidence) were each asked 25 queries about microtia, developed from the FAQ sections of hospital websites. Responses were evaluated using modified DISCERN criteria for quality and SMOG scoring for readability. ANOVA and post hoc analyses were performed to identify significant differences.
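The study does not reproduce its scoring implementation, but the SMOG readability metric it applies follows a standard published formula based on the number of polysyllabic words (3+ syllables) per 30 sentences. A minimal illustrative sketch, assuming word and sentence counts have already been obtained (the `smog_grade` helper name is hypothetical, not from the study):

```python
import math

def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """Return the SMOG grade: approximate years of education needed
    to understand a text. Standard formula:
    grade = 3.1291 + 1.0430 * sqrt(polysyllables * (30 / sentences))
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 3.1291 + 1.0430 * math.sqrt(
        polysyllable_count * (30 / sentence_count)
    )

# Example: a 30-sentence chatbot response with 25 polysyllabic words
print(round(smog_grade(25, 30), 2))  # -> 8.34, roughly an 8th-grade level
```

Note that scores near 16-18, as reported for ChatGPT and OpenEvidence in the Results below, correspond to college- and graduate-level text, well above the 6th-to-8th-grade level typically recommended for patient education materials.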
Results: Google Gemini achieved the highest DISCERN score (M=37.16, SD=2.58), followed by OpenEvidence (M=32.19, SD=3.54). DeepSeek (M=30.76, SD=4.29) and ChatGPT (M=30.32, SD=2.97) had the lowest DISCERN scores. OpenEvidence had the worst readability (M=18.06, SD=1.12), followed by ChatGPT (M=16.32, SD=1.41). DeepSeek was the most readable (M=14.63, SD=1.60), closely followed by Google Gemini (M=14.73, SD=1.27). Overall, the average DISCERN and SMOG scores across all platforms were 32.19 (SD=4.43) and 15.93 (SD=1.94), respectively, indicating good quality and an undergraduate reading level.
Conclusions: None of the platforms consistently met both quality and readability standards, though Google Gemini performed relatively well. As reliance on AI for early health information grows, ensuring the accessibility of chatbot responses will be crucial for supporting informed decision-making and enhancing the patient experience.
About the journal
The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.