Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics

IF 4.8 | CAS Tier 2 (Medicine) | Q1 DENTISTRY, ORAL SURGERY & MEDICINE
Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo
{"title":"人工智能聊天机器人对颌面修复常见问题回答的可读性和性能","authors":"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo","doi":"10.1016/j.prosdent.2025.09.009","DOIUrl":null,"url":null,"abstract":"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>","PeriodicalId":16866,"journal":{"name":"Journal of Prosthetic Dentistry","volume":" ","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.\",\"authors\":\"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo\",\"doi\":\"10.1016/j.prosdent.2025.09.009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>\",\"PeriodicalId\":16866,\"journal\":{\"name\":\"Journal of Prosthetic Dentistry\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Prosthetic Dentistry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.prosdent.2025.09.009\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Prosthetic Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.prosdent.2025.09.009","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.

Statement of problem: Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.

Purpose: The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.

Material and methods: A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains (relevance, clarity, depth, focus, and coherence) using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).
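For reference, the FKGL maps sentence length and word complexity onto a U.S. school grade level. Below is a minimal Python sketch of the standard formula; the naive vowel-group syllable counter is an illustration only (the abstract does not state which tool the study used, and production tools rely on pronunciation dictionaries):

    import re

    def count_syllables(word: str) -> int:
        # Naive heuristic: one syllable per vowel group (real tools use dictionaries).
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fkgl(text: str) -> float:
        # FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

    # Common health literacy guidance targets roughly grade 6 to 8.
    print(round(fkgl("An obturator is a removable prosthesis. It closes a palatal defect."), 1))

A lower FKGL score therefore means text readable at a lower grade level, which is why DeepSeek's lowest score in the results below indicates the best readability.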

Results: FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).
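The reported length-versus-quality relationships are plain Pearson correlations between word count and each metric. A hedged sketch with made-up placeholder values (the study's raw data are not in the abstract), using SciPy:

    from scipy.stats import pearsonr

    # Hypothetical word counts and FKGL scores for intraoral-prosthesis answers;
    # the study reported a negative correlation (longer answers, lower grade level).
    word_counts = [120, 180, 240, 310, 400, 450]
    fkgl_scores = [14.2, 13.1, 12.5, 11.8, 10.9, 10.4]

    r, p = pearsonr(word_counts, fkgl_scores)
    print(f"r = {r:.2f}, P = {p:.3f}")  # significant at alpha = .05 when P < .05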

Conclusions: Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. Clinician oversight is essential when using AI-generated materials to answer frequently asked questions from patients requiring maxillofacial prosthodontic care.

Source journal: Journal of Prosthetic Dentistry (Medicine – Dentistry, Oral Surgery & Medicine)
CiteScore: 7.00
Self-citation rate: 13.00%
Annual articles: 599
Review time: 69 days
Journal description: The Journal of Prosthetic Dentistry is the leading professional journal devoted exclusively to prosthetic and restorative dentistry. The Journal is the official publication for 24 leading U.S. and international prosthodontic organizations. The monthly publication features timely, original peer-reviewed articles on the newest techniques, dental materials, and research findings. The Journal serves prosthodontists and dentists in advanced practice, and features color photos that illustrate many step-by-step procedures. The Journal of Prosthetic Dentistry is included in Index Medicus and CINAHL.