Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo
{"title":"人工智能聊天机器人对颌面修复常见问题回答的可读性和性能","authors":"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo","doi":"10.1016/j.prosdent.2025.09.009","DOIUrl":null,"url":null,"abstract":"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>","PeriodicalId":16866,"journal":{"name":"Journal of Prosthetic Dentistry","volume":" ","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.\",\"authors\":\"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo\",\"doi\":\"10.1016/j.prosdent.2025.09.009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>\",\"PeriodicalId\":16866,\"journal\":{\"name\":\"Journal of Prosthetic Dentistry\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Prosthetic Dentistry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.prosdent.2025.09.009\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Prosthetic Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.prosdent.2025.09.009","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.
Statement of problem: Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.
Purpose: The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.
Material and methods: A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains (relevance, clarity, depth, focus, and coherence) using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).
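As a point of reference for the readability metric, the sketch below shows one way an FKGL score can be computed for a chatbot response. It is a minimal illustration using a rough syllable-counting heuristic, not the authors' scoring pipeline; published studies typically rely on a validated readability tool rather than hand-rolled counting.

```python
# Minimal FKGL sketch (illustrative only, not the study's actual method).
# Flesch-Kincaid Grade Level:
#   FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, with a floor of 1 per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: score a short, hypothetical chatbot answer.
print(round(fkgl("An obturator is a removable prosthesis that closes a palatal defect."), 1))
```

Lower FKGL values correspond to text readable at a lower school grade level, which is why DeepSeek's lowest score is interpreted as better readability.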
Results: FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).
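To illustrate the kind of analysis reported above, the following sketch runs a 2-way ANOVA with a post hoc Tukey test and a Pearson correlation on a hypothetical long-format dataset. The file name and column names (chatbot, question_type, fkgl, word_count) are assumptions for illustration, not the study's actual data or code.

```python
# A minimal analysis sketch, assuming the 80 scored responses are in a
# long-format table; variable names here are hypothetical.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("chatbot_scores.csv")  # hypothetical file

# 2-way ANOVA: chatbot platform x question type (intraoral vs extraoral)
model = smf.ols("fkgl ~ C(chatbot) * C(question_type)", data=df).fit()
print(anova_lm(model, typ=2))

# Post hoc Tukey test comparing chatbots (alpha = .05)
print(pairwise_tukeyhsd(df["fkgl"], df["chatbot"], alpha=0.05))

# Pearson correlation between word count and FKGL
r, p = stats.pearsonr(df["word_count"], df["fkgl"])
print(f"r = {r:.2f}, P = {p:.3f}")

# Inter-rater agreement (ICC) across the 7 calibrated raters would typically
# be computed with a dedicated package rather than by hand.
```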
Conclusions: Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. Clinician oversight is essential when AI-generated materials are used to answer frequently asked questions from patients requiring maxillofacial prosthodontic care.
About the journal:
The Journal of Prosthetic Dentistry is the leading professional journal devoted exclusively to prosthetic and restorative dentistry. The Journal is the official publication of 24 leading U.S. and international prosthodontic organizations. The monthly publication features timely, original peer-reviewed articles on the newest techniques, dental materials, and research findings. The Journal serves prosthodontists and dentists in advanced practice, and features color photos that illustrate many step-by-step procedures. The Journal of Prosthetic Dentistry is included in Index Medicus and CINAHL.