Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo
{"title":"人工智能聊天机器人对颌面修复常见问题回答的可读性和性能","authors":"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo","doi":"10.1016/j.prosdent.2025.09.009","DOIUrl":null,"url":null,"abstract":"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>","PeriodicalId":16866,"journal":{"name":"Journal of Prosthetic Dentistry","volume":" ","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.\",\"authors\":\"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Leila M Sears, Vinsensia Launardo, Nina Ariani, Nadine Ziad Mirza, Amanda Colebeck, Banu Karayazgan, Maribeth Krzesinski, Alvin G Wee, Cortino Sukotjo\",\"doi\":\"10.1016/j.prosdent.2025.09.009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Statement of problem: </strong>Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.</p><p><strong>Purpose: </strong>The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.</p><p><strong>Material and methods: </strong>A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains, relevance, clarity, depth, focus, and coherence, using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).</p><p><strong>Results: </strong>FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).</p><p><strong>Conclusions: </strong>Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. 
Clinician oversight is essential when using AI-generated materials to answer frequently asked questions by patients requiring maxillofacial prosthodontic care.</p>\",\"PeriodicalId\":16866,\"journal\":{\"name\":\"Journal of Prosthetic Dentistry\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Prosthetic Dentistry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.prosdent.2025.09.009\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Prosthetic Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.prosdent.2025.09.009","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Readability and performance of AI chatbot responses to frequently asked questions in maxillofacial prosthodontics.
Statement of problem: Patients seeking information about maxillofacial prosthodontic care increasingly turn to artificial intelligence (AI)-driven chatbots for guidance. However, the readability, accuracy, and clarity of these AI-generated responses have not been adequately evaluated within the context of maxillofacial prosthodontics.
Purpose: The purpose of this study was to assess and compare the readability and performance of chatbot-generated responses to frequently asked questions about intraoral and extraoral maxillofacial prosthodontics.
Material and methods: A total of 20 frequently asked intraoral and extraoral questions were collected from 7 maxillofacial prosthodontists. These questions were submitted to 4 AI chatbots: ChatGPT, Gemini, Copilot, and DeepSeek. A total of 80 responses were evaluated. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL). Seven maxillofacial prosthodontists were calibrated to score the chatbot responses on 5 domains (relevance, clarity, depth, focus, and coherence) using a 5-point scale. The obtained data were analyzed using 2-way ANOVA with post hoc Tukey tests, Pearson correlation analyses, and intraclass correlation coefficients (ICCs) (α=.05).
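As a point of reference for the readability metric, the sketch below shows one way an FKGL score can be computed for a chatbot response. It is a minimal illustration using a rough syllable-counting heuristic, not the authors' scoring pipeline; published studies typically rely on a validated readability tool rather than hand-rolled counting.

```python
# Minimal FKGL sketch (illustrative only, not the study's actual method).
# Flesch-Kincaid Grade Level:
#   FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, with a floor of 1 per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: score a short, hypothetical chatbot answer.
print(round(fkgl("An obturator is a removable prosthesis that closes a palatal defect."), 1))
```

Lower FKGL values correspond to text readable at a lower school grade level, which is why DeepSeek's lowest score is interpreted as better readability.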
Results: FKGL scores differed significantly among chatbots (P=.002). DeepSeek had the lowest FKGL, indicating better readability, while ChatGPT had the highest. Word counts, relevance, clarity, content depth, focus, and coherence varied significantly among platforms (P<.005). ChatGPT, Gemini, and DeepSeek consistently scored higher, while Copilot had the lowest scores across all domains. For questions on intraoral prostheses, FKGL scores negatively correlated with word count (P=.013). For questions on extraoral prostheses, word count positively correlated with all qualitative metrics except for FKGL (P<.005).
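To illustrate the kind of analysis reported above, the following sketch runs a 2-way ANOVA with a post hoc Tukey test and a Pearson correlation on a hypothetical long-format dataset. The file name and column names (chatbot, question_type, fkgl, word_count) are assumptions for illustration, not the study's actual data or code.

```python
# A minimal analysis sketch, assuming the 80 scored responses are in a
# long-format table; variable names here are hypothetical.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("chatbot_scores.csv")  # hypothetical file

# 2-way ANOVA: chatbot platform x question type (intraoral vs extraoral)
model = smf.ols("fkgl ~ C(chatbot) * C(question_type)", data=df).fit()
print(anova_lm(model, typ=2))

# Post hoc Tukey test comparing chatbots (alpha = .05)
print(pairwise_tukeyhsd(df["fkgl"], df["chatbot"], alpha=0.05))

# Pearson correlation between word count and FKGL
r, p = stats.pearsonr(df["word_count"], df["fkgl"])
print(f"r = {r:.2f}, P = {p:.3f}")

# Inter-rater agreement (ICC) across the 7 calibrated raters would typically
# be computed with a dedicated package rather than by hand.
```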
Conclusions: Significant differences were found in both readability and response quality among commonly used AI chatbots. Although the DeepSeek and ChatGPT platforms produced higher-quality content, none consistently met health literacy guidelines. Clinician oversight is essential when AI-generated materials are used to answer frequently asked questions from patients requiring maxillofacial prosthodontic care.
About the journal:
The Journal of Prosthetic Dentistry is the leading professional journal devoted exclusively to prosthetic and restorative dentistry. The Journal is the official publication of 24 leading U.S. and international prosthodontic organizations. The monthly publication features timely, original peer-reviewed articles on the newest techniques, dental materials, and research findings. The Journal serves prosthodontists and dentists in advanced practice, and features color photos that illustrate many step-by-step procedures. The Journal of Prosthetic Dentistry is included in Index Medicus and CINAHL.