Assessing readability and accuracy of content produced by the American College of Prosthodontists and large language models for patient education in prosthodontics.

Impact Factor 3.4 · CAS Region 2 (Medicine) · Q1 DENTISTRY, ORAL SURGERY & MEDICINE
Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Heidi Marie Huber, Atousa Azarbal, Sri Kurniawan, Cortino Sukotjo
{"title":"Assessing readability and accuracy of content produced by the American College of Prosthodontists and large language models for patient education in prosthodontics.","authors":"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Heidi Marie Huber, Atousa Azarbal, Sri Kurniawan, Cortino Sukotjo","doi":"10.1111/jopr.70022","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study aims to evaluate the readability and accuracy of content produced by ChatGPT, Copilot, Gemini, and the American College of Prosthodontists (ACP) for patient education in prosthodontics.</p><p><strong>Materials and methods: </strong>A series of 26 questions were selected from the ACP's list of questions (GoToAPro.org FAQs) and their published answers. Answers to the same questions were generated from ChatGPT-3.5, Copilot, and Gemini. The word counts of responses from chatbots and the ACP were recorded. The readability was calculated using the Flesch Reading Ease Scale and Flesch-Kincaid Grade Level. The responses were also evaluated for accuracy, completeness, and overall quality. Descriptive statistics were used to calculate mean and standard deviations (SD). One-way analysis of variance was performed, followed by the Tukey multiple comparisons to test differences across chatbots, ACP, and various selected topics. The Pearson correlation coefficient was used to examine the relationship between each variable. Significance was set at α < 0.05.</p><p><strong>Results: </strong>ChatGPT had a higher word count, while ACP had a lower word count (p < 0.001). The cumulative scores of the prosthodontist topic had the lowest Flesch Reading Ease Scale score, while brushing and flossing topics displayed the highest score (p < 0.001). Brushing and flossing topics also had the lowest Flesch-Kincaid Grade Level score, whereas the prosthodontist topic had the highest score (p < 0.001). Accuracy for denture topics was the lowest across the chatbots and ACP, and it was the highest for brushing and flossing topics (p = 0.006).</p><p><strong>Conclusions: </strong>This study highlights the potential for large language models to enhance patient's prosthodontic education. However, the variability in readability and accuracy across platforms underscores the need for dental professionals to critically evaluate the content generated by these tools before recommending them to patients.</p>","PeriodicalId":49152,"journal":{"name":"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry","volume":" ","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/jopr.70022","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: This study aims to evaluate the readability and accuracy of content produced by ChatGPT, Copilot, Gemini, and the American College of Prosthodontists (ACP) for patient education in prosthodontics.

Materials and methods: A series of 26 questions was selected from the ACP's list of patient questions (GoToAPro.org FAQs), together with their published answers. Answers to the same questions were generated with ChatGPT-3.5, Copilot, and Gemini. The word counts of the responses from the chatbots and the ACP were recorded. Readability was calculated using the Flesch Reading Ease Scale and the Flesch-Kincaid Grade Level. The responses were also evaluated for accuracy, completeness, and overall quality. Descriptive statistics were used to calculate means and standard deviations (SD). One-way analysis of variance was performed, followed by Tukey multiple comparisons, to test for differences across the chatbots, the ACP, and the selected topics. The Pearson correlation coefficient was used to examine the relationships between variables. Significance was set at α < 0.05.
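For reference, the two readability indices cited above follow the standard published Flesch formulas. The sketch below (Python, with illustrative counts, since the abstract does not name the scoring tool used) shows how each score is derived from word, sentence, and syllable counts.

```python
# Minimal sketch of the standard Flesch readability formulas
# (the study's exact implementation is not specified in the abstract).

def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Higher scores (roughly 0-100) indicate easier-to-read text."""
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))

def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Approximate U.S. school grade level needed to understand the text."""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# Hypothetical example: a 120-word answer with 8 sentences and 180 syllables
print(flesch_reading_ease(120, 8, 180))   # ~64.7 (plain English)
print(flesch_kincaid_grade(120, 8, 180))  # ~8.0 (about 8th-grade level)
```

Note the inverse relationship between the two indices: text that scores higher on Reading Ease generally scores lower on Grade Level, which is consistent with the pattern reported in the results below.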

Results: ChatGPT had the highest word count, while the ACP had the lowest (p < 0.001). Across cumulative scores, the prosthodontist topic had the lowest Flesch Reading Ease Scale score, while the brushing and flossing topic displayed the highest (p < 0.001). The brushing and flossing topic also had the lowest Flesch-Kincaid Grade Level score, whereas the prosthodontist topic had the highest (p < 0.001). Accuracy was lowest for denture topics across the chatbots and the ACP, and highest for brushing and flossing topics (p = 0.006).

Conclusions: This study highlights the potential of large language models to enhance patients' prosthodontic education. However, the variability in readability and accuracy across platforms underscores the need for dental professionals to critically evaluate the content generated by these tools before recommending them to patients.

Source journal metrics:
CiteScore: 7.90
Self-citation rate: 15.00%
Articles published per year: 171
Review time: 6-12 weeks
Journal overview: The Journal of Prosthodontics promotes the advanced study and practice of prosthodontics, implant, esthetic, and reconstructive dentistry. It is the official journal of the American College of Prosthodontists, the American Dental Association-recognized voice of the specialty of prosthodontics. The journal publishes evidence-based original scientific articles presenting information that is relevant and useful to prosthodontists. Additionally, it publishes reports of innovative techniques, new instructional methodologies, and instructive clinical reports with an interdisciplinary flair. The journal is particularly focused on promoting the study and use of cutting-edge technology and positioning prosthodontists as early adopters of new technology in the dental community.