Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Heidi Marie Huber, Atousa Azarbal, Sri Kurniawan, Cortino Sukotjo
{"title":"评估美国口腔修复医师学会制作的内容的可读性和准确性,以及用于口腔修复患者教育的大型语言模型。","authors":"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Heidi Marie Huber, Atousa Azarbal, Sri Kurniawan, Cortino Sukotjo","doi":"10.1111/jopr.70022","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study aims to evaluate the readability and accuracy of content produced by ChatGPT, Copilot, Gemini, and the American College of Prosthodontists (ACP) for patient education in prosthodontics.</p><p><strong>Materials and methods: </strong>A series of 26 questions were selected from the ACP's list of questions (GoToAPro.org FAQs) and their published answers. Answers to the same questions were generated from ChatGPT-3.5, Copilot, and Gemini. The word counts of responses from chatbots and the ACP were recorded. The readability was calculated using the Flesch Reading Ease Scale and Flesch-Kincaid Grade Level. The responses were also evaluated for accuracy, completeness, and overall quality. Descriptive statistics were used to calculate mean and standard deviations (SD). One-way analysis of variance was performed, followed by the Tukey multiple comparisons to test differences across chatbots, ACP, and various selected topics. The Pearson correlation coefficient was used to examine the relationship between each variable. Significance was set at α < 0.05.</p><p><strong>Results: </strong>ChatGPT had a higher word count, while ACP had a lower word count (p < 0.001). The cumulative scores of the prosthodontist topic had the lowest Flesch Reading Ease Scale score, while brushing and flossing topics displayed the highest score (p < 0.001). Brushing and flossing topics also had the lowest Flesch-Kincaid Grade Level score, whereas the prosthodontist topic had the highest score (p < 0.001). Accuracy for denture topics was the lowest across the chatbots and ACP, and it was the highest for brushing and flossing topics (p = 0.006).</p><p><strong>Conclusions: </strong>This study highlights the potential for large language models to enhance patient's prosthodontic education. However, the variability in readability and accuracy across platforms underscores the need for dental professionals to critically evaluate the content generated by these tools before recommending them to patients.</p>","PeriodicalId":49152,"journal":{"name":"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry","volume":" ","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing readability and accuracy of content produced by the American College of Prosthodontists and large language models for patient education in prosthodontics.\",\"authors\":\"Soni Prasad, Merve Koseoglu, Stavroula Antonopoulou, Heidi Marie Huber, Atousa Azarbal, Sri Kurniawan, Cortino Sukotjo\",\"doi\":\"10.1111/jopr.70022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>This study aims to evaluate the readability and accuracy of content produced by ChatGPT, Copilot, Gemini, and the American College of Prosthodontists (ACP) for patient education in prosthodontics.</p><p><strong>Materials and methods: </strong>A series of 26 questions were selected from the ACP's list of questions (GoToAPro.org FAQs) and their published answers. Answers to the same questions were generated from ChatGPT-3.5, Copilot, and Gemini. The word counts of responses from chatbots and the ACP were recorded. 
The readability was calculated using the Flesch Reading Ease Scale and Flesch-Kincaid Grade Level. The responses were also evaluated for accuracy, completeness, and overall quality. Descriptive statistics were used to calculate mean and standard deviations (SD). One-way analysis of variance was performed, followed by the Tukey multiple comparisons to test differences across chatbots, ACP, and various selected topics. The Pearson correlation coefficient was used to examine the relationship between each variable. Significance was set at α < 0.05.</p><p><strong>Results: </strong>ChatGPT had a higher word count, while ACP had a lower word count (p < 0.001). The cumulative scores of the prosthodontist topic had the lowest Flesch Reading Ease Scale score, while brushing and flossing topics displayed the highest score (p < 0.001). Brushing and flossing topics also had the lowest Flesch-Kincaid Grade Level score, whereas the prosthodontist topic had the highest score (p < 0.001). Accuracy for denture topics was the lowest across the chatbots and ACP, and it was the highest for brushing and flossing topics (p = 0.006).</p><p><strong>Conclusions: </strong>This study highlights the potential for large language models to enhance patient's prosthodontic education. However, the variability in readability and accuracy across platforms underscores the need for dental professionals to critically evaluate the content generated by these tools before recommending them to patients.</p>\",\"PeriodicalId\":49152,\"journal\":{\"name\":\"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/jopr.70022\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Prosthodontics-Implant Esthetic and Reconstructive Dentistry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/jopr.70022","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Assessing readability and accuracy of content produced by the American College of Prosthodontists and large language models for patient education in prosthodontics.
Purpose: This study aims to evaluate the readability and accuracy of content produced by ChatGPT, Copilot, Gemini, and the American College of Prosthodontists (ACP) for patient education in prosthodontics.
Materials and methods: A series of 26 questions, together with their published answers, was selected from the ACP's patient FAQ list (GoToAPro.org). Answers to the same questions were generated with ChatGPT-3.5, Copilot, and Gemini. The word counts of the responses from the chatbots and the ACP were recorded. Readability was calculated using the Flesch Reading Ease Scale and the Flesch-Kincaid Grade Level. The responses were also evaluated for accuracy, completeness, and overall quality. Descriptive statistics were used to calculate means and standard deviations (SD). One-way analysis of variance was performed, followed by Tukey multiple comparisons, to test differences across the chatbots, the ACP, and the selected topics. The Pearson correlation coefficient was used to examine relationships between variables. Significance was set at α < 0.05.
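For context, the two readability indices named in the methods are standard formulas driven by average sentence length and average syllables per word. The sketch below is an illustration only: the abstract does not state which tool the authors used, and the syllable counter here is a crude heuristic, not the validated counting rules.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count contiguous vowel groups, minimum one per word."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a likely silent final 'e'
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw      # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59        # Flesch-Kincaid Grade Level
    return fre, fkgl

if __name__ == "__main__":
    sample = "Brush your teeth twice a day. Floss once a day to clean between teeth."
    fre, fkgl = readability(sample)
    print(f"Reading Ease: {fre:.1f}, Grade Level: {fkgl:.1f}")
```

Higher Flesch Reading Ease scores indicate easier text, whereas higher Flesch-Kincaid Grade Level scores correspond to more years of schooling needed to understand it, which is why the two indices move in opposite directions in the results below.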
Results: ChatGPT responses had higher word counts, while ACP responses had lower word counts (p < 0.001). Across sources, the prosthodontist topic had the lowest cumulative Flesch Reading Ease Scale score, while the brushing and flossing topic had the highest (p < 0.001). The brushing and flossing topic also had the lowest Flesch-Kincaid Grade Level score, whereas the prosthodontist topic had the highest (p < 0.001). Accuracy was lowest for denture topics across the chatbots and the ACP, and highest for brushing and flossing topics (p = 0.006).
Conclusions: This study highlights the potential of large language models to enhance patients' prosthodontic education. However, the variability in readability and accuracy across platforms underscores the need for dental professionals to critically evaluate the content generated by these tools before recommending them to patients.
About the journal:
The Journal of Prosthodontics promotes the advanced study and practice of prosthodontics, implant, esthetic, and reconstructive dentistry. It is the official journal of the American College of Prosthodontists, the American Dental Association-recognized voice of the Specialty of Prosthodontics. The journal publishes evidence-based original scientific articles presenting information that is relevant and useful to prosthodontists. Additionally, it publishes reports of innovative techniques, new instructional methodologies, and instructive clinical reports with an interdisciplinary flair. The journal is particularly focused on promoting the study and use of cutting-edge technology and positioning prosthodontists as early adopters of new technology in the dental community.