Evaluating Large Language Models in Addressing Patient Questions on Endodontic Pain: A Comparative Analysis of Accessible Chatbots

Sanaa Aljamani, Yazan Hassona, Hoda A Fansa, Hiba M Saadeh, Kifah Dafi Jamani

Journal of Endodontics · Published May 5, 2025 · DOI: 10.1016/j.joen.2025.04.015
Introduction: Patients increasingly turn to large language models for health-related information, but the reliability and usefulness of these tools remain controversial, making continuous assessment of their role in patient education essential. This study evaluates the performance of ChatGPT 3.5 and Gemini in answering patient questions about endodontic pain.
Methods: A total of 62 frequently asked questions on endodontic pain were categorized by etiology, symptoms, management, and incidence. Responses from ChatGPT 3.5 and Gemini were assessed using standardized tools: the Global Quality Score (GQS); the CLEAR reliability tool (Completeness, Lack of false information, Evidence supported, Appropriateness, and Relevance); and two readability indices, the Flesch-Kincaid Grade Level and the Simple Measure of Gobbledygook (SMOG).
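Both readability indices named above are closed-form formulas over sentence, word, and syllable counts. The following is a minimal sketch of how such grade levels can be computed; it is not the authors' scoring pipeline, and the vowel-group syllable counter is a crude stand-in for a dictionary-based counter:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

def smog_grade(text: str) -> float:
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30/sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * (polysyllables * 30 / len(sentences)) ** 0.5 + 3.1291

if __name__ == "__main__":
    sample = ("Endodontic pain usually comes from inflammation of the dental pulp. "
              "A root canal treatment removes the inflamed tissue. "
              "Most patients feel relief within a few days.")
    print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.1f}")
    print(f"SMOG grade: {smog_grade(sample):.1f}")
```

Note that SMOG was originally calibrated on 30-sentence samples (the 30/sentences factor in the formula normalizes for sample length), so grade estimates on short chatbot replies are approximate.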
Results: Compared with Gemini, ChatGPT 3.5 responses scored significantly higher in overall quality (GQS: 4.67-4.9 vs 2.5-4, P < .001) and reliability (CLEAR: 23.5-23.6 vs 19.35-22.7, P < .05). However, ChatGPT 3.5 required a higher reading level (SMOG: 14-17.6) than Gemini (SMOG: 8.7-11.3, P < .001). Gemini's responses were more readable (6th-7th grade level) but lacked depth and completeness.
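The abstract reports P values without naming the underlying test. Purely as a hypothetical illustration (none of these numbers are from the study), ordinal ratings such as the 1-5 GQS are commonly compared between two systems with a nonparametric test such as the Mann-Whitney U:

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-question GQS ratings on the 1-5 scale -- illustrative only,
# not the study's data.
gqs_chatgpt = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]
gqs_gemini  = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3]

# Two-sided Mann-Whitney U test for a difference between the rating distributions.
stat, p = mannwhitneyu(gqs_chatgpt, gqs_gemini, alternative="two-sided")
print(f"U = {stat:.1f}, P = {p:.4g}")
```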
Conclusion: While ChatGPT 3.5 outperformed Gemini in quality and reliability, its complex language reduced accessibility. In contrast, Gemini's simpler language enhanced readability but sacrificed comprehensiveness. These findings highlight the need for professional oversight in integrating artificial intelligence-driven tools into healthcare communication to ensure accurate, accessible, and empathetic patient education.
About the journal:
The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.