{"title":"Assessing the accuracy and reproducibility of ChatGPT for responding to patient inquiries about otosclerosis.","authors":"Utku Mete, Ömer Afşın Özmen","doi":"10.1007/s00405-024-09039-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Patients increasingly use chatbots powered by artificial intelligence to seek information. However, there is a lack of reliable studies on the accuracy and reproducibility of the information provided by these models. Therefore, we conducted a study investigating the ChatGPT's responses to questions about otosclerosis.</p><p><strong>Methods: </strong>96 otosclerosis-related questions were collected from internet searches and websites of professional institutions and societies. Questions are divided into four sub-categories. These questions were directed at the latest version of ChatGPT Plus, and these responses were assessed by two otorhinolaryngology surgeons. Accuracy was graded as correct, incomplete, mixed, and irrelevant. Reproducibility was evaluated by comparing the consistency of the two answers to each specific question.</p><p><strong>Results: </strong>The overall accuracy and reproducibility rates of GPT-4o for correct answers were found to be 64.60% and 89.60%, respectively. The findings showed correct answers for accuracy and reproducibility for basic knowledge were 64.70% and 91.20%; for diagnosis & management, 64.0% and 92.0%; for medical & surgical treatment, 52.95% and 82.35%; and for operative risks & postoperative period, 75.0% and 90.0%, respectively. There were no significant differences found between the answers and groups in terms of accuracy and reproducibility (p = 0.073 and p = 0.752, respectively).</p><p><strong>Conclusion: </strong>GPT-4o achieved satisfactory accuracy results, except in the diagnosis & management and medical & surgical treatment categories. Reproducibility was generally high across all categories. 
With the audio and visual communication capabilities of GPT-4o, under the supervision of a medical professional, this model can be utilized to provide medical information and support for otosclerosis patients.</p>","PeriodicalId":11952,"journal":{"name":"European Archives of Oto-Rhino-Laryngology","volume":null,"pages":null},"PeriodicalIF":1.9000,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Archives of Oto-Rhino-Laryngology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00405-024-09039-4","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background: Patients increasingly use chatbots powered by artificial intelligence to seek information. However, there is a lack of reliable studies on the accuracy and reproducibility of the information provided by these models. Therefore, we conducted a study investigating ChatGPT's responses to questions about otosclerosis.
Methods: 96 otosclerosis-related questions were collected from internet searches and the websites of professional institutions and societies. The questions were divided into four sub-categories. These questions were directed at the latest version of ChatGPT Plus, and the responses were assessed by two otorhinolaryngology surgeons. Accuracy was graded as correct, incomplete, mixed, or irrelevant. Reproducibility was evaluated by comparing the consistency of the two answers to each specific question.
Results: The overall accuracy and reproducibility rates of GPT-4o for correct answers were 64.60% and 89.60%, respectively. Accuracy and reproducibility rates for correct answers were 64.70% and 91.20% for basic knowledge; 64.0% and 92.0% for diagnosis & management; 52.95% and 82.35% for medical & surgical treatment; and 75.0% and 90.0% for operative risks & postoperative period, respectively. No significant differences were found between the answers and groups in terms of accuracy or reproducibility (p = 0.073 and p = 0.752, respectively).
Conclusion: GPT-4o achieved satisfactory accuracy results, except in the diagnosis & management and medical & surgical treatment categories. Reproducibility was generally high across all categories. With the audio and visual communication capabilities of GPT-4o, under the supervision of a medical professional, this model can be utilized to provide medical information and support for otosclerosis patients.
Journal introduction:
Official Journal of
European Union of Medical Specialists – ORL Section and Board
Official Journal of Confederation of European Oto-Rhino-Laryngology Head and Neck Surgery
"European Archives of Oto-Rhino-Laryngology" publishes original clinical reports and clinically relevant experimental studies, as well as short communications presenting new results of special interest. With peer review by a respected international editorial board and prompt English-language publication, the journal provides rapid dissemination of information by authors from around the world. This particular feature makes it the journal of choice for readers who want to be informed about the continuing state of the art concerning basic sciences and the diagnosis and management of diseases of the head and neck on an international level.
European Archives of Oto-Rhino-Laryngology was founded in 1864 as "Archiv für Ohrenheilkunde" by A. von Tröltsch, A. Politzer and H. Schwartze.