{"title":"Assessing the accuracy and reproducibility of ChatGPT for responding to patient inquiries about otosclerosis.","authors":"Utku Mete, Ömer Afşın Özmen","doi":"10.1007/s00405-024-09039-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Patients increasingly use chatbots powered by artificial intelligence to seek information. However, there is a lack of reliable studies on the accuracy and reproducibility of the information provided by these models. Therefore, we conducted a study investigating the ChatGPT's responses to questions about otosclerosis.</p><p><strong>Methods: </strong>96 otosclerosis-related questions were collected from internet searches and websites of professional institutions and societies. Questions are divided into four sub-categories. These questions were directed at the latest version of ChatGPT Plus, and these responses were assessed by two otorhinolaryngology surgeons. Accuracy was graded as correct, incomplete, mixed, and irrelevant. Reproducibility was evaluated by comparing the consistency of the two answers to each specific question.</p><p><strong>Results: </strong>The overall accuracy and reproducibility rates of GPT-4o for correct answers were found to be 64.60% and 89.60%, respectively. The findings showed correct answers for accuracy and reproducibility for basic knowledge were 64.70% and 91.20%; for diagnosis & management, 64.0% and 92.0%; for medical & surgical treatment, 52.95% and 82.35%; and for operative risks & postoperative period, 75.0% and 90.0%, respectively. There were no significant differences found between the answers and groups in terms of accuracy and reproducibility (p = 0.073 and p = 0.752, respectively).</p><p><strong>Conclusion: </strong>GPT-4o achieved satisfactory accuracy results, except in the diagnosis & management and medical & surgical treatment categories. Reproducibility was generally high across all categories. 
With the audio and visual communication capabilities of GPT-4o, under the supervision of a medical professional, this model can be utilized to provide medical information and support for otosclerosis patients.</p>","PeriodicalId":11952,"journal":{"name":"European Archives of Oto-Rhino-Laryngology","volume":null,"pages":null},"PeriodicalIF":1.9000,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Archives of Oto-Rhino-Laryngology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00405-024-09039-4","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background: Patients increasingly use chatbots powered by artificial intelligence to seek information. However, there is a lack of reliable studies on the accuracy and reproducibility of the information provided by these models. Therefore, we conducted a study investigating ChatGPT's responses to questions about otosclerosis.
Methods: 96 otosclerosis-related questions were collected from internet searches and the websites of professional institutions and societies. The questions were divided into four sub-categories. These questions were directed at the latest version of ChatGPT Plus, and the responses were assessed by two otorhinolaryngology surgeons. Accuracy was graded as correct, incomplete, mixed, or irrelevant. Reproducibility was evaluated by comparing the consistency of the two answers to each specific question.
Results: The overall accuracy and reproducibility rates of GPT-4o for correct answers were 64.60% and 89.60%, respectively. Accuracy and reproducibility rates for correct answers were 64.70% and 91.20% for basic knowledge; 64.0% and 92.0% for diagnosis & management; 52.95% and 82.35% for medical & surgical treatment; and 75.0% and 90.0% for operative risks & postoperative period, respectively. No significant differences were found between the answers and groups in terms of accuracy or reproducibility (p = 0.073 and p = 0.752, respectively).
Conclusion: GPT-4o achieved satisfactory accuracy results, except in the diagnosis & management and medical & surgical treatment categories. Reproducibility was generally high across all categories. With the audio and visual communication capabilities of GPT-4o, under the supervision of a medical professional, this model can be utilized to provide medical information and support for otosclerosis patients.
Journal introduction:
Official Journal of
European Union of Medical Specialists – ORL Section and Board
Official Journal of Confederation of European Oto-Rhino-Laryngology Head and Neck Surgery
"European Archives of Oto-Rhino-Laryngology" publishes original clinical reports and clinically relevant experimental studies, as well as short communications presenting new results of special interest. With peer review by a respected international editorial board and prompt English-language publication, the journal provides rapid dissemination of information by authors from around the world. This particular feature makes it the journal of choice for readers who want to be informed about the continuing state of the art concerning basic sciences and the diagnosis and management of diseases of the head and neck on an international level.
European Archives of Oto-Rhino-Laryngology was founded in 1864 as "Archiv für Ohrenheilkunde" by A. von Tröltsch, A. Politzer and H. Schwartze.