Artificial Intelligence Chatbots and Narcolepsy: Friend or Foe for Patient Information?

Francisco Henriques, Christine Costa, Bárbara Oliveiros, Joana Barbosa Melo, Cláudia Santos, Joana Jesus-Ribeiro

European Neurology, pp. 1-7. Published 2025-07-24. DOI: 10.1159/000547034
Abstract
Introduction: Narcolepsy is a rare sleep disorder with a complex clinical picture, which may affect the daily functioning of patients. Artificial intelligence (AI) has emerged as a promising tool in healthcare, potentially offering valuable support to patients. However, its accuracy in specific medical domains remains inadequately assessed. This study aimed to evaluate and compare the accuracy, completeness, and readability of responses from ChatGPT, Gemini, and Perplexity to queries about narcolepsy.
Methods: This was a cross-sectional study. A set of 28 common patient questions was selected and entered into the three chatbots. Responses were independently reviewed by three sleep physicians. Accuracy and completeness were rated on predefined 5-point and 3-point scales, respectively. Readability was evaluated using six validated formulas.
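The abstract does not name the six readability formulas used, but Flesch Reading Ease and Flesch-Kincaid Grade Level are among the most common validated metrics for health-information text. The sketch below shows how such formulas are typically computed; the syllable counter is a crude vowel-group heuristic, whereas validated tools rely on dictionaries or more elaborate rules.

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups (min. 1 per word).
    # Validated readability tools use dictionary lookups instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def _text_stats(text: str) -> tuple[int, int, int]:
    # Return (sentence count, word count, syllable count), each at least 1.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return sentences, max(1, len(words)), max(1, syllables)


def flesch_reading_ease(text: str) -> float:
    # Higher score = easier text; 60-70 is often cited as "plain English".
    s, w, syl = _text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    # Approximate US school grade level needed to understand the text.
    s, w, syl = _text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

An "advanced reading level," as reported for all three chatbots, corresponds to a low Reading Ease score (roughly below 50) or a grade level above the 6th-to-8th-grade range commonly recommended for patient-facing health information.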
Results: All chatbots showed median accuracy ranging from "more correct than incorrect" to "completely correct," with no significant performance differences. The topics with the lowest scores were "treatment and prognosis" for ChatGPT and Perplexity and "diagnosis" for Gemini. Gemini responses were significantly less complete compared to ChatGPT and Perplexity, with median completeness scores for ChatGPT and Perplexity ranging from "nearly complete" to "complete" and for Gemini ranging from "incomplete" to "nearly complete." All chatbots' responses required an advanced reading level, with Perplexity showing lower readability in five metrics.
Conclusion: Our findings highlight the potential of AI chatbots to deliver mostly accurate responses to narcolepsy-related queries. However, these tools have limitations, including text accessibility, as the readability of the responses did not align with the recommended standards for health information. Therefore, their use should be integrated with appropriate guidance from healthcare professionals to avoid potential misunderstandings.
About the Journal
European Neurology publishes original papers, reviews, and letters to the editor. Papers in this journal cover clinical aspects of diseases of the nervous system and muscles, as well as their neuropathological, biochemical, and electrophysiological basis. New diagnostic probes and pharmacological and surgical treatments are evaluated on the basis of clinical evidence and basic investigative studies. The journal also features original works and reviews on the history of neurology.