Artificial Intelligence Chatbots and Narcolepsy: Friend or Foe for Patient Information?

Francisco Henriques, Christine Costa, Bárbara Oliveiros, Joana Barbosa Melo, Cláudia Santos, Joana Jesus-Ribeiro

European Neurology, pp. 1-7. Published 2025-07-24. DOI: 10.1159/000547034
Abstract
Introduction: Narcolepsy is a rare sleep disorder with a complex clinical picture, which may affect the daily functioning of patients. Artificial intelligence (AI) has emerged as a promising tool in healthcare, potentially offering valuable support to patients. However, its accuracy in specific medical domains remains inadequately assessed. This study aimed to evaluate and compare the accuracy, completeness, and readability of responses from ChatGPT, Gemini, and Perplexity to queries about narcolepsy.
Methods: This was a cross-sectional study. A set of 28 common patient questions was selected and entered into the three chatbots. Responses were independently reviewed by three sleep physicians. Accuracy and completeness were rated on predefined 5-point and 3-point scales, respectively. Readability was evaluated using six validated formulas.
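The abstract does not name the six readability formulas used, but Flesch Reading Ease and Flesch-Kincaid Grade Level are among the most common validated metrics for health-information text. The sketch below shows how such formulas are typically computed; the syllable counter is a crude vowel-group heuristic, whereas validated tools rely on dictionaries or more elaborate rules.

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups (min. 1 per word).
    # Validated readability tools use dictionary lookups instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def _text_stats(text: str) -> tuple[int, int, int]:
    # Return (sentence count, word count, syllable count), each at least 1.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return sentences, max(1, len(words)), max(1, syllables)


def flesch_reading_ease(text: str) -> float:
    # Higher score = easier text; 60-70 is often cited as "plain English".
    s, w, syl = _text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    # Approximate US school grade level needed to understand the text.
    s, w, syl = _text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

An "advanced reading level," as reported for all three chatbots, corresponds to a low Reading Ease score (roughly below 50) or a grade level above the 6th-to-8th-grade range commonly recommended for patient-facing health information.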
Results: All chatbots showed median accuracy ranging from "more correct than incorrect" to "completely correct," with no significant performance differences. The topics with the lowest scores were "treatment and prognosis" for ChatGPT and Perplexity and "diagnosis" for Gemini. Gemini responses were significantly less complete compared to ChatGPT and Perplexity, with median completeness scores for ChatGPT and Perplexity ranging from "nearly complete" to "complete" and for Gemini ranging from "incomplete" to "nearly complete." All chatbots' responses required an advanced reading level, with Perplexity showing lower readability in five metrics.
Conclusion: Our findings highlight the potential of AI chatbots to deliver mostly accurate responses to narcolepsy-related queries. However, these tools have limitations, including text accessibility, as the readability of the responses did not align with the recommended standards for health information. Therefore, their use should be integrated with appropriate guidance from healthcare professionals to avoid potential misunderstandings.
About the Journal
European Neurology publishes original papers, reviews, and letters to the editor. Papers in this journal cover clinical aspects of diseases of the nervous system and muscles, as well as their neuropathological, biochemical, and electrophysiological basis. New diagnostic probes and pharmacological and surgical treatments are evaluated on the basis of clinical evidence and basic investigative studies. The journal also features original works and reviews on the history of neurology.