Assessing ChatGPT Ability to Answer Frequently Asked Questions About Essential Tremor.

Impact factor 2.1; JCR Q2 (Clinical Neurology)

Tremor and Other Hyperkinetic Movements. Published: 2024-07-03; eCollection date: 2024-01-01. DOI: 10.5334/tohm.917

Cristiano Sorrentino, Vincenzo Canoro, Maria Russo, Caterina Giordano, Paolo Barone, Roberto Erro

Vol. 14, p. 33 (2024). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11225576/pdf/

Citations: 0

Abstract

Background: Large language models (LLMs) driven by artificial intelligence allow people to engage in direct conversations about their health. The accuracy and readability of the answers that ChatGPT, the best-known LLM, provides about essential tremor (ET), one of the most common movement disorders, have not yet been evaluated.

Methods: ChatGPT's answers to 10 questions about ET were rated by 5 professionals and 15 laypeople on a scale from 1 (poor) to 5 (excellent) for clarity, relevance, accuracy (professionals only), comprehensiveness, and overall value. We also calculated the readability of the answers.
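The abstract does not name the readability formula(s) used. As an illustration only, the Python sketch below computes two widely used indices, Flesch Reading Ease (higher is easier) and Flesch-Kincaid Grade Level (approximate US school grade). The function names are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed here for brevity; dedicated readability tools use dictionaries or more elaborate rules, so their scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels (y included), drop one for a
    silent trailing 'e' (but not '-le'), and never return fewer than 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }
```

Long sentences and polysyllabic medical vocabulary both push Reading Ease down and Grade Level up, which is the sense in which answers can be "poor" in readability even when their content is accurate.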

Results: ChatGPT's answers received relatively positive evaluations from both groups, with median scores between 4 and 5 regardless of question type. However, inter-rater agreement was only moderate, especially among professionals. Moreover, readability was poor for all answers examined.
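The abstract reports median scores and "moderate" agreement but does not say which agreement statistic was used. The sketch below assumes Fleiss' kappa, a common choice when several raters score the same items, and treats the 1 to 5 scores as unordered categories; an ordinal measure such as a weighted kappa or an intraclass correlation may equally have been used. The function name and the ratings in the usage example are hypothetical and are not the study's data.

```python
from collections import Counter
from statistics import median


def fleiss_kappa(ratings: list[list[int]], categories=range(1, 6)) -> float:
    """Fleiss' kappa for N items, each scored by the same number n of raters.

    ratings[i] lists the scores all raters gave to item i (e.g. one
    ChatGPT answer). Scores are treated as unordered categories 1..5.
    """
    cats = list(categories)
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(s not in cats for item in ratings for s in item):
        raise ValueError("score outside the allowed categories")

    N = len(ratings)
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs per item, averaged.
    p_items = [(sum(c[j] ** 2 for j in cats) - n) / (n * (n - 1)) for c in counts]
    p_bar = sum(p_items) / N

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[j] for c in counts) / (N * n) for j in cats]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1:
        raise ValueError("all scores identical; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical scores: 3 answers, each rated by the same 4 raters.
    toy = [[5, 4, 5, 5], [4, 4, 3, 5], [5, 5, 4, 4]]
    print([median(item) for item in toy])  # per-answer median score
    print(round(fleiss_kappa(toy), 3))     # agreement across raters
```

Medians can sit at 4 to 5 while kappa stays modest, because kappa measures whether raters pick the same score for each answer, not whether scores are high overall.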

Discussion: ChatGPT provided relatively accurate and relevant answers. The variability in the professionals' ratings suggests that raters' degree of literacy about ET influenced their scores and, indirectly, that the quality of information provided in clinical practice is also variable. The readability of ChatGPT's answers was poor. LLMs will likely play a significant role in the future; therefore, health-related content generated by these tools should be monitored.


