Assessing the Quality of ChatGPT Responses to Dementia Caregivers' Questions: Qualitative Analysis.

Impact Factor: 5.0 · JCR Q1 · Geriatrics & Gerontology

JMIR Aging · Pub Date: 2024-05-06 · DOI: 10.2196/53019

Alyssa Aguirre, Robin Hilsabeck, Tawny Smith, Bo Xie, Daqing He, Zhendong Wang, Ning Zou

JMIR Aging, volume 7, e53019. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11089887/pdf/


Abstract

Background: Artificial intelligence (AI) tools such as ChatGPT by OpenAI hold great promise for improving the quality of life of patients with dementia and their caregivers by providing high-quality responses to their questions about typical dementia behaviors. So far, however, evidence on the quality of such ChatGPT responses is limited. A few recent publications have investigated the quality of ChatGPT responses for other health conditions. Our study is the first to assess ChatGPT using real-world questions asked by dementia caregivers themselves.

Objectives: This pilot study examines the potential of ChatGPT-3.5 to provide high-quality information that may enhance dementia care and patient-caregiver education.

Methods: Our interprofessional team used a formal rating scale (scoring range: 0-5; the higher the score, the better the quality) to evaluate ChatGPT responses to real-world questions posed by dementia caregivers. We selected 60 posts by dementia caregivers from Reddit, a popular social media platform. These posts were verified by 3 interdisciplinary dementia clinicians as representing dementia caregivers' desire for information in the areas of memory loss and confusion, aggression, and driving. Posts in the memory loss and confusion category ranged from 71 to 531 words (mean 218; median 188), aggression posts ranged from 58 to 602 words (mean 254; median 200), and driving posts ranged from 93 to 550 words (mean 272; median 276).

Results: ChatGPT's response quality scores ranged from 3 to 5. Of the 60 responses, 26 (43%) received 5 points, 21 (35%) received 4 points, and 13 (22%) received 3 points, suggesting high quality. ChatGPT obtained consistently high scores in synthesizing information to provide follow-up recommendations (n=58, 96%), with the lowest scores in the area of comprehensiveness (n=38, 63%).
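The reported score distribution can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the study: the names `score_counts`, `percent_share`, and `mean_score` are illustrative, and the mean score is derived here from the reported counts rather than stated in the abstract.

```python
from collections import Counter

# Score counts as reported in the Results: score -> number of responses.
score_counts = Counter({5: 26, 4: 21, 3: 13})


def percent_share(counts):
    """Return each score's share of all responses, rounded to a whole percent."""
    total = sum(counts.values())
    return {score: round(100 * n / total) for score, n in counts.items()}


def mean_score(counts):
    """Return the count-weighted mean score (derived, not reported in the abstract)."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


total = sum(score_counts.values())   # number of rated responses
shares = percent_share(score_counts)  # whole-percent share per score
mean = mean_score(score_counts)       # weighted mean on the 0-5 scale

print(total)
print(shares)
print(round(mean, 2))
```

The counts sum to the 60 responses described in the Methods, and the rounded shares match the percentages given above.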

Conclusions: ChatGPT provided high-quality responses to complex questions posted by dementia caregivers, but it did have limitations. ChatGPT was unable to anticipate future problems that a human professional might recognize and address in a clinical encounter. At other times, ChatGPT recommended a strategy that the caregiver had already explicitly tried. This pilot study indicates the potential of AI to provide high-quality information to enhance dementia care and patient-caregiver education in tandem with information provided by licensed health care professionals. Evaluating the quality of responses is necessary to ensure that caregivers can make informed decisions. ChatGPT has the potential to transform health care practice by shaping how caregivers receive health information.


