An artificial intelligence perspective on geriatric syndromes: assessing the information accuracy and readability of ChatGPT.

IF 3.5 | Zone 3 (Medicine) | Q2 GERIATRICS & GERONTOLOGY

European Geriatric Medicine Pub Date : 2025-04-21 DOI:10.1007/s41999-025-01202-2

Eyyup Murat Efendioglu, Ahmet Cigiloglu


Citations: 0

Abstract

Purpose: ChatGPT, a comprehensive language processing model, provides the opportunity for supportive and professional interactions with patients. However, its use to address patients' frequently asked questions (FAQs) and the readability of the text generated by ChatGPT remain unexplored, particularly in geriatrics. We identified the FAQs about common geriatric syndromes and assessed the accuracy and readability of the responses provided by ChatGPT.

Methods: Two geriatricians with extensive knowledge and experience in geriatric syndromes independently reviewed the 28 responses provided by ChatGPT. The accuracy of the responses generated by ChatGPT was categorized on a rating scale from 0 (harmful) to 4 (excellent) based on current guidelines and approaches. The readability of the text generated by ChatGPT was assessed by administering two tests: the Flesch-Kincaid Reading Ease (FKRE) and the Flesch-Kincaid Grade Level (FKGL).
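The two readability tests named above are usually computed from word, sentence, and syllable counts. The following is a minimal sketch using the standard published Flesch formulas; the function names are illustrative, the counts are assumed to be supplied by the caller, and the abstract does not say which tool the authors actually used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short passage (not data from the study).
    words, sentences, syllables = 100, 5, 150
    print(round(flesch_reading_ease(words, sentences, syllables), 2))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

Both scores depend only on average sentence length and average syllables per word, so long sentences and polysyllabic medical terms lower the reading-ease score and raise the grade level.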

Results: ChatGPT generated responses with an overall mean accuracy score of 88% (3.52/4). Responses generated for sarcopenia diagnosis and depression treatment in older adults had the lowest accuracy scores (2.0 and 2.5, respectively). The mean FKRE score of the texts was 25.2, while the mean FKGL score was 14.5.
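As a quick check on the reported figures, the sketch below converts a mean rating on the 0 to 4 scale into a percentage and maps a reading-ease score onto Flesch's conventional difficulty bands. The function names are illustrative, and the band thresholds are the commonly cited ones, assumed here rather than taken from the paper.

```python
def accuracy_percent(mean_score: float, max_score: float = 4.0) -> float:
    """Express a mean rating as a percentage of the maximum rating."""
    return 100.0 * mean_score / max_score


def flesch_band(score: float) -> str:
    """Map a Flesch Reading Ease score to its conventional difficulty label."""
    # (lower bound, label) pairs, checked from easiest to hardest.
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower, label in bands:
        if score >= lower:
            return label
    return "very difficult"  # scores below 30


if __name__ == "__main__":
    print(round(accuracy_percent(3.52), 1))  # mean accuracy from the abstract
    print(flesch_band(25.2))                 # mean FKRE from the abstract
```

Under these assumed bands, a score below 30 falls in the hardest category, which is conventionally associated with college-graduate reading ability.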

Conclusion: The accuracy scores of the responses generated by ChatGPT were high for most common geriatric syndromes, except for sarcopenia diagnosis and depression treatment. Moreover, the text generated by ChatGPT was very difficult to read and best understood by college graduates. ChatGPT may reduce the uncertainty many patients face. Nevertheless, it remains advisable to consult subject matter experts when making consequential decisions.


Source journal

European Geriatric Medicine (GERIATRICS & GERONTOLOGY)

CiteScore: 6.70

Self-citation rate: 2.60%

Articles published: 114

Review time: 6-12 weeks

About the journal: European Geriatric Medicine is the official journal of the European Geriatric Medicine Society (EUGMS). Launched in 2010, this journal aims to publish the highest quality material, both scientific and clinical, on all aspects of Geriatric Medicine. The EUGMS is interested in the promotion of Geriatric Medicine in any setting (acute or subacute care, rehabilitation, nursing homes, primary care, fall clinics, ambulatory assessment, dementia clinics, etc.), and also in functionality in old age, comprehensive geriatric assessment, geriatric syndromes, geriatric education, old age psychiatry, models of geriatric care in health services, and quality assurance.