Evaluation of artificial intelligence use in ankylosing spondylitis with ChatGPT-4: patient and physician perspectives.

IF 2.8 · Medicine, Region 3 · Q2 Rheumatology
Elif Altunel Kılınç, Neşe Çabuk Çelik
{"title":"Evaluation of artificial ıntelligence use in ankylosing spondylitis with ChatGPT-4: patient and physician perspectives.","authors":"Elif Altunel Kılınç, Neşe Çabuk Çelik","doi":"10.1007/s10067-025-07648-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study aims to evaluate the accuracy and comprehensiveness of the information provided by ChatGPT-4, an artificial intelligence-based system, regarding ankylosing spondylitis (AS) from the perspectives of patients and physicians.</p><p><strong>Method: </strong>In this cross-sectional study, 75 questions were asked of ChatGPT-4. These were the most frequently asked questions about AS on Google Trends (group 1), and questions derived from ASAS/EULAR recommendations (group 2 and group 3). Group 2 consisted of open-ended questions, and group 3 consisted of case questions. Two expert rheumatologists scored the responses for accuracy and comprehensiveness. A six-point Likert scale was used to assess accuracy, and a three-point scale for completeness.</p><p><strong>Results: </strong>The accuracy and completeness scores analyzed in this study were found to be 5.32 ± 1.4 and 2.76 ± 0.5 for group 1, 5.36 ± 1.1 and 2.72 ± 0.45 for group 2, and 4.24 ± 1.96 and 2.36 ± 0.63 for group 3, respectively. There was a significant difference in accuracy and completeness scores between the groups (p = 0.044 and p = 0.019). Cohen's kappa coefficient showed excellent agreement with values of 0.88 for accuracy and 0.90 for completeness.</p><p><strong>Conclusion: </strong>While the responses to questions in groups 1 and 2 were satisfactory in terms of accuracy and comprehensiveness, the responses to complex case questions in group 3 were not sufficient. ChatGPT-4 appears to be a useful resource for patient education, but response mechanisms for complex clinical scenarios need to be improved. Furthermore, the potential for generating false or fabricated data should be considered; therefore, physicians should evaluate and verify responses. Key Points • ChatGPT-4 demonstrated high accuracy and comprehensiveness in addressing the information needs of patients and physicians regarding axial spondyloarthritis (AS), with satisfactory responses particularly for frequently asked questions and questions derived from ASAS/EULAR recommendations. • The quality of responses significantly decreased in the group of' case questions' involving more complex scenarios, indicating that the AI may be inadequate in clinical decision-making processes. • ChatGPT-4 can facilitate access to health information for patients, potentially reducing unnecessary clinic visits and serving as a primary source of information for those with limited access to healthcare professionals, thereby enhancing the overall effectiveness of patient care. 
• While ChatGPT-4 shows promise as a valuable tool for patient education, there is a clear need for the development of its response mechanisms, particularly for complex clinical scenarios.</p>","PeriodicalId":10482,"journal":{"name":"Clinical Rheumatology","volume":" ","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Rheumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10067-025-07648-w","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"RHEUMATOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: This study aims to evaluate the accuracy and comprehensiveness of the information provided by ChatGPT-4, an artificial intelligence-based system, regarding ankylosing spondylitis (AS) from the perspectives of patients and physicians.

Method: In this cross-sectional study, 75 questions were asked of ChatGPT-4. These were the most frequently asked questions about AS on Google Trends (group 1), and questions derived from ASAS/EULAR recommendations (group 2 and group 3). Group 2 consisted of open-ended questions, and group 3 consisted of case questions. Two expert rheumatologists scored the responses for accuracy and comprehensiveness. A six-point Likert scale was used to assess accuracy, and a three-point scale for completeness.
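
The abstract does not include the authors' analysis code. As a rough illustration of the scoring setup, the sketch below tabulates hypothetical Likert ratings from two raters and computes Cohen's kappa with scikit-learn. All column names and values are invented for the example, and the abstract does not state whether an unweighted or weighted kappa was used.

```python
# Minimal sketch (not the authors' code): two raters' accuracy ratings on a
# 1-6 Likert scale, with inter-rater agreement measured by Cohen's kappa.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical accuracy ratings from two rheumatologists
ratings = pd.DataFrame({
    "question_id": [1, 2, 3, 4, 5],
    "group":       [1, 1, 2, 3, 3],
    "rater1_acc":  [6, 5, 6, 4, 3],
    "rater2_acc":  [6, 5, 5, 4, 3],
})

# The default (unweighted) kappa treats the 1-6 scale as nominal; passing
# weights="linear" or weights="quadratic" would credit near-agreement on the
# ordinal scale instead.
kappa = cohen_kappa_score(ratings["rater1_acc"], ratings["rater2_acc"])
print(f"Cohen's kappa (accuracy): {kappa:.2f}")
```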

Results: The accuracy and completeness scores analyzed in this study were found to be 5.32 ± 1.4 and 2.76 ± 0.5 for group 1, 5.36 ± 1.1 and 2.72 ± 0.45 for group 2, and 4.24 ± 1.96 and 2.36 ± 0.63 for group 3, respectively. There was a significant difference in accuracy and completeness scores between the groups (p = 0.044 and p = 0.019). Cohen's kappa coefficient showed excellent agreement with values of 0.88 for accuracy and 0.90 for completeness.
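
The abstract reports significant between-group differences (p = 0.044 for accuracy, p = 0.019 for completeness) but does not name the statistical test used. The sketch below applies a Kruskal-Wallis test, a common choice for ordinal Likert scores across three independent groups, to hypothetical score vectors; it is illustrative only, not the authors' analysis.

```python
# Illustrative sketch: comparing accuracy scores across the three question
# groups with a Kruskal-Wallis test (the score vectors are hypothetical).
from scipy.stats import kruskal

group1_acc = [6, 5, 6, 6, 4]   # frequently asked questions (Google Trends)
group2_acc = [6, 5, 5, 6, 5]   # open-ended ASAS/EULAR-derived questions
group3_acc = [5, 3, 2, 6, 4]   # complex case questions

stat, p_value = kruskal(group1_acc, group2_acc, group3_acc)
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p_value:.3f}")
```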

Conclusion: While the responses to questions in groups 1 and 2 were satisfactory in terms of accuracy and comprehensiveness, the responses to the complex case questions in group 3 were not sufficient. ChatGPT-4 appears to be a useful resource for patient education, but its response mechanisms for complex clinical scenarios need to be improved. Furthermore, the potential for generating false or fabricated data should be considered; therefore, physicians should evaluate and verify responses.

Key Points
• ChatGPT-4 demonstrated high accuracy and comprehensiveness in addressing the information needs of patients and physicians regarding axial spondyloarthritis (AS), with satisfactory responses particularly for frequently asked questions and questions derived from ASAS/EULAR recommendations.
• The quality of responses decreased significantly in the group of 'case questions' involving more complex scenarios, indicating that the AI may be inadequate in clinical decision-making processes.
• ChatGPT-4 can facilitate patients' access to health information, potentially reducing unnecessary clinic visits and serving as a primary source of information for those with limited access to healthcare professionals, thereby enhancing the overall effectiveness of patient care.
• While ChatGPT-4 shows promise as a valuable tool for patient education, there is a clear need to improve its response mechanisms, particularly for complex clinical scenarios.

Source journal: Clinical Rheumatology (Medicine - Rheumatology)
CiteScore: 6.90
Self-citation rate: 2.90%
Articles published: 441
Review time: 3 months
Journal description: Clinical Rheumatology is an international English-language journal devoted to publishing original clinical investigation and research in the general field of rheumatology, with an accent on clinical aspects at the postgraduate level. The journal succeeds Acta Rheumatologica Belgica, originally founded in 1945 as the official journal of the Belgian Rheumatology Society. Clinical Rheumatology aims to cover all modern trends in clinical and experimental research as well as the management and evaluation of diagnostic and treatment procedures connected with the inflammatory, immunologic, metabolic, genetic and degenerative soft and hard connective tissue diseases.