Assessment of ChatGPT's adherence to EULAR diagnostic criteria and therapeutic protocols for rheumatoid arthritis at two distinct time points, 14 days apart, utilizing binary and multiple-choice inquiries.

IF 2.9 · Zone 3 (Medicine) · JCR Q2 (Rheumatology)
Neşe Çabuk Çelik, Elif Altunel Kılınç
Journal: Clinical Rheumatology. Published: 2025-04-22. Type: Journal Article. DOI: 10.1007/s10067-025-07417-9 (https://doi.org/10.1007/s10067-025-07417-9)

Abstract

Objectives: Artificial intelligence (AI) shows considerable promise for decision support in specific areas of healthcare, including rheumatoid arthritis (RA). This study assesses how closely the AI model ChatGPT-v4 adheres to the European League Against Rheumatism (EULAR) recommendations.

Methods: The study used a 100-item questionnaire of true/false and multiple-choice questions, accompanied by real-world clinical scenarios developed in line with EULAR recommendations for the management of RA. The questions covered diagnostic criteria, treatment options, and follow-up procedures. Two rheumatologists rated ChatGPT's responses for accuracy, consistency, and comprehensiveness on a 6-point Likert scale.

Results: Responses were evaluated at baseline and on day 14. On the paired questions, the AI corrected most of its baseline errors, although some responses did not improve. One of the two initially incongruent responses remained unchanged, while the other was corrected, so the number of congruent responses rose from 48 at baseline to 49 on day 14. The AI was more consistent on binary questions than on multiple-choice questions. At baseline, 43 (86%) of the multiple-choice items were answered correctly; on reevaluation, 42 (84%) were accurate. One response was erroneous on day 14. Of the seven initially erroneous responses, three remained unchanged and four were later corrected.
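The paired-response counts above can be checked with a short sketch. It reads the 48-to-49 counts as 50 binary items and assumes that every initially congruent response stayed congruent; both are assumptions drawn from the abstract's wording, not confirmed study data. The function name `summarize_transitions` is illustrative only.

```python
from collections import Counter


def summarize_transitions(baseline, day14):
    """Tally how each paired answer changed between the two time points.

    baseline, day14: equal-length lists of booleans (True = congruent with
    the guideline). Returns a Counter keyed by (baseline, day14) pairs.
    """
    if len(baseline) != len(day14):
        raise ValueError("both time points must cover the same items")
    return Counter(zip(baseline, day14))


# Items reconstructed from the counts in the abstract (an assumption):
# 48 congruent items stay congruent, 1 incongruent item is corrected,
# and 1 incongruent item is unchanged.
baseline = [True] * 48 + [False, False]
day14 = [True] * 48 + [True, False]

transitions = summarize_transitions(baseline, day14)
corrected = transitions[(False, True)]
unchanged_errors = transitions[(False, False)]
accuracy_baseline = sum(baseline) / len(baseline)
accuracy_day14 = sum(day14) / len(day14)

print(corrected, unchanged_errors)  # 1 1
print(f"{accuracy_baseline:.0%} -> {accuracy_day14:.0%}")  # 96% -> 98%
```

Keeping the full transition table, rather than only the two accuracy figures, separates corrected errors from answers that were already right, which is the distinction the results describe.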

Conclusion: ChatGPT effectively answered binary and multiple-choice questions based on the EULAR guidelines for RA. The findings indicate that AI can serve as a clinical support tool in RA and that its performance can improve. The AI was accurate on objective information and promptly corrected errors.

Key Points
• AI in healthcare: The integration of artificial intelligence, specifically ChatGPT-v4, in clinical practice aims to enhance decision-making in RA by adhering to EULAR recommendations for diagnosis, treatment, and follow-up.
• Inter-rater reliability: Agreement between the evaluators was high, with Cohen's kappa coefficients of 0.92 for binary questions and 0.94 for multiple-choice questions.
• AI learning dynamics: ChatGPT improved over time in understanding and answering more complex questions, unlike in previous studies, where AI struggled with consistency.
• Implications for clinical practice: The findings support the growing role of AI as a reliable tool in rheumatology, suggesting potential for personalized, evidence-based patient care.
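For readers unfamiliar with the inter-rater statistic reported above, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal unweighted version. The abstract does not say whether the authors used a weighted variant for the Likert ratings, and the ratings shown are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category shares.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_e == 1:
        # Both raters used a single identical category; agreement is total.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for ten items (1 = acceptable, 0 = not acceptable).
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 0.8
```

In this example the raters agree on 9 of 10 items (p_o = 0.9) while chance agreement is 0.5, giving kappa = 0.8. Values above 0.9, as reported in the Key Points, indicate agreement well beyond chance.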

Source journal: Clinical Rheumatology (Medicine: Rheumatology)
CiteScore: 6.90
Self-citation rate: 2.90%
Articles published: 441
Review time: 3 months
About the journal: Clinical Rheumatology is an international English-language journal devoted to publishing original clinical investigation and research in the general field of rheumatology, with an accent on clinical aspects at postgraduate level. The journal succeeds Acta Rheumatologica Belgica, originally founded in 1945 as the official journal of the Belgian Rheumatology Society. Clinical Rheumatology aims to cover all modern trends in clinical and experimental research, as well as the management and evaluation of diagnostic and treatment procedures connected with the inflammatory, immunologic, metabolic, genetic, and degenerative soft and hard connective tissue diseases.