Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.

Impact factor: 6 | CAS Zone 1 (Medicine) | JCR Q1 (Orthopedics)
Silvia Gianola, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, Giacomo Rossettini
{"title":"Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.","authors":"Silvia Gianola, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, Giacomo Rossettini","doi":"10.2519/jospt.2024.12151","DOIUrl":null,"url":null,"abstract":"<p><p><b>OBJECTIVE:</b> To compare the accuracy of an artificial intelligence chatbot to clinical practice guidelines (CPGs) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain. <b>DESIGN:</b> Cross-sectional study. <b>METHODS:</b> We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Relative clinical questions were developed and queried to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT answers to CPGs recommendations by assessing the (1) internal consistency of ChatGPT answers by measuring the percentage of text wording similarity when a clinical question was posed 3 times, (2) reliability between 2 independent reviewers in grading ChatGPT answers, and (3) accuracy of ChatGPT answers compared to CPGs recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement as the frequency of the agreements among all judgments. <b>RESULTS:</b> We tested 9 clinical questions. The internal consistency of text ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrareliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interreliability (κ = 0.85, SE = 0.15) between the 2 reviewers was \"almost perfect.\" Accuracy between ChatGPT answers and CPGs recommendations was slight, demonstrating agreement in 33% of recommendations. <b>CONCLUSION:</b> ChatGPT performed poorly in internal consistency and accuracy of the indications generated compared to clinical practice guideline recommendations for lumbosacral radicular pain. <i>J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151</i>.</p>","PeriodicalId":50099,"journal":{"name":"Journal of Orthopaedic & Sports Physical Therapy","volume":" ","pages":"222-228"},"PeriodicalIF":6.0000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Orthopaedic & Sports Physical Therapy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2519/jospt.2024.12151","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0

Abstract

OBJECTIVE: To compare the accuracy of an artificial intelligence chatbot to clinical practice guideline (CPG) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain. DESIGN: Cross-sectional study. METHODS: We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Related clinical questions were developed and posed to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT answers to CPG recommendations by assessing (1) the internal consistency of ChatGPT answers, measured as the percentage of text wording similarity when a clinical question was posed 3 times; (2) the reliability between 2 independent reviewers in grading ChatGPT answers; and (3) the accuracy of ChatGPT answers compared to CPG recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement, defined as the frequency of agreement among all judgments. RESULTS: We tested 9 clinical questions. The internal consistency of ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrarater reliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interrater reliability (κ = 0.85, SE = 0.15) between the 2 reviewers were "almost perfect." Accuracy between ChatGPT answers and CPG recommendations was slight, with agreement in 33% of recommendations. CONCLUSION: ChatGPT performed poorly in the internal consistency and accuracy of the indications it generated compared to clinical practice guideline recommendations for lumbosacral radicular pain. J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151.
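The abstract names two descriptive computations, pairwise wording similarity across repeated answers and raw interobserver agreement, without detailing how they were implemented. The sketch below is a minimal, hypothetical illustration of how such quantities could be computed; the answer texts and reviewer judgments are invented for demonstration and are not the study's data or methods.

```python
# Illustrative sketch only: one plausible way to compute the two descriptive
# statistics named in the abstract. The study's actual procedures may differ.
from difflib import SequenceMatcher
from itertools import combinations

def wording_similarity(answers):
    """Mean pairwise wording similarity (0-100%) across repeated answers."""
    pairs = list(combinations(answers, 2))
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return 100 * sum(ratios) / len(ratios)

def raw_agreement(judgments_a, judgments_b):
    """Proportion of questions on which two reviewers gave the same judgment."""
    matches = sum(a == b for a, b in zip(judgments_a, judgments_b))
    return matches / len(judgments_a)

# Hypothetical ChatGPT answers to one clinical question asked three times.
answers = [
    "Imaging is not routinely recommended for lumbosacral radicular pain.",
    "Routine imaging is generally unnecessary unless red flags are present.",
    "MRI should be reserved for patients with progressive neurological deficits.",
]
print(f"Internal consistency: {wording_similarity(answers):.0f}%")

# Hypothetical accuracy judgments (True = answer matched the CPG recommendation).
reviewer_1 = [True, False, False, True, False, False, True, False, False]
reviewer_2 = [True, False, False, True, False, False, False, False, False]
print(f"Interobserver agreement: {raw_agreement(reviewer_1, reviewer_2):.0%}")
```

Fleiss' kappa itself would normally be obtained from a dedicated statistics package (for example, statsmodels' inter_rater.fleiss_kappa) rather than hand-rolled; the raw-agreement proportion above only mirrors the "frequency of agreements" figure reported in the abstract.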

Source journal
CiteScore: 8.00
Self-citation rate: 4.90%
Articles published: 101
Review turnaround: 6-12 weeks
Journal description: The Journal of Orthopaedic & Sports Physical Therapy® (JOSPT®) publishes scientifically rigorous, clinically relevant content for physical therapists and others in the health care community to advance musculoskeletal and sports-related practice globally. To this end, JOSPT features the latest evidence-based research and clinical cases in musculoskeletal health, injury, and rehabilitation, including physical therapy, orthopaedics, sports medicine, and biomechanics. With an impact factor of 3.090, JOSPT is among the highest ranked physical therapy journals in Clarivate Analytics's Journal Citation Reports, Science Edition (2017). JOSPT stands eighth of 65 journals in the category of rehabilitation, twelfth of 77 journals in orthopedics, and fourteenth of 81 journals in sport sciences. JOSPT's 5-year impact factor is 4.061.