ChatGPT is a comprehensive education tool for patients with patellar tendinopathy, but it currently lacks accuracy and readability

IF 2.2 | Zone 3 (Medicine) | JCR Q1 (Rehabilitation)
Jie Deng, Lun Li, Jelle J. Oosterhof, Peter Malliaras, Karin Grävare Silbernagel, Stephan J. Breda, Denise Eygendaal, Edwin HG. Oei, Robert-Jan de Vos
Musculoskeletal Science and Practice, vol. 76, Article 103275. Published 2025-01-31. DOI: 10.1016/j.msksp.2025.103275
https://www.sciencedirect.com/science/article/pii/S2468781225000232
Citations: 0

Abstract

Background

Generative artificial intelligence tools, such as ChatGPT, are becoming increasingly integrated into daily life, and patients may turn to these tools for medical information.

Objective

To evaluate the performance of ChatGPT-4 in responding to patient-centered queries about patellar tendinopathy (PT).

Methods

Forty-eight patient-centered queries were collected from online sources, PT patients, and experts, and were then submitted to ChatGPT-4. Three board-certified experts independently assessed the accuracy and comprehensiveness of the responses. Readability was measured using the Flesch-Kincaid Grade Level (FKGL; higher scores indicate a higher grade reading level). The Patient Education Materials Assessment Tool (PEMAT) was used to evaluate understandability and actionability (0–100%; higher scores indicate information with clearer messages and more identifiable actions). Semantic Textual Similarity (STS score, 0–1; higher scores indicate higher similarity) was used to assess variation in the meaning of the texts over two months (including ChatGPT-4o) and across different terminologies related to PT.
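For readers unfamiliar with the FKGL, the sketch below shows how the score is derived from word, sentence, and syllable counts using the standard Flesch-Kincaid formula. The abstract does not say which software the authors used, so the function names and the vowel-group syllable heuristic here are assumptions for illustration only; dedicated readability tools count syllables more carefully and may give slightly different scores.

```python
import re


def fkgl_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula applied to raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def fkgl(text: str) -> float:
    """Estimate the FKGL of a text with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fkgl_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study's responses.
    sample = "Patellar tendinopathy is an overuse injury of the knee."
    print(round(fkgl(sample), 1))
```

A score near 15, as reported in the Results, corresponds roughly to college-level text, whereas patient materials are usually aimed at a much lower grade level.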

Results

Sixteen (33%) of the 48 responses were rated accurate, while 36 (75%) were rated comprehensive. Only 17% of treatment-related questions received accurate responses. Most responses were written at a college reading level (median and interquartile range [IQR] of FKGL score: 15.4 [14.4–16.6]). The median of PEMAT for understandability was 83% (IQR: 70%–92%), and for actionability, it was 60% (IQR: 40%–60%). The medians of STS scores in the meaning of texts over two months and across terminologies were all ≥ 0.9.
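To give a sense of what an STS score close to 1 means, the sketch below computes cosine similarity between two embedding vectors, which is a common way to score semantic similarity. The abstract does not state how the STS scores were computed, so this is an assumption rather than the authors' method, and the vectors are hypothetical stand-ins for the output of a sentence-embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical embeddings of two responses to the same query.
    response_month_1 = [0.12, 0.80, 0.35, 0.41]
    response_month_3 = [0.10, 0.78, 0.40, 0.39]
    print(cosine_similarity(response_month_1, response_month_3))
```

Vectors that point in nearly the same direction score close to 1, which is how two differently worded responses can still be judged to carry nearly the same meaning.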

Conclusions

ChatGPT-4 provided generally comprehensive information in response to patient-centered queries but lacked accuracy and was difficult to read for individuals below a college reading level.

Source journal
Musculoskeletal Science and Practice (Health Professions: Physical Therapy, Sports Therapy and Rehabilitation)
CiteScore: 4.10
Self-citation rate: 8.70%
Articles published: 152
Review time: 48 days
Journal description: Musculoskeletal Science & Practice, international journal of musculoskeletal physiotherapy, is a peer-reviewed international journal (previously Manual Therapy) publishing high-quality original research, review, and Masterclass articles that contribute to improving the clinical understanding of appropriate care processes for musculoskeletal disorders. The journal publishes articles that influence or add to the body of evidence on diagnostic and therapeutic processes, patient-centered care, guidelines for musculoskeletal therapeutics, and theoretical models that support developments in assessment, diagnosis, clinical reasoning, and interventions.