ChatGPT is a comprehensive education tool for patients with patellar tendinopathy, but it currently lacks accuracy and readability

IF 2.2 | Zone 3 (Medicine) | JCR Q1 (Rehabilitation)

Musculoskeletal Science and Practice Pub Date : 2025-01-31 DOI:10.1016/j.msksp.2025.103275

Jie Deng , Lun Li , Jelle J. Oosterhof , Peter Malliaras , Karin Grävare Silbernagel , Stephan J. Breda , Denise Eygendaal , Edwin HG. Oei , Robert-Jan de Vos

Musculoskeletal Science and Practice, volume 76, Article 103275. https://www.sciencedirect.com/science/article/pii/S2468781225000232

Citations: 0

Abstract

Background

Generative artificial intelligence tools, such as ChatGPT, are becoming increasingly integrated into daily life, and patients might turn to this tool to seek medical information.

Objective

To evaluate the performance of ChatGPT-4 in responding to patient-centered queries for patellar tendinopathy (PT).

Methods

Forty-eight patient-centered queries were collected from online sources, PT patients, and experts, and were then submitted to ChatGPT-4. Three board-certified experts independently assessed the accuracy and comprehensiveness of the responses. Readability was measured using the Flesch-Kincaid Grade Level (FKGL; higher scores indicate a higher grade reading level). The Patient Education Materials Assessment Tool (PEMAT) evaluated understandability and actionability (0–100%; higher scores indicate information with clearer messages and more identifiable actions). Semantic Textual Similarity (STS score, 0–1; higher scores indicate higher similarity) assessed variation in the meaning of texts over two months (including ChatGPT-4o) and for different terminologies related to PT.
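The abstract does not say how the STS score was computed. As a rough illustration of a 0–1 similarity score between two responses, the sketch below uses cosine similarity over word counts. This is a purely lexical stand-in chosen for self-containment, not the authors' method; STS is usually computed from sentence embeddings, and the function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of word-count vectors (assumed stand-in for STS).

    Returns a value in [0, 1]: 1 for identical word distributions,
    0 when the two texts share no words.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words both texts share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

A word-count score like this only rewards shared vocabulary, so two paraphrases with different wording would score low even if their meaning matched; an embedding-based score is designed to avoid that.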

Results

Sixteen (33%) of the 48 responses were rated accurate, while 36 (75%) were rated comprehensive. Only 17% of treatment-related questions received accurate responses. Most responses were written at a college reading level (median and interquartile range [IQR] of FKGL score: 15.4 [14.4–16.6]). The median PEMAT score was 83% (IQR: 70%–92%) for understandability and 60% (IQR: 40%–60%) for actionability. The median STS scores for the meaning of texts over two months and across terminologies were all ≥ 0.9.
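For context on the FKGL values above, the standard Flesch-Kincaid Grade Level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies it with a simple vowel-group syllable heuristic. The heuristic and the function names are assumptions for illustration; the abstract does not state which tool the authors used, and dedicated readability software counts syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels.

    Drops a trailing silent 'e' (but not '-le') and never returns less than 1.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English text."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Longer sentences and longer words both raise the score, which is why dense clinical phrasing tends to land at a college grade level.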

Conclusions

ChatGPT-4 provided generally comprehensive information in response to patient-centered queries but lacked accuracy and was difficult to read for individuals below a college reading level.


Source journal

Musculoskeletal Science and Practice (Health Professions: Physical Therapy, Sports Therapy and Rehabilitation)

CiteScore

4.10

Self-citation rate

8.70%

Articles published

152

Review time

48 days

Journal description: Musculoskeletal Science & Practice, international journal of musculoskeletal physiotherapy, is a peer-reviewed international journal (previously Manual Therapy) that publishes high-quality original research, review, and Masterclass articles that contribute to improving the clinical understanding of appropriate care processes for musculoskeletal disorders. The journal publishes articles that influence or add to the body of evidence on diagnostic and therapeutic processes, patient-centered care, guidelines for musculoskeletal therapeutics, and theoretical models that support developments in assessment, diagnosis, clinical reasoning, and interventions.