Is It Possible to Develop a Patient-reported Experience Measure With Lower Ceiling Effect?

IF 4.2, Region 2 (Medicine), JCR Q1 ORTHOPEDICS

Clinical Orthopaedics and Related Research®. Pub Date: 2025-04-01. Epub Date: 2024-10-25. DOI: 10.1097/CORR.0000000000003262

Niels Brinkman, Rick Looman, Prakash Jayakumar, David Ring, Seung Choi

Publication type: Journal Article. Pages: 693-703. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11936666/pdf/


Abstract

Background: Patient-reported experience measures (PREMs), such as the Jefferson Scale of Patient Perceptions of Physician Empathy (JSPPPE) or the Wake Forest Trust in Physician Scale (WTPS), have notable intercorrelation and ceiling effects (the proportion of observations with the highest possible score). Information is lost when high ceiling effects occur as there almost certainly is at least some variation among the patients with the highest score that the measurement tool was unable to measure. Efforts to identify and quantify factors associated with diminished patient experience can benefit from a PREM with more variability and a smaller proportion of highest possible scores (that is, a more limited ceiling effect) than occurs with currently available PREMs.
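The ceiling effect defined above (the proportion of observations with the highest possible score) can be computed directly from a list of total scores. The following is a minimal sketch, not the authors' code: the function name `ceiling_effect` and the example scores are hypothetical, and the 7-to-35 scoring range is only an assumption about how a 7-item, 5-point scale might be summed.

```python
from typing import Sequence


def ceiling_effect(scores: Sequence[float], max_score: float) -> float:
    """Return the proportion of observations at the highest possible score."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    at_ceiling = sum(1 for s in scores if s == max_score)
    return at_ceiling / len(scores)


# Hypothetical total scores, assuming a 7-item scale scored 1 to 5 per item
# (so totals range from 7 to 35). These are illustrative values, not study data.
example_scores = [35, 35, 30, 28, 35, 21, 33, 35, 26, 31]
example_ceiling = ceiling_effect(example_scores, max_score=35)
print(f"Ceiling effect: {example_ceiling:.0%}")
```

A measure with a high value here cannot distinguish among the respondents who all sit at the maximum, which is the information loss the background describes.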

Questions/purposes: In the first stage of a two-stage process, using a cohort of patients seeking musculoskeletal specialty care, we asked: (1) What groupings of items that address a similar aspect of patient experience are present among binary items directed at patient experience and derived from commonly used PREMs? (2) Can a small number of representative items provide a measure with potential for less of a ceiling effect (high item difficulty parameters)? In a second, independent cohort enrolled to assess whether the identified items perform consistently among different cohorts, we asked: (3) Does the new PREM perform differently in terms of item groupings (factor structure), and would different subsets of the included items provide the same measurement results (internal consistency) when items are measured using a 5-point rating scale instead of a binary scale? (4) What are the differences in survey properties (for example, ceiling effects) and correlation between the new PREM and commonly used PREMs?

Methods: In two cross-sectional studies among patients seeking musculoskeletal specialty care conducted in 2022 and 2023, all English-speaking and English-reading adults (ages 18 to 89 years) without cognitive deficiency were invited to participate in two consecutive, separate cohorts to help develop (the initial, learning cohort) and internally validate (the second, validation cohort) a provisional new PREM. We identified 218 eligible patients for the initial learning cohort, all of whom completed all measures. Participants had a mean ± SD age of 55 ± 16 years, 60% (130) were women, 45% (99) had private insurance, and most sought care for lower extremity (56% [121]) and nontraumatic conditions (63% [137]). We measured 25 items derived from other commonly used PREMs that address aspects of patient experience in which patients reported whether they agreed or disagreed (binary) with certain statements about their clinician. We performed an exploratory factor analysis and confirmatory factor analysis (CFA) to identify groups of items that measure the same underlying construct related to patient experience. We then applied a two-parameter logistic model based on item response theory to identify the most discriminating items with the most variability (item difficulty) with the aim of reducing the ceiling effect. We also conducted a differential item functioning analysis to assess whether specific items are rated discordantly by specific subgroups of patients, which can introduce bias. We then enrolled 154 eligible patients, of whom 99% (153) completed all required measures, into a validation cohort with similar demographic characteristics. We changed the binary items to 5-point Likert scales to increase the potential for variation in an attempt to further reduce ceiling effects and repeated the CFA. We also measured internal consistency (using Cronbach α) and the correlation of the new PREM with other commonly used PREMs using bivariate analyses.
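To illustrate why item difficulty matters for ceiling effects, the sketch below evaluates the textbook two-parameter logistic (2PL) item response function, P(agree | θ) = 1 / (1 + exp(-a(θ - b))), where θ is the respondent's latent trait level, a is item discrimination, and b is item difficulty. This is an illustration under stated assumptions, not the authors' estimation procedure: the abstract does not name the software or parametrization used, the function name `endorse_probability` is hypothetical, and the discrimination value of 1.5 is invented for the example. The two difficulty values are the extremes reported in the Results.

```python
import math


def endorse_probability(theta: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function: probability of agreeing with a binary item.

    theta          -- respondent's latent trait level (e.g. trust in clinician)
    discrimination -- slope parameter a (how sharply the item separates respondents)
    difficulty     -- location parameter b (trait level at which P = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Assumed discrimination for illustration only; not reported in the abstract.
A_ASSUMED = 1.5

# Difficulty extremes taken from the Results: 0.16 (7-item short form, upper end)
# and -1.3 (excluded factors, lower end).
p_hard = endorse_probability(theta=0.0, discrimination=A_ASSUMED, difficulty=0.16)
p_easy = endorse_probability(theta=0.0, discrimination=A_ASSUMED, difficulty=-1.3)

print(f"Average respondent, harder item (b = 0.16): {p_hard:.2f}")
print(f"Average respondent, easier item (b = -1.3): {p_easy:.2f}")
```

Under this model, a respondent at the average trait level is less likely to endorse the item with the higher difficulty, so a scale built from such items leaves fewer respondents at the maximum score.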

Results: We identified three groupings of items in the learning cohort representing "trust in clinician" (13 items), "relationship with clinician" (7 items), and "participation in shared decision-making" (4 items). The "trust in clinician" factor performed best of the three factors and was therefore selected for subsequent analyses. We selected the best-performing items in terms of item difficulty to generate a 7-item short form. We found excellent CFA model fit (the 13-item and 7-item versions both had a root mean square error of approximation [RMSEA] of < 0.001), excellent internal consistency (Cronbach α was 0.94 for the 13-item version and 0.91 for the 7-item version), good item response theory parameters (item difficulty ranging between -0.37 and 0.16 for the 7-item version, with higher values indicating lower ceiling effect), no local dependencies, and no differential item functioning among any of the items. The other two factors were excluded from measure development because of low item response theory parameters (item difficulty ranging between -1.3 and -0.69, indicating higher ceiling effect), multiple local dependencies, and exhaustion of the available items before these issues could be addressed. The validation cohort confirmed adequate item selection and performance of both the 13-item and 7-item versions of the Trust and Experience with Clinicians Scale (TRECS), with good to excellent CFA model fit (RMSEA 0.058 [TRECS-13]; RMSEA 0.016 [TRECS-7]), excellent internal consistency (Cronbach α = 0.96 [TRECS-13]; Cronbach α = 0.92 [TRECS-7]), no differential item functioning, limited ceiling effects (11% [TRECS-13]; 14% [TRECS-7]), and notable correlation with other PREMs such as the JSPPPE (ρ = 0.77) and WTPS (ρ = 0.74).
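The internal consistency figures reported above are Cronbach α values. As a rough sketch of how that statistic is defined, the code below implements the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores) for k items. The function name `cronbach_alpha` and the response matrix are hypothetical and the data are invented for illustration; the abstract does not describe the authors' software.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for a respondents-by-items matrix (one row per respondent)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances: List[float] = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented 5-point ratings from six respondents on four items (not study data).
example_responses = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
]
example_alpha = cronbach_alpha(example_responses)
print(f"Cronbach alpha: {example_alpha:.2f}")
```

Values close to 1 indicate that the items vary together across respondents, which is the sense in which the abstract describes internal consistency as excellent.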

Conclusion: A relatively brief 7-item measure of patient experience focused on trust, with good psychometric properties, can eliminate most of the ceiling effects common to PREMs. Future studies may externally validate the TRECS in other populations as well as provide population-based T-score conversion tables based on a larger sample size more representative of the population seeking musculoskeletal care.

Clinical relevance: A PREM anchored in trust that reduces loss of information at the higher end of the scale can help individuals and institutions to assess experience more accurately, gauge the impact of interventions, and generate effective ways to learn and improve within a health system.




Source journal

Clinical Orthopaedics and Related Research® (Medicine: Surgery)

CiteScore: 7.00

Self-citation rate: 11.90%

Articles published: 722

Review time: 2.5 months

About the journal: Clinical Orthopaedics and Related Research® is a leading peer-reviewed journal devoted to the dissemination of new and important orthopaedic knowledge. CORR® brings readers the latest clinical and basic research, along with columns, commentaries, and interviews with authors.