Optimizing Oxford Shoulder Scores with computerized adaptive testing reduces redundancy while maintaining precision.

IF 5.1 | Zone 2, Medicine | JCR Q2, CELL & TISSUE ENGINEERING

Bone & Joint Research Pub Date : 2024-08-05 DOI:10.1302/2046-3758.138.BJR-2023-0412.R1

Ahmed Barakat, Jonathan Evans, Christopher Gibbons, Harvinder P Singh

Bone & Joint Research 13(8): 392-400. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11298256/pdf/


Abstract

Aims: The Oxford Shoulder Score (OSS) is a 12-item measure commonly used for the assessment of shoulder surgeries. This study explores whether computerized adaptive testing (CAT) provides a shortened, individually tailored questionnaire while maintaining test accuracy.

Methods: A total of 16,238 preoperative OSS were available in the National Joint Registry (NJR) for England, Wales, Northern Ireland, the Isle of Man, and the States of Guernsey dataset (April 2012 to April 2022). Prior to CAT, the foundational item response theory (IRT) assumptions of unidimensionality, monotonicity, and local independence were established. CAT compared sequential item selection with stopping criteria set at standard error (SE) < 0.32 and SE < 0.45 (equivalent to reliability coefficients of 0.90 and 0.80) to full-length patient-reported outcome measure (PROM) precision.
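The stopping rule described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dichotomous two-parameter logistic (2PL) model, whereas the OSS items are polytomous; the item parameters in `ITEM_BANK` are hypothetical; and the names `run_cat`, `eap`, and `reliability_from_se` are made up for illustration. Maximum-information item selection and the conversion reliability = 1 - SE² (which holds when the latent trait is scaled to unit variance) are common IRT conventions, not details stated in the abstract.

```python
import math
import random

# Hypothetical 2PL item bank: (discrimination a, difficulty b).
# These are NOT the calibrated OSS parameters.
ITEM_BANK = [(2.5, -1.5), (2.2, -1.0), (2.8, -0.5), (2.4, 0.0),
             (2.6, 0.5), (2.0, 1.0), (1.8, 1.5), (2.3, -0.8),
             (2.1, 0.3), (1.9, -0.3), (2.7, 0.8), (2.2, -1.2)]

# Grid of latent-trait values from -4 to 4 used for numerical integration.
GRID = [-4 + 0.1 * i for i in range(81)]


def reliability_from_se(se):
    """Reliability implied by a standard error on a unit-variance scale."""
    return 1.0 - se ** 2


def p_endorse(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised prior density
        for (a, b), answered_yes in responses:
            p = p_endorse(t, a, b)
            w *= p if answered_yes else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(true_theta, se_stop, rng):
    """Administer items one at a time until SE < se_stop or the bank is empty."""
    remaining = list(ITEM_BANK)
    responses = []
    theta_hat, se = 0.0, 1.0  # start at the prior mean and prior SD
    while remaining and se >= se_stop:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: item_information(theta_hat, *ab))
        remaining.remove(item)
        answered_yes = rng.random() < p_endorse(true_theta, *item)
        responses.append((item, answered_yes))
        theta_hat, se = eap(responses)
    return theta_hat, se, len(responses)


for threshold in (0.32, 0.45):
    theta_hat, se, n_items = run_cat(0.5, threshold, random.Random(7))
    print(f"stop at SE < {threshold}: {n_items} items, "
          f"implied reliability at threshold {reliability_from_se(threshold):.2f}")
```

Because the posterior standard deviation is re-estimated after every response, a respondent whose answers are highly informative stops early, while one near the extremes of the scale may exhaust the whole bank.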

Results: Confirmatory factor analysis (CFA) for unidimensionality exhibited satisfactory fit with standardized root mean square residual (SRMSR) of 0.06 (cut-off ≤ 0.08) but not with comparative fit index (CFI) of 0.85 or Tucker-Lewis index (TLI) of 0.82 (cut-off > 0.90). Monotonicity, measured by H value, yielded 0.482, signifying good monotonic trends. Local independence was generally met, with Yen's Q3 statistic > 0.2 for most items. The median item count for completing the CAT simulation with an SE of 0.32 was 3 (IQR 3 to 12), while for an SE of 0.45 it was 2 (IQR 2 to 6). This constituted only 25% and 16%, respectively, when compared to the 12-item full-length questionnaire.
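The cut-off comparisons and item-reduction percentages in the Results are simple arithmetic, sketched below. The values and cut-offs are those quoted in the abstract; the dictionary layout and the function names `passes_cutoff` and `item_share` are illustrative assumptions. Note that 2 of 12 items is about 16.7%, which the abstract reports as 16%.

```python
FULL_LENGTH = 12  # number of items in the full OSS

# (reported value, cut-off, direction) as quoted in the abstract.
FIT_INDICES = {
    "residual": (0.06, 0.08, "max"),  # acceptable when value <= cut-off
    "CFI": (0.85, 0.90, "min"),       # acceptable when value > cut-off
    "TLI": (0.82, 0.90, "min"),
}

# Median number of items administered at each SE stopping threshold.
MEDIAN_ITEMS = {0.32: 3, 0.45: 2}


def passes_cutoff(value, cutoff, direction):
    """Return True when a fit index meets its cut-off."""
    return value <= cutoff if direction == "max" else value > cutoff


def item_share(n_items, full_length=FULL_LENGTH):
    """Fraction of the full questionnaire that was administered."""
    return n_items / full_length


for name, (value, cutoff, direction) in FIT_INDICES.items():
    verdict = "meets" if passes_cutoff(value, cutoff, direction) else "misses"
    print(f"{name} = {value} {verdict} the cut-off of {cutoff}")

for se, n_items in MEDIAN_ITEMS.items():
    print(f"SE < {se}: {n_items} of {FULL_LENGTH} items "
          f"({item_share(n_items):.1%} of the full questionnaire)")
```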

Conclusion: Calibrating IRT for the OSS has resulted in the development of an efficient and shortened CAT while maintaining accuracy and reliability. Through the reduction of redundant items and implementation of a standardized measurement scale, our study highlights a promising approach to alleviate time burden and potentially enhance compliance with these widely used outcome measures.


