Improving prediction model systematic review methodology: Letter to the Editor

Journal impact factor 1.2 (JCR Q3, Sport Sciences)
G. Bullock, T. Hughes, J. Sergeant, M. Callaghan, G. Collins, R. Riley
Translational Sports Medicine, journal article, published 20 February 2021. DOI: 10.1002/tsm2.240
Citations: 2

Abstract

Dear Editor,
In their recently published paper, Seow et al.1 carried out a systematic review of musculoskeletal injury prediction models in professional sport and military special forces. Their review encompassed a comprehensive search that included both conference and published papers, used a standardized, literature-informed definition of musculoskeletal injury, and included both statistical and machine learning-based models. Nevertheless, we have a number of concerns about the conduct and reporting of some aspects of the study that limit the usefulness of its findings.
Our first point relates to how the studies were appraised. While the authors should be commended for assessing each study for risk of bias, the Newcastle-Ottawa Scale (NOS) is not the correct tool for this purpose. The NOS is a generic tool designed to assess the quality of nonrandomized studies such as case-control and cohort studies. Although prediction model studies often use a cohort design, the tool includes no specific assessment of analysis issues relating to the development or validation of a prediction model. Hence, the NOS is a blunt instrument for assessing risk of bias in these studies. The tool that should have been used in the review by Seow et al.1 is the Prediction model Risk Of Bias Assessment Tool (PROBAST),2 which includes 20 signaling questions across four domains (participants, predictors, outcome, and analysis) that cover key aspects of prediction model studies.
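As a rough illustration of how PROBAST's four domain judgements feed into an overall rating, the Python sketch below encodes a simplified version of the aggregation rule as we understand it from the PROBAST guidance: any domain at high risk makes the study high risk; otherwise any unclear domain makes it unclear; otherwise it is low risk. The `Rating` enum and `overall_risk_of_bias` function are hypothetical names. The sketch does not model the signaling questions themselves or PROBAST's additional guidance (for example, on downgrading development studies that lack external validation), so it is not a substitute for the tool.

```python
from enum import Enum


class Rating(Enum):
    """Possible risk-of-bias judgements for a domain or a whole study."""
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four PROBAST domains named in the letter.
DOMAINS = ("participants", "predictors", "outcome", "analysis")


def overall_risk_of_bias(domain_ratings: dict) -> Rating:
    """Combine per-domain judgements into an overall rating (simplified rule).

    Any HIGH domain gives HIGH; otherwise any UNCLEAR domain gives UNCLEAR;
    otherwise LOW. PROBAST's extra guidance for development studies without
    external validation is deliberately not modelled here.
    """
    missing = [d for d in DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing ratings for domains: {missing}")
    ratings = [domain_ratings[d] for d in DOMAINS]
    if Rating.HIGH in ratings:
        return Rating.HIGH
    if Rating.UNCLEAR in ratings:
        return Rating.UNCLEAR
    return Rating.LOW


# Hypothetical study: analysis domain judged high risk (e.g. no calibration).
study = {
    "participants": Rating.LOW,
    "predictors": Rating.LOW,
    "outcome": Rating.LOW,
    "analysis": Rating.HIGH,
}
print(overall_risk_of_bias(study))  # Rating.HIGH
```

Under this simplified rule, a study rated low risk in the first three domains but high risk in the analysis domain (say, because calibration was never assessed) comes out as high risk overall.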
Furthermore, when designing a systematic review of prediction model studies, the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist3 provides detailed guidance to help authors develop their review questions, extract pertinent prediction model data, and appraise prediction model studies.3 Had these more relevant tools been used, and the review process outlined by the Cochrane Prognosis Methods Group been followed,4 the authors would have been able to better appraise and use the included prediction model studies. In particular, this would have given the review more depth and clarity, allowed better identification of any strengths in the existing evidence, and highlighted specific areas of conduct and reporting that should be improved in future studies.
While the authors extracted and reported the discrimination performance (such as the area under the curve) of the included models, we note that there was no comment on model calibration, an essential component of model performance.4,5 Calibration is the agreement between the probabilities predicted by the model and the outcome frequencies actually observed in the data,6 and it is important for understanding the accuracy of the model's predictions.7,8 This omission could have been addressed at the design stage using the aforementioned CHARMS checklist. Consequently, the authors missed an important opportunity to report on this critical aspect of prediction model performance and presented readers with incomplete information on the usefulness of the included models. Furthermore, any omission of calibration in the primary studies will have a direct and negative impact on the risk of bias assessment.
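To make the distinction between discrimination and calibration concrete, the Python sketch below computes a c-statistic (area under the ROC curve) and a simple calibration-in-the-large measure, the ratio of observed to expected events (O/E), for two sets of hypothetical predicted risks. The second set applies a monotone transformation (a square root) to the first, so every individual keeps the same rank: discrimination is unchanged, yet the model now overpredicts risk. The data and function names are invented for illustration, and O/E is only one of several calibration measures (calibration plots and the calibration slope are others).

```python
import math


def c_statistic(y, p):
    """Probability that a randomly chosen event has a higher predicted risk
    than a randomly chosen non-event (ties count as one half)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(y, p):
    """Calibration-in-the-large: observed events / sum of predicted risks.
    Values below 1 mean the model overpredicts; above 1, it underpredicts."""
    return sum(y) / sum(p)


# Hypothetical cohort of 10 athletes, 2 of whom sustain an injury.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p_original = [0.05, 0.05, 0.10, 0.10, 0.15, 0.15, 0.20, 0.30, 0.40, 0.50]
p_inflated = [math.sqrt(pi) for pi in p_original]  # same ranking, higher risks

for label, p in [("original", p_original), ("inflated", p_inflated)]:
    print(label, c_statistic(y, p), round(observed_expected_ratio(y, p), 2))
```

Both sets of predictions achieve a c-statistic of 1.0, but only the first has an O/E ratio close to 1; the inflated set predicts roughly twice as many injuries as were observed. A review that extracts only discrimination cannot tell these two models apart.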
A related concern is that the authors do not explain how they extracted performance estimates, or whether they used the extensive tools of Debray et al.9 to derive estimates (eg, the area under the curve and its confidence interval) when these were not reported directly, in order to maximize the information available for the review. Whether performance statistics had been adjusted for optimism was also not reported,10 and clinical utility measures (eg, net benefit11) were not discussed.
We were also concerned with the authors' expectations regarding the handling of class imbalance using over- or undersampling to create a more balanced data set. Data are said to be imbalanced when there are fewer individuals in the data set with the outcome than without it. In the context of classification, this can indeed be a problem, for example, when evaluating classification accuracy (ie, the proportion of correct classifications): in a highly imbalanced data set, a model that misclassifies individuals with the outcome can still yield high accuracy, because the larger non-outcome group dominates the calculation of overall accuracy.12 However, in the context of prediction (the aim of the review by Seow et al.1), class imbalance is a feature of the […]
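The quantities mentioned in this paragraph can each be sketched in a few lines of Python. The first function approximates a 95% confidence interval for an AUC reported without one, given only the numbers of participants with and without the outcome, using the Hanley-McNeil standard error and a logit-scale interval; this is one commonly used approximation and is not claimed to reproduce the specific tools of Debray et al.9 The second computes net benefit at a risk threshold p_t as TP/n - (FP/n) x p_t/(1 - p_t). The third shows the accuracy problem under class imbalance. All function names and numbers are hypothetical.

```python
import math


def auc_confidence_interval(auc, n_events, n_non_events, z=1.96):
    """Approximate CI for an AUC from its point estimate and group sizes.

    Hanley-McNeil standard error, transformed to the logit scale (delta
    method) so that the back-transformed bounds stay within (0, 1).
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_non_events - 1) * (q2 - auc ** 2)
    ) / (n_events * n_non_events)
    se_logit = math.sqrt(variance) / (auc * (1 - auc))
    centre = math.log(auc / (1 - auc))

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return expit(centre - z * se_logit), expit(centre + z * se_logit)


def net_benefit(y, p, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def accuracy(y, predicted_class):
    """Proportion of correct classifications."""
    return sum(1 for a, b in zip(y, predicted_class) if a == b) / len(y)


# Hypothetical study: AUC 0.75 reported with 50 injured and 450 uninjured players.
print(auc_confidence_interval(0.75, 50, 450))

# Hypothetical cohort with 5% injury prevalence (5 of 100).
y = [1] * 5 + [0] * 95
never_injured = [0] * 100  # a "model" that predicts no injury for anyone
print(accuracy(y, never_injured))  # 0.95, despite missing every injury
print(net_benefit(y, [1.0] * 100, threshold=0.10))  # "treat all" strategy
```

The "model" that predicts no injury for anyone reaches 95% accuracy while identifying none of the injured players, which is the accuracy problem described above. For the hypothetical AUC of 0.75, the interval comes out at roughly 0.66 to 0.82.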