Improving prediction model systematic review methodology: Letter to the Editor
{"title":"改进预测模型系统评价方法:致编辑的信","authors":"G. Bullock, T. Hughes, J. Sergeant, M. Callaghan, G. Collins, R. Riley","doi":"10.1002/tsm2.240","DOIUrl":null,"url":null,"abstract":"Dear Editor, In their recently published paper, Seow et al1 carried out a systematic review of musculoskeletal injury prediction models in professional sport and military special forces. Their review encompassed a comprehensive search that included both conference and published papers, used a standardized musculoskeletal injury definition that was informed by the literature, and included both statistical and machine learningbased models. Nevertheless, we have a number of concerns regarding the conduct and reporting of some aspects of the study that limit the usefulness of their findings. Our first point relates to how the studies were appraised. While the authors should be commended on assessing each study for risk of bias, the Newcastle Ottawa Scale (NOS) is not the correct tool to do this. The NOS is a generic tool designed to assess the quality of nonrandomized studies such as casecontrol and cohort studies— and while prediction model studies often use cohort design, the tool includes no specific assessment of analysis issues relating to the development or validation of a prediction model. Hence, the NOS is a blunt instrument to assess risk of bias in these studies. The tool that should have been used to assess the risk of bias in the review by Seow et al1 is the Prediction model Risk Of Bias Assessment Tool (PROBAST),2 which includes 20 signaling questions over four domains (participants, predictors, outcome, and analysis), to cover key aspects of prediction model studies. Furthermore, when designing a systematic review of prediction model studies, the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist3 provides detailed guidance to help authors in developing their systematic review questions relating to prediction models, extracting pertinent prediction model data, and appraising prediction model studies.3 Had these more relevant tools been used, and indeed, the review process outlined by the Cochrane Prognosis Methods Group followed4; it would have enabled the authors to better appraise and utilize the included prediction model studies in their review. In particular, it would have given more depth and clarity, and allowed enhanced identification of any strength in the existing evidence and also highlighted particular areas of conduct and reporting that should be improved upon in future studies. While the authors extracted and reported the discrimination performance (such as area under the curve) of models that were included, we note that there was no comment on model calibration— an essential component of model performance.4,5 Calibration is the agreement between probabilities derived from the model versus those actually observed within the data6 and is important in understanding the accuracy of the predictions from the model.7,8 This omission could have been addressed at the design stage using the aforementioned CHARMS checklist. Consequently, the authors have missed an important opportunity to report on this critical aspect of prediction model performance assessment and therefore presented readers with incomplete information on the usefulness of the included prediction models. Furthermore, any omission of calibration in the primary studies will have a direct and negative impact on the risk of bias assessment. 
A related concern is that the authors do not explain how they extracted performance estimates, and whether they used the extensive tools of Debray et al9 to help extract estimates (eg, the area under the curve and its confidence interval) when these were not reported directly, in order to maximize the information available for review. Whether performance statistics were adjusted for optimism was also not reported,10 and clinical utility measures (eg, net benefit11) were not discussed. We were also concerned with the authors’ expectations regarding the handling class imbalance using overor undersampling to create a more balanced data set. Data are said to be imbalanced when there are fewer individuals in the data set with the outcome (compared to those without the outcome). In the context of classification, this can indeed be a problem, for example, when evaluating classification accuracy (ie, proportion of correct classifications) in the sense that incorrectly misclassifying individuals with the outcome in a highly imbalanced data set could yield high accuracy— as the larger nonoutcome group will dominate the calculation of overall accuracy.12 However, in the context of prediction (the aim of the review by Seow et al1), class imbalance is a feature of the","PeriodicalId":75247,"journal":{"name":"Translational sports medicine","volume":" ","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2021-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/tsm2.240","citationCount":"2","resultStr":"{\"title\":\"Improving prediction model systematic review methodology: Letter to the Editor\",\"authors\":\"G. Bullock, T. Hughes, J. Sergeant, M. Callaghan, G. Collins, R. Riley\",\"doi\":\"10.1002/tsm2.240\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dear Editor, In their recently published paper, Seow et al1 carried out a systematic review of musculoskeletal injury prediction models in professional sport and military special forces. Their review encompassed a comprehensive search that included both conference and published papers, used a standardized musculoskeletal injury definition that was informed by the literature, and included both statistical and machine learningbased models. Nevertheless, we have a number of concerns regarding the conduct and reporting of some aspects of the study that limit the usefulness of their findings. Our first point relates to how the studies were appraised. While the authors should be commended on assessing each study for risk of bias, the Newcastle Ottawa Scale (NOS) is not the correct tool to do this. The NOS is a generic tool designed to assess the quality of nonrandomized studies such as casecontrol and cohort studies— and while prediction model studies often use cohort design, the tool includes no specific assessment of analysis issues relating to the development or validation of a prediction model. Hence, the NOS is a blunt instrument to assess risk of bias in these studies. The tool that should have been used to assess the risk of bias in the review by Seow et al1 is the Prediction model Risk Of Bias Assessment Tool (PROBAST),2 which includes 20 signaling questions over four domains (participants, predictors, outcome, and analysis), to cover key aspects of prediction model studies. 
Furthermore, when designing a systematic review of prediction model studies, the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist3 provides detailed guidance to help authors in developing their systematic review questions relating to prediction models, extracting pertinent prediction model data, and appraising prediction model studies.3 Had these more relevant tools been used, and indeed, the review process outlined by the Cochrane Prognosis Methods Group followed4; it would have enabled the authors to better appraise and utilize the included prediction model studies in their review. In particular, it would have given more depth and clarity, and allowed enhanced identification of any strength in the existing evidence and also highlighted particular areas of conduct and reporting that should be improved upon in future studies. While the authors extracted and reported the discrimination performance (such as area under the curve) of models that were included, we note that there was no comment on model calibration— an essential component of model performance.4,5 Calibration is the agreement between probabilities derived from the model versus those actually observed within the data6 and is important in understanding the accuracy of the predictions from the model.7,8 This omission could have been addressed at the design stage using the aforementioned CHARMS checklist. Consequently, the authors have missed an important opportunity to report on this critical aspect of prediction model performance assessment and therefore presented readers with incomplete information on the usefulness of the included prediction models. Furthermore, any omission of calibration in the primary studies will have a direct and negative impact on the risk of bias assessment. A related concern is that the authors do not explain how they extracted performance estimates, and whether they used the extensive tools of Debray et al9 to help extract estimates (eg, the area under the curve and its confidence interval) when these were not reported directly, in order to maximize the information available for review. Whether performance statistics were adjusted for optimism was also not reported,10 and clinical utility measures (eg, net benefit11) were not discussed. We were also concerned with the authors’ expectations regarding the handling class imbalance using overor undersampling to create a more balanced data set. Data are said to be imbalanced when there are fewer individuals in the data set with the outcome (compared to those without the outcome). 
In the context of classification, this can indeed be a problem, for example, when evaluating classification accuracy (ie, proportion of correct classifications) in the sense that incorrectly misclassifying individuals with the outcome in a highly imbalanced data set could yield high accuracy— as the larger nonoutcome group will dominate the calculation of overall accuracy.12 However, in the context of prediction (the aim of the review by Seow et al1), class imbalance is a feature of the\",\"PeriodicalId\":75247,\"journal\":{\"name\":\"Translational sports medicine\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2021-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1002/tsm2.240\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Translational sports medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/tsm2.240\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"SPORT SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Translational sports medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/tsm2.240","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"SPORT SCIENCES","Score":null,"Total":0}
G. Bullock, T. Hughes, J. Sergeant, M. Callaghan, G. Collins, R. Riley
Dear Editor,

In their recently published paper, Seow et al1 carried out a systematic review of musculoskeletal injury prediction models in professional sport and military special forces. Their review encompassed a comprehensive search that included both conference and published papers, used a standardized musculoskeletal injury definition informed by the literature, and included both statistical and machine learning-based models. Nevertheless, we have a number of concerns regarding the conduct and reporting of some aspects of the study that limit the usefulness of their findings.

Our first point relates to how the studies were appraised. While the authors should be commended for assessing each study for risk of bias, the Newcastle-Ottawa Scale (NOS) is not the correct tool for this purpose. The NOS is a generic tool designed to assess the quality of nonrandomized studies such as case-control and cohort studies, and while prediction model studies often use a cohort design, the tool includes no specific assessment of the analysis issues that relate to the development or validation of a prediction model. Hence, the NOS is a blunt instrument for assessing risk of bias in these studies. The tool that should have been used to assess risk of bias in the review by Seow et al1 is the Prediction model Risk Of Bias Assessment Tool (PROBAST),2 which includes 20 signaling questions across four domains (participants, predictors, outcome, and analysis) to cover the key aspects of prediction model studies. Furthermore, when designing a systematic review of prediction model studies, the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist3 provides detailed guidance to help authors develop their review questions relating to prediction models, extract pertinent prediction model data, and appraise prediction model studies.3 Had these more relevant tools been used and, indeed, had the review process outlined by the Cochrane Prognosis Methods Group been followed,4 the authors would have been able to better appraise and utilize the included prediction model studies in their review. In particular, this would have given more depth and clarity, allowed enhanced identification of any strengths in the existing evidence, and highlighted particular areas of conduct and reporting that should be improved upon in future studies.

While the authors extracted and reported the discrimination performance (such as the area under the curve) of the included models, we note that there was no comment on model calibration, an essential component of model performance.4,5 Calibration is the agreement between the probabilities derived from the model and those actually observed in the data,6 and it is important for understanding the accuracy of the predictions made by the model.7,8 This omission could have been addressed at the design stage using the aforementioned CHARMS checklist. Consequently, the authors have missed an important opportunity to report on this critical aspect of prediction model performance assessment and have therefore presented readers with incomplete information on the usefulness of the included prediction models. Furthermore, any omission of calibration in the primary studies will have a direct and negative impact on the risk of bias assessment.
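To make concrete what reporting both dimensions of performance involves, the minimal Python sketch below computes the c-statistic (discrimination) alongside the calibration slope and calibration-in-the-large. The simulated data, variable names, and library choices are ours for illustration only and are not taken from Seow et al1 or any of the primary studies.

```python
# Minimal sketch: assessing discrimination AND calibration of a risk model.
# Data are simulated for illustration; in a review, these statistics would be
# extracted from (or requested for) each primary study.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

n = 1000
p_hat = rng.beta(2, 8, size=n)   # model-predicted injury risks
y = rng.binomial(1, p_hat)       # observed outcomes (1 = injured)

# Discrimination: c-statistic (area under the ROC curve).
c_stat = roc_auc_score(y, p_hat)

# Linear predictor (logit of the predicted risk).
logit_p = np.log(p_hat / (1 - p_hat))

# Calibration slope: coefficient of the linear predictor (ideal value 1).
slope_model = sm.Logit(y, sm.add_constant(logit_p)).fit(disp=0)
cal_slope = slope_model.params[1]

# Calibration-in-the-large: intercept estimated with the linear predictor
# as an offset (ideal value 0).
citl_model = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(),
                    offset=logit_p).fit()
cal_in_large = citl_model.params[0]

print(f"c-statistic (discrimination):       {c_stat:.3f}")
print(f"calibration slope (ideal 1):        {cal_slope:.3f}")
print(f"calibration-in-the-large (ideal 0): {cal_in_large:.3f}")
```

A calibration slope well below 1, for example, is a typical signature of an overfitted model, which also connects to the point about optimism adjustment raised below.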
A related concern is that the authors do not explain how they extracted performance estimates, or whether they used the extensive tools of Debray et al9 to help derive estimates (eg, the area under the curve and its confidence interval) when these were not reported directly, in order to maximize the information available for the review. Whether performance statistics were adjusted for optimism was also not reported,10 and clinical utility measures (eg, net benefit11) were not discussed.

We were also concerned by the authors’ expectation that class imbalance should be handled using over- or undersampling to create a more balanced data set. Data are said to be imbalanced when there are fewer individuals in the data set with the outcome than without it. In the context of classification, this can indeed be a problem, for example when evaluating classification accuracy (ie, the proportion of correct classifications): misclassifying the individuals with the outcome in a highly imbalanced data set can still yield high overall accuracy, because the larger non-outcome group dominates the calculation.12 However, in the context of prediction (the aim of the review by Seow et al1), class imbalance is a feature of the
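The accuracy problem described above for the classification setting is easy to demonstrate with a toy example; the prevalence and sample size below are our own illustrative choices, not figures from the review.

```python
# Toy illustration: with 5% outcome prevalence, a "model" that predicts
# "no injury" for everyone misclassifies every injured individual yet still
# reports 95% accuracy, because the large non-outcome group dominates the sum.
import numpy as np

n = 1000
prevalence = 0.05                  # 5% of individuals sustain an injury
y = np.zeros(n, dtype=int)
y[: int(n * prevalence)] = 1       # 50 injured, 950 uninjured

y_pred = np.zeros(n, dtype=int)    # predict "no injury" for everyone

accuracy = np.mean(y_pred == y)                                   # 950 / 1000
sensitivity = np.sum((y_pred == 1) & (y == 1)) / np.sum(y == 1)   # 0 / 50

print(f"accuracy:    {accuracy:.2%}")     # 95.00%
print(f"sensitivity: {sensitivity:.2%}")  # 0.00%: every injured athlete missed
```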
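Similarly, on clinical utility, net benefit11 is straightforward to report once the classification counts at a clinically chosen risk threshold are available. The sketch below uses invented counts and an illustrative 10% risk threshold rather than any values from the review or the primary studies.

```python
# Minimal sketch: net benefit of a prediction model at a single risk threshold.
# All counts below are invented for illustration.

def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit = TP/N - (FP/N) * (pt / (1 - pt)) at risk threshold pt."""
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Example: 1000 athletes, model flags 150 as high risk at a 10% threshold,
# of whom 60 are later injured (true positives) and 90 are not (false positives).
nb_model = net_benefit(tp=60, fp=90, n=1000, threshold=0.10)

# 'Treat all' comparator: everyone flagged, so TP = all injured and
# FP = all uninjured; suppose 80 of the 1000 are injured overall.
nb_treat_all = net_benefit(tp=80, fp=920, n=1000, threshold=0.10)

print(f"net benefit (model):     {nb_model:.3f}")      # 0.050
print(f"net benefit (treat all): {nb_treat_all:.3f}")  # -0.022
```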