Variable selection for causal inference, prediction, and descriptive research: a narrative review of recommendations
Brett P Dyer
European Heart Journal Open, volume 5, issue 3, article oeaf070. Published 2025-06-04. Journal Article.
DOI: https://doi.org/10.1093/ehjopen/oeaf070
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204189/pdf/
Citation count: 0
Abstract
There is a growing appreciation that the methods and analyses of medical studies should be tailored to the type of research question. However, the reasons for statistically adjusting for variables in analyses, and the methods that should be used for variable selection in regression models, are frequently conflated. Non-randomized causal studies require statistical adjustment for confounders that may bias the causal effect estimate. Predictor/prognostic factor studies may present unadjusted associations and/or associations statistically adjusted for existing predictors, to establish the added predictive value of the candidate predictor over and above known predictors. Prediction models aim to identify a set of variables that are clinically usable and are collectively the best at predicting the outcome. Descriptive studies may want to characterize the outcome distribution with respect to an additional variable, or standardize with respect to a nuisance variable for which the study sample differs from the target population. This narrative review summarizes background theory and existing advice on how variable selection should differ for causal research, prediction modelling, predictor/prognostic factor research, and descriptive research. Examples of variable selection approaches from published cardiovascular research are also provided.
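The point about confounder adjustment in non-randomized causal studies can be illustrated with a small simulation. The sketch below is not taken from the review: the data-generating process, the effect sizes, and the helper name `ols_coefs` are all assumptions chosen for illustration. It compares the exposure coefficient from a regression that omits a confounder with one that adjusts for it.

```python
import numpy as np


def ols_coefs(X, y):
    """Least-squares coefficients for y ~ X, with an intercept column prepended.

    Hypothetical helper for this sketch; X may be a 1D array (one predictor)
    or a 2D array (one column per predictor).
    """
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


rng = np.random.default_rng(0)
n = 20_000

# Assumed data-generating process (illustrative only):
# the confounder affects both exposure and outcome,
# and the true causal effect of exposure on outcome is 1.0.
confounder = rng.normal(size=n)
exposure = 0.8 * confounder + rng.normal(size=n)
outcome = 1.0 * exposure + 2.0 * confounder + rng.normal(size=n)

# Unadjusted: the exposure coefficient absorbs part of the confounder's effect.
# In expectation it equals cov(exposure, outcome) / var(exposure)
# = (1.0 * 1.64 + 2.0 * 0.8) / 1.64, which is about 1.98 rather than 1.0.
unadjusted = ols_coefs(exposure, outcome)[1]

# Adjusted: including the confounder recovers a coefficient close to 1.0.
adjusted = ols_coefs(np.column_stack([exposure, confounder]), outcome)[1]

print(f"unadjusted exposure coefficient: {unadjusted:.2f}")
print(f"adjusted exposure coefficient:   {adjusted:.2f}")
```

In this toy setting the adjustment set is known by construction. In real causal studies the confounders have to be chosen from subject-matter knowledge about the causal structure, which is a different task from selecting the variables that best predict the outcome.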