Dealing with continuous variables and modelling non-linear associations in healthcare data: practical guide
Pedro Lopez-Ayala, Richard D Riley, Gary S Collins, Tobias Zimmermann
The BMJ, 16 July 2025. doi:10.1136/bmj-2024-082440
Abstract
Proper handling of continuous variables is crucial in healthcare research, for example, in regression modelling for descriptive, explanatory, or predictive purposes. However, inadequate methods are commonly used. This article highlights the importance of handling continuous variables appropriately and illustrates the consequences of categorisation. It also explains why assuming a linear relationship between the independent and dependent variables might be inappropriate, and describes how to use splines or fractional polynomials to model non-linear relationships.

Continuous variables such as age, vital parameters, or biomarker concentrations are abundant in healthcare research. Whether the research aim is to describe (eg, whether age is associated with six month mortality after a diagnosis of covid-19), explain (eg, whether the effect of a new cancer drug varies according to the value of a continuous biomarker), or predict (eg, whether adding blood pressure to a model improves the accuracy of predicted cardiovascular disease risk),[1] researchers should appropriately model the association between independent and dependent variables. Researchers frequently face this challenge, for example, when fitting a regression model. Too often, however, the approaches used are inadequate, including in submissions to The BMJ.[2] In this article, we provide an overview of the current state of handling continuous variables in healthcare research. We discuss the drawbacks of categorising a continuous variable and the potential limitations of assuming a linear relationship between independent and dependent variables. We then discuss existing reviews of current practice and outline two recommended approaches that allow for non-linear relationships: fractional polynomials[3,4,5] and splines,[6,7,8] with a particular focus on restricted cubic splines.
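A restricted cubic spline replaces the single column x with k - 1 columns: x itself plus k - 2 non-linear terms built from truncated cubic functions, constrained so that the fitted curve is linear below the first knot and above the last knot. The sketch below uses the truncated power form described by Harrell, with knots at the percentiles he commonly recommends. The function names (`quantile`, `place_knots`, `rcs_basis`) are hypothetical, some software additionally rescales the non-linear terms (omitted here), and the regression fit itself is not shown.

```python
import random
from typing import List, Sequence

# Knot percentiles commonly recommended for restricted cubic splines
# (Harrell); other placements are possible.
KNOT_PERCENTILES = {
    3: (0.10, 0.50, 0.90),
    4: (0.05, 0.35, 0.65, 0.95),
    5: (0.05, 0.275, 0.50, 0.725, 0.95),
}


def quantile(values: Sequence[float], p: float) -> float:
    """Sample quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def place_knots(values: Sequence[float], k: int = 4) -> List[float]:
    """Place k knots at the recommended percentiles of the observed values."""
    return [quantile(values, p) for p in KNOT_PERCENTILES[k]]


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Return [x, S_1(x), ..., S_{k-2}(x)] for a restricted cubic spline."""
    t = sorted(knots)
    k = len(t)
    if k < 3 or len(set(t)) != k:
        raise ValueError("need at least 3 distinct knots")

    def pos3(u: float) -> float:
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    denom = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, so each term is linear there.
        terms.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / denom
            + pos3(x - t[-1]) * (t[-2] - t[j]) / denom
        )
    return terms


if __name__ == "__main__":
    # Hypothetical simulated values standing in for a continuous predictor.
    random.seed(1)
    sample = [random.gauss(3.0, 1.0) for _ in range(200)]
    knots = place_knots(sample, k=4)
    print("knots:", [round(t, 2) for t in knots])
    print("basis at x=3:", rcs_basis(3.0, knots))
```

Each observation's basis values would then enter a regression model (eg, logistic regression for a binary outcome) in place of the raw variable; jointly testing the coefficients of the non-linear terms against zero gives a test of departure from linearity.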
Box 1 provides a list of key terms, and the key messages are illustrated throughout using the publicly available acute bacterial meningitis dataset,[9] in which we examine the association between levels of glucose in cerebrospinal fluid …
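Fractional polynomials, the other approach outlined above, transform x by powers chosen from a small predefined set, conventionally {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 denotes log x. An FP2 model uses two such terms, and a repeated power p yields the pair x^p and x^p log x. The sketch below, with hypothetical function names, only generates the candidate transformations. In practice each of the 36 FP2 power pairs (and the simpler FP1 and linear models) is typically fitted and the models compared by deviance, which is not shown. It assumes x > 0, so a variable containing zero or negative values would first need shifting.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Tuple

# Conventional set of fractional polynomial powers; 0 means log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x: float, p: float) -> float:
    """Single FP transformation, using the convention x**0 == log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0; shift x first")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x: float, p1: float, p2: float) -> Tuple[float, float]:
    """Return the two FP2 covariates for powers (p1, p2).

    With a repeated power (p1 == p2) the second term is x**p * log(x),
    which for p == 0 becomes log(x) ** 2.
    """
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)


def fp2_candidates() -> List[Tuple[float, float]]:
    """All unordered power pairs (repeats allowed) considered for FP2."""
    return list(combinations_with_replacement(FP_POWERS, 2))


if __name__ == "__main__":
    # Hypothetical value of a positive continuous predictor.
    x = 2.5
    for p1, p2 in fp2_candidates()[:5]:
        print((p1, p2), fp2_terms(x, p1, p2))
```

Restricting the powers to this small set keeps the search manageable while still covering many monotonic and single-turning-point curve shapes.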