Dealing with continuous variables and modelling non-linear associations in healthcare data: practical guide

The BMJ Pub Date : 2025-07-16 DOI:10.1136/bmj-2024-082440
Pedro Lopez-Ayala, Richard D Riley, Gary S Collins, Tobias Zimmermann

Abstract

Proper handling of continuous variables is crucial in healthcare research, for example, within regression modelling for descriptive, explanatory, or predictive purposes. However, inadequate methods are commonly used. This article highlights the importance of appropriately handling continuous variables, and illustrates the consequences of categorisation. This article also explains why assuming a linear relationship between the independent and dependent variables might be inappropriate, and describes how to use splines or fractional polynomials to model non-linear relationships.

Continuous variables such as age, vital parameters, or biomarker concentrations are abundant in healthcare research. Whether the research aim is to describe (eg, whether age is associated with six month mortality after a diagnosis of covid-19), explain (eg, does the effect of a new cancer drug vary according to the value of a continuous biomarker), or predict (eg, does adding blood pressure to the model improve the prediction accuracy of risk for cardiovascular disease) [1], researchers should appropriately model the association between independent and dependent variables. Researchers frequently encounter this challenge, for example, when fitting a regression model. But too often, the approaches used are inadequate, including in submissions to The BMJ [2].

In this article, we provide an overview of the current state of handling continuous variables in healthcare research. We discuss the drawbacks of categorising a continuous variable, and the potential limitations of assuming a linear relationship between independent and dependent variables. We discuss existing reviews of current practice and then outline two recommended approaches that allow for non-linear relationships: fractional polynomials [3-5] and splines [6-8], with a particular focus on restricted cubic splines. Box 1 provides a list of key terms, and the key messages are illustrated throughout using the publicly available acute bacterial meningitis dataset [9], where we examine the association between levels of glucose in cerebrospinal fluid …
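To make the idea of a restricted cubic spline concrete, the following is a minimal illustrative sketch in Python, not code from the article. It builds the truncated-power form of the restricted cubic spline basis (the parameterisation commonly attributed to Harrell) and fits it by ordinary least squares. The function name `rcs_basis`, the synthetic data, and the choice of five knots at the 5th, 27.5th, 50th, 72.5th, and 95th percentiles are all assumptions made for illustration; the data are simulated and are not the meningitis dataset.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form).

    Returns an (n, k-1) matrix for k knots: the first column is x itself,
    the remaining k-2 columns are non-linear terms constrained so that the
    fitted curve is linear beyond the first and last knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # (u)_+^3: cube of the positive part
        return np.clip(u, 0.0, None) ** 3

    gap = t[-1] - t[-2]            # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2    # normalisation to keep columns on a similar scale
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / gap
                + pos3(x - t[-1]) * (t[-2] - t[j]) / gap)
        cols.append(term / scale)
    return np.column_stack(cols)

# Simulated (hypothetical) data with a clearly non-linear association
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = np.sin(x / 2) + rng.normal(0, 0.1, 500)

# Assumed knot placement: five knots at fixed percentiles of x
knots = np.quantile(x, [0.05, 0.275, 0.5, 0.725, 0.95])

# Linear-only model versus restricted cubic spline model, both by least squares
X_lin = np.column_stack([np.ones_like(x), x])
X_rcs = np.column_stack([np.ones_like(x), rcs_basis(x, knots)])
beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
beta_rcs, *_ = np.linalg.lstsq(X_rcs, y, rcond=None)

rss_lin = float(np.sum((y - X_lin @ beta_lin) ** 2))
rss_rcs = float(np.sum((y - X_rcs @ beta_rcs) ** 2))
```

Comparing `rss_lin` with `rss_rcs` shows how much residual variation the linearity assumption leaves unexplained in this simulated example. In applied work, established implementations (for example, the `rcs` function in the R package rms) would normally be preferred over a hand-written basis.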