Greedy knot selection algorithm for restricted cubic spline regression

J. Arnes, Alexander Hapfelmeier, Alexander Horsch, Tonje Braaten
Frontiers in Epidemiology. Published 2023-12-18. https://doi.org/10.3389/fepid.2023.1283705

Abstract

Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is highly relevant to, for example, Cox proportional hazards regression analysis. RCS regression uses third-order polynomials joined at knot points to model non-linear relationships. The standard approach is to place knots at a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots, but this invites overfitting, where a regression model fits the given sample well but does not generalize well to other samples. A low knot count is thus preferred. However, the standard knot selection process can lead to underperformance in the sparser regions of the predictor variable, especially when using a low number of knots, and to overfitting in the denser regions. We present a simple greedy search algorithm using a backward method for knot selection that shows reduced prediction error and lower Bayesian information criterion scores compared to the standard knot selection process in simulation experiments. We have implemented the algorithm as part of an open-source R package, knutar.
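The idea of backward greedy knot selection can be illustrated with a minimal Python sketch. This is not the knutar implementation. All function names, the outer quantiles (0.05 and 0.95), the starting knot count, the Gaussian least-squares form of the BIC, and the rule that any knot (including an outer one) may be removed are assumptions made for illustration. The basis is the usual truncated power form of a restricted cubic spline, which is linear beyond the outer knots.

```python
import numpy as np


def quantile_knots(x, k, lo=0.05, hi=0.95):
    """Standard placement: k knots at a regular sequence of quantiles.
    The outer quantiles lo and hi are assumptions of this sketch."""
    return np.quantile(x, np.linspace(lo, hi, k))


def rcs_basis(x, knots):
    """Design matrix [1, x, k-2 non-linear terms] for k >= 3 knots."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))

    def pos3(u):
        # Truncated cubic: (u)_+^3
        return np.maximum(u, 0.0) ** 3

    span = t[-1] - t[-2]
    norm = (t[-1] - t[0]) ** 2  # rescaling for numerical conditioning
    cols = [np.ones_like(x), x]
    for j in range(len(t) - 2):
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / span
            + pos3(x - t[-1]) * (t[-2] - t[j]) / span
        )
        cols.append(term / norm)
    return np.column_stack(cols)


def bic(x, y, knots):
    """BIC of an ordinary least-squares RCS fit, up to an additive constant."""
    X = rcs_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def greedy_backward_knots(x, y, k_start=10, k_min=3):
    """Start with many quantile knots; repeatedly drop the single knot whose
    removal gives the lowest BIC; return the best knot set seen on the path."""
    knots = quantile_knots(x, k_start)
    best_knots, best_bic = knots, bic(x, y, knots)
    while len(knots) > k_min:
        candidates = [np.delete(knots, i) for i in range(len(knots))]
        scores = [bic(x, y, c) for c in candidates]
        i = int(np.argmin(scores))
        knots = candidates[i]
        if scores[i] < best_bic:
            best_knots, best_bic = knots, scores[i]
    return best_knots, best_bic


# Usage on synthetic data with a skewed predictor (sparse upper tail).
rng = np.random.default_rng(0)
x = rng.exponential(1.0, 500)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.2, 500)
selected, score = greedy_backward_knots(x, y)
print(len(selected), "knots selected, BIC =", round(score, 1))
print("BIC with 4 quantile knots:", round(bic(x, y, quantile_knots(x, 4)), 1))
```

Because each step only evaluates the removal of one knot at a time, the search is greedy: it does not examine every subset of the starting knots, so the returned knot set is not guaranteed to be the global BIC minimum.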