Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.

IF 1.8 4区 医学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Jeroen M Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A van de Wiel
{"title":"Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.","authors":"Jeroen M Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A van de Wiel","doi":"10.1002/sim.70004","DOIUrl":null,"url":null,"abstract":"<p><p>For clinical prediction applications, we are often faced with small sample size data compared to the number of covariates. Such data pose problems for variable selection and prediction, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate external information on the covariates into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate external information, an empirical Bayes (EB) framework is developed that estimates, assisted by a model, prior covariate weights in the BART model. The proposed EB framework enables the estimation of the other prior parameters of BART as well, rendering an appealing and computationally efficient alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is non-linear, the method benefits from the flexibility of BART to outperform regression-based learners. Finally, the benefit of incorporating external information is shown in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 5","pages":"e70004"},"PeriodicalIF":1.8000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834989/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistics in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/sim.70004","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

For clinical prediction applications, we are often faced with small sample size data compared to the number of covariates. Such data pose problems for variable selection and prediction, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate external information on the covariates into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate external information, an empirical Bayes (EB) framework is developed that estimates, assisted by a model, prior covariate weights in the BART model. The proposed EB framework enables the estimation of the other prior parameters of BART as well, rendering an appealing and computationally efficient alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is non-linear, the method benefits from the flexibility of BART to outperform regression-based learners. Finally, the benefit of incorporating external information is shown in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.

基于经验贝叶斯的贝叶斯加性回归树协数据自适应使用。
在临床预测应用中,我们经常面临与协变量数量相比样本量较小的数据。这些数据给变量选择和预测带来了问题,特别是当协变量-响应关系复杂时。为了解决这些挑战,我们建议将协变量的外部信息纳入贝叶斯加性回归树(BART),这是一种利用树参数先验来防止过拟合的树和预测模型。为了整合外部信息,开发了一个经验贝叶斯(EB)框架,在模型的辅助下,估计BART模型中的先验协变量权重。提出的EB框架也可以估计BART的其他先验参数,提供了一种有吸引力且计算效率高的交叉验证替代方案。我们表明,该方法找到了相关的协变量,并且与模拟中的默认BART相比,它提高了预测。如果协变量-响应关系是非线性的,则该方法受益于BART的灵活性,优于基于回归的学习器。最后,基于临床协变量、基因突变、DNA易位和DNA拷贝数数据的弥漫性大b细胞淋巴瘤预后的应用显示了纳入外部信息的好处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Statistics in Medicine
Statistics in Medicine 医学-公共卫生、环境卫生与职业卫生
CiteScore
3.40
自引率
10.00%
发文量
334
审稿时长
2-4 weeks
期刊介绍: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case-studies where creative use or technical generalizations of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信