Jeroen M Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A van de Wiel
{"title":"基于经验贝叶斯的贝叶斯加性回归树协数据自适应使用。","authors":"Jeroen M Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A van de Wiel","doi":"10.1002/sim.70004","DOIUrl":null,"url":null,"abstract":"<p><p>For clinical prediction applications, we are often faced with small sample size data compared to the number of covariates. Such data pose problems for variable selection and prediction, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate external information on the covariates into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate external information, an empirical Bayes (EB) framework is developed that estimates, assisted by a model, prior covariate weights in the BART model. The proposed EB framework enables the estimation of the other prior parameters of BART as well, rendering an appealing and computationally efficient alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is non-linear, the method benefits from the flexibility of BART to outperform regression-based learners. Finally, the benefit of incorporating external information is shown in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 5","pages":"e70004"},"PeriodicalIF":1.8000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834989/pdf/","citationCount":"0","resultStr":"{\"title\":\"Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.\",\"authors\":\"Jeroen M Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A van de Wiel\",\"doi\":\"10.1002/sim.70004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>For clinical prediction applications, we are often faced with small sample size data compared to the number of covariates. Such data pose problems for variable selection and prediction, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate external information on the covariates into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate external information, an empirical Bayes (EB) framework is developed that estimates, assisted by a model, prior covariate weights in the BART model. The proposed EB framework enables the estimation of the other prior parameters of BART as well, rendering an appealing and computationally efficient alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is non-linear, the method benefits from the flexibility of BART to outperform regression-based learners. Finally, the benefit of incorporating external information is shown in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.</p>\",\"PeriodicalId\":21879,\"journal\":{\"name\":\"Statistics in Medicine\",\"volume\":\"44 5\",\"pages\":\"e70004\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834989/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistics in Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1002/sim.70004\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistics in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/sim.70004","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.
For clinical prediction applications, we are often faced with small sample size data compared to the number of covariates. Such data pose problems for variable selection and prediction, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate external information on the covariates into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate external information, an empirical Bayes (EB) framework is developed that estimates, assisted by a model, prior covariate weights in the BART model. The proposed EB framework enables the estimation of the other prior parameters of BART as well, rendering an appealing and computationally efficient alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is non-linear, the method benefits from the flexibility of BART to outperform regression-based learners. Finally, the benefit of incorporating external information is shown in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.
期刊介绍:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case-studies where creative use or technical generalizations of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.