Bayesian Variable Selection in Semiparametric Proportional Hazards Model for High Dimensional Survival Data

IF 1.2 | Zone 4 | Mathematics
Kyu Ha Lee, S. Chakraborty, Jianguo Sun
Journal: International Journal of Biostatistics, 7(1)
Published: 2011-04-07
Publication type: Journal Article
DOI: 10.2202/1557-4679.1301 (https://doi.org/10.2202/1557-4679.1301)
Citations: 25

Abstract

Variable selection for high-dimensional data has recently received a great deal of attention. However, because of the complex structure of the likelihood, only limited developments have been made for time-to-event data in which censoring is present. In this paper, we propose a Bayesian variable selection scheme for a Bayesian semiparametric survival model for right-censored survival data sets. A special shrinkage prior on the coefficients of the predictor variables handles cases in which the explanatory variables are of very high dimension. The shrinkage prior is obtained through a scale mixture representation of Normal and Gamma distributions, and it corresponds to the well-known lasso penalty. The likelihood function is based on the Cox proportional hazards model framework, where the cumulative baseline hazard function is modeled a priori by a gamma process. We assign a prior to the tuning parameter of the shrinkage prior, which adaptively controls the sparsity of our model. The primary use of the proposed model is to identify the important covariates related to the survival curves. To implement our methodology, we have developed a fast Markov chain Monte Carlo algorithm with an adaptive jumping rule. We have applied our method to simulated data sets under two different settings and to real microarray data sets that contain right-censored survival times. We also compare the performance of our Bayesian variable selection model with that of competing methods to demonstrate the superiority of our method. A short description of the biological relevance of the genes selected in the real data sets is provided, further strengthening our claims.
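The link between a Normal scale mixture and the lasso penalty mentioned in the abstract can be checked numerically. The sketch below is an illustration only, not the authors' sampler: it assumes the standard identity in which a coefficient that is Normal with variance tau2, where tau2 follows an exponential distribution (a Gamma with shape 1) with rate lam^2 / 2, is marginally Laplace with rate lam. The abstract does not state the exact parameterization used in the paper, and the names `sample_lasso_prior`, `summarize`, and `lam` are hypothetical.

```python
import math
import random


def sample_lasso_prior(lam, n, seed=0):
    """Draw n coefficients from a Laplace(rate=lam) prior via a normal scale mixture.

    Assumed parameterization (not taken from the paper):
        tau2 ~ Exponential(rate = lam**2 / 2)
        beta | tau2 ~ Normal(0, tau2)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Mixing variance: exponential, i.e. a Gamma distribution with shape 1.
        tau2 = rng.expovariate(lam ** 2 / 2.0)
        # Conditional on the variance, the coefficient is zero-mean normal.
        draws.append(rng.gauss(0.0, math.sqrt(tau2)))
    return draws


def summarize(draws):
    """Return the sample mean of |beta| and the sample mean of beta**2."""
    n = len(draws)
    mean_abs = sum(abs(b) for b in draws) / n
    second_moment = sum(b * b for b in draws) / n
    return mean_abs, second_moment


if __name__ == "__main__":
    lam = 2.0
    mean_abs, second_moment = summarize(sample_lasso_prior(lam, 200_000))
    # A Laplace(rate=lam) variable has E|beta| = 1/lam and E[beta^2] = 2/lam^2.
    print("E|beta|   sample:", mean_abs, " theory:", 1.0 / lam)
    print("E[beta^2] sample:", second_moment, " theory:", 2.0 / lam ** 2)
```

A larger `lam` concentrates the prior more tightly around zero, which is why placing a prior on this tuning parameter, as the abstract describes, lets the data adjust how sparse the fitted model is.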
Source journal
International Journal of Biostatistics (Mathematics: Statistics and Probability)
CiteScore: 2.30
Self-citation rate: 8.30%
Articles published: 28
About the journal: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, and original applications of statistical methods for important practical problems arising in the biological, medical, public health, and agricultural sciences, with an emphasis on semiparametric methods. Given that many alternative venues exist within biostatistics, IJB offers a place to publish research focusing on modern methods, often based on machine learning and other data-adaptive methodologies, and provides a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. The journal is intended to cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows data and software code to be appended, which opens the door to reproducible research by letting readers easily replicate the analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.