{"title":"Bayesian Variable Selection in Semiparametric Proportional Hazards Model for High Dimensional Survival Data","authors":"Kyu Ha Lee, S. Chakraborty, Jianguo Sun","doi":"10.2202/1557-4679.1301","DOIUrl":"https://doi.org/10.2202/1557-4679.1301","url":null,"abstract":"Variable selection for high dimensional data has recently received a great deal of attention. However, due to the complex structure of the likelihood, only limited developments have been made for time-to-event data where censoring is present. In this paper, we propose a variable selection scheme for a Bayesian semiparametric survival model for right-censored survival data sets. A special shrinkage prior on the coefficients corresponding to the predictor variables is used to handle cases when the explanatory variables are of very high dimension. The shrinkage prior is obtained through a scale mixture representation of Normal and Gamma distributions. Our proposed variable selection prior corresponds to the well-known lasso penalty. The likelihood function is based on the Cox proportional hazards model framework, where the cumulative baseline hazard function is modeled a priori by a gamma process. We assign a prior on the tuning parameter of the shrinkage prior and adaptively control the sparsity of our model. The primary use of the proposed model is to identify the important covariates relating to the survival curves. To implement our methodology, we have developed a fast Markov chain Monte Carlo algorithm with an adaptive jumping rule. We have successfully applied our method to simulated data sets under two different settings and to real microarray data sets containing right-censored survival times. The performance of our Bayesian variable selection model is also compared with that of other competing methods to demonstrate the superiority of our method. A short description of the biological relevance of the selected genes in the real data sets is provided, further strengthening our claims.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1301","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Alternative to Pooling Kaplan-Meier Curves in Time-to-Event Meta-Analysis","authors":"D. Rubin","doi":"10.2202/1557-4679.1289","DOIUrl":"https://doi.org/10.2202/1557-4679.1289","url":null,"abstract":"A meta-analysis that uses individual-level data instead of study-level data is widely considered to be a gold standard approach, in part because it allows a time-to-event analysis. Unfortunately, with the common practice of presenting Kaplan-Meier survival curves after pooling subjects across randomized trials, using individual-level data can actually be a step backwards; a Simpson's paradox can occur in which pooling incorrectly reverses the direction of an association. We introduce a nonparametric procedure for synthesizing survival curves across studies that is designed to avoid this difficulty and preserve the integrity of randomization. The technique is based on a counterfactual formulation in which we ask what pooled survival curves would look like if all subjects in all studies had been assigned treatment, or if all subjects had been assigned to control arms. The method is related to a Kaplan-Meier adjustment proposed in 2005 by Xie and Liu to correct for confounding in nonrandomized studies, but is formulated for the meta-analysis setting. The procedure is discussed in the context of examining rosiglitazone and cardiovascular adverse events.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1289","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Marginal Models for Censored Longitudinal Cost Data: Appropriate Working Variance Matrices in Inverse-Probability-Weighted GEEs Can Improve Precision","authors":"E. Pullenayegum, A. Willan","doi":"10.2202/1557-4679.1170","DOIUrl":"https://doi.org/10.2202/1557-4679.1170","url":null,"abstract":"When cost data are collected in a clinical study, interest centers on the between-treatment difference in mean cost. When censoring is present, the resulting loss of information can be limited by collecting cost data for several pre-specified time intervals, leading to censored longitudinal cost data. Most models for marginal costs stratify by time interval. However, in few other areas of biostatistics would we stratify by default. We argue that there are benefits to considering more general models: for example, in some settings, pooling regression coefficients across intervals can improve the precision of the estimated between-treatment difference in mean cost. Previous work has used inverse-probability-weighted GEEs coupled with an independent working variance to estimate parameters from these more general models. We show that the greatest precision benefits of non-stratified models are achieved by using more sophisticated working variance matrices.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1170","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68716427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HingeBoost: ROC-Based Boost for Classification and Variable Selection","authors":"Zhuo Wang","doi":"10.2202/1557-4679.1304","DOIUrl":"https://doi.org/10.2202/1557-4679.1304","url":null,"abstract":"In disease classification, traditional tools are the receiver operating characteristic (ROC) curve and the area under the curve (AUC). With high-dimensional data, ROC techniques are needed to conduct classification and variable selection. The current ROC methods either do not explicitly incorporate unequal misclassification costs or do not have a theoretical grounding for optimizing the AUC. Empirical studies in the literature have demonstrated that optimizing the hinge loss can maximize the AUC approximately. In theory, minimizing the hinge rank loss is equivalent to maximizing the AUC in the asymptotic limit. In this article, we propose a novel nonparametric method, HingeBoost, to optimize a weighted hinge loss incorporating misclassification costs. HingeBoost can be used to construct linear and nonlinear classifiers. The estimation and variable selection for the hinge loss are addressed by a new boosting algorithm. Furthermore, the proposed twin HingeBoost can select sparser predictors. Some properties of HingeBoost are studied as well. To compare HingeBoost with existing classification methods, we present empirical study results using data from simulations and a prostate cancer study with mass spectrometry-based proteomics.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1304","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Targeting the Optimal Design in Randomized Clinical Trials with Binary Outcomes and No Covariate: Simulation Study","authors":"A. Chambaz, M. J. van der Laan","doi":"10.2202/1557-4679.1310","DOIUrl":"https://doi.org/10.2202/1557-4679.1310","url":null,"abstract":"We undertake here a comprehensive simulation study of the theoretical properties that we derive in a companion article devoted to the asymptotic study of adaptive group sequential designs in the case of randomized clinical trials (RCTs) with binary treatment, binary outcome and no covariate. By adaptive design, we mean in this setting a RCT design that allows the investigator to dynamically modify its course through data-driven adjustment of the randomization probability based on data accrued so far, without negatively impacting the statistical integrity of the trial. By adaptive group sequential design, we refer to the fact that group sequential testing methods can be equally well applied on top of adaptive designs. The simulation study validates the theory. It notably shows in the estimation framework that the confidence intervals we obtain achieve the desired coverage even for moderate sample sizes. In addition, it shows in the testing framework that type I error control at the prescribed level is guaranteed and that all sampling procedures suffer only a very slight increase in type II error. A three-sentence take-home message is “Adaptive designs do learn the targeted optimal design, and inference and testing can be carried out under adaptive sampling as they would under the targeted optimal randomization probability iid sampling. In particular, adaptive designs achieve the same efficiency as the fixed oracle design. This is confirmed by a simulation study, at least for moderate or large sample sizes, across a large collection of targeted randomization probabilities.”","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68718288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relative Risk Estimation in Randomized Controlled Trials: A Comparison of Methods for Independent Observations","authors":"L. Yelland, A. Salter, Philip Ryan","doi":"10.2202/1557-4679.1278","DOIUrl":"https://doi.org/10.2202/1557-4679.1278","url":null,"abstract":"The relative risk is a clinically important measure of the effect of treatment on binary outcomes in randomized controlled trials (RCTs). An adjusted relative risk can be estimated using log binomial regression; however, convergence problems are common with this model. While alternative methods have been proposed for estimating relative risks, comparisons between methods have been limited, particularly in the context of RCTs. We compare ten different methods for estimating relative risks under a variety of scenarios relevant to RCTs with independent observations. Results of a large simulation study show that some methods may fail to overcome the convergence problems of log binomial regression, while others may substantially overestimate the treatment effect or produce inaccurate confidence intervals. Further, conclusions about the effectiveness of treatment may differ depending on the method used. We give recommendations for choosing a method for estimating relative risks in the context of RCTs with independent observations.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Stationary Signals with Mixed Spectrum","authors":"P. Saavedra, A. Santana-del-Pino, C. N. Hernández-Flores, J. Artiles-Romero, J. J. González-Henríquez","doi":"10.2202/1557-4679.1288","DOIUrl":"https://doi.org/10.2202/1557-4679.1288","url":null,"abstract":"This paper deals with the problem of discrimination between two sets of complex signals generated by stationary processes with both random effects and mixed spectral distributions. The presence of outlier signals and their influence on the classification process is also considered. As an initial input, a feature vector obtained from estimations of the spectral distribution is proposed and used with two different learning machines, namely a single artificial neural network and the LogitBoost classifier. Performance of both methods is evaluated on five simulation studies as well as on a set of real electroencephalogram (EEG) records obtained from both normal subjects and patients who had experienced epileptic seizures. Of the different classification methods, LogitBoost is shown to be more robust to the presence of outlier signals.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1288","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Bland-Altman Method for Concordance Assessment","authors":"Jason J. Z. Liao, R. Capen","doi":"10.2202/1557-4679.1295","DOIUrl":"https://doi.org/10.2202/1557-4679.1295","url":null,"abstract":"It is often necessary to compare two measurement methods in medicine and other experimental sciences. This problem covers a broad range of data with applications arising from many different fields. The Bland-Altman method has been a favorite method for concordance assessment. However, the Bland-Altman approach creates a problem of interpretation for many applications when a mixture of fixed bias, proportional bias and/or proportional error occurs. In this paper, an improved Bland-Altman method is proposed to handle more complicated scenarios in practice. This new approach includes Bland-Altman's approach as its special case. We evaluate concordance by defining an agreement interval for each individual paired observation and assessing the overall concordance. The proposed interval approach is very informative and offers many advantages over existing approaches. Data sets are used to demonstrate the advantages of the new method.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1295","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dunnett-Type Procedure for Multiple Endpoints","authors":"M. Hasler, L. Hothorn","doi":"10.2202/1557-4679.1258","DOIUrl":"https://doi.org/10.2202/1557-4679.1258","url":null,"abstract":"This paper describes a method for comparisons of several treatments with a control, simultaneously for multiple endpoints. These endpoints are assumed to be normally distributed with different scales and variances. An approximate multivariate t-distribution is used to obtain quantiles for test decisions, multiplicity-adjusted p-values, and simultaneous confidence intervals. Simulation results show that this approach controls the family-wise type I error rate over both the comparisons and the endpoints in an admissible range. The approach will be applied to a randomized clinical trial comparing two new sets of extracorporeal circulations with a standard for three primary endpoints. A related R package is available.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"7 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2011-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1258","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68717343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rejoinder to Nancy Cook's Comment on \"Measures to Summarize and Compare the Predictive Capacity of Markers\"","authors":"M. Pepe","doi":"10.2202/1557-4679.1280","DOIUrl":"https://doi.org/10.2202/1557-4679.1280","url":null,"abstract":"This is a response to Nancy Cook's Readers' Reaction to \"Measures to Summarize and Compare the Predictive Capacity of Markers.\"","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"6 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2010-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1280","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68718007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}