Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America. Latest articles

MIP-BOOST: Efficient and Effective L0 Feature Selection for Linear Regression.
IF 2.4
Ana Kenney, Francesca Chiaromonte, Giovanni Felici
{"title":"MIP-BOOST: Efficient and Effective <i>L</i><sub>0</sub> Feature Selection for Linear Regression.","authors":"Ana Kenney, Francesca Chiaromonte, Giovanni Felici","doi":"10.1080/10618600.2020.1845184","DOIUrl":"https://doi.org/10.1080/10618600.2020.1845184","url":null,"abstract":"<p><p>Recent advances in mathematical programming have made Mixed Integer Optimization a competitive alternative to popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. Here we propose MIP-BOOST, a revision of standard Mixed Integer Programming feature selection that reduces the computational burden of tuning the critical sparsity bound parameter and improves performance in the presence of feature collinearity and of signals that vary in nature and strength. The final outcome is a more efficient and effective <i>L</i><sub>0</sub> Feature Selection method for applications of realistic size and complexity, grounded on rigorous cross-validation tuning and exact optimization of the associated Mixed Integer Program. Computational viability and improved performance in realistic scenarios are achieved through three independent but synergistic proposals. 
Supplementary materials including additional results, pseudocode, and computer code are available online.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"566-577"},"PeriodicalIF":2.4,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10618600.2020.1845184","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40503667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Adaptive Bayesian Spectral Analysis of High-dimensional Nonstationary Time Series.
IF 2.4
Zeda Li, Ori Rosen, Fabio Ferrarelli, Robert T Krafty
{"title":"Adaptive Bayesian Spectral Analysis of High-dimensional Nonstationary Time Series.","authors":"Zeda Li, Ori Rosen, Fabio Ferrarelli, Robert T Krafty","doi":"10.1080/10618600.2020.1868305","DOIUrl":"https://doi.org/10.1080/10618600.2020.1868305","url":null,"abstract":"<p><p>This article introduces a nonparametric approach to spectral analysis of a high-dimensional multivariate nonstationary time series. The procedure is based on a novel frequency-domain factor model that provides a flexible yet parsimonious representation of spectral matrices from a large number of simultaneously observed time series. Real and imaginary parts of the factor loading matrices are modeled independently using a prior that is formulated from the tensor product of penalized splines and multiplicative gamma process shrinkage priors, allowing for infinitely many factors with loadings increasingly shrunk towards zero as the column index increases. Formulated in a fully Bayesian framework, the time series is adaptively partitioned into approximately stationary segments, where both the number and locations of partition points are assumed unknown. Stochastic approximation Monte Carlo (SAMC) techniques are used to accommodate the unknown number of segments, and a conditional Whittle likelihood-based Gibbs sampler is developed for efficient sampling within segments. By averaging over the distribution of partitions, the proposed method can approximate both abrupt and slowly varying changes in spectral matrices. 
Performance of the proposed model is evaluated by extensive simulations and demonstrated through the analysis of high-density electroencephalography.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"794-807"},"PeriodicalIF":2.4,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10618600.2020.1868305","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40687573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Scalable Algorithms for Large Competing Risks Data.
IF 2.4
Eric S Kawaguchi, Jenny I Shen, Marc A Suchard, Gang Li
{"title":"Scalable Algorithms for Large Competing Risks Data.","authors":"Eric S Kawaguchi, Jenny I Shen, Marc A Suchard, Gang Li","doi":"10.1080/10618600.2020.1841650","DOIUrl":"https://doi.org/10.1080/10618600.2020.1841650","url":null,"abstract":"<p><p>This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate <i>ℓ</i><sub>0</sub>-based iteratively reweighted <i>ℓ</i><sub>2</sub>-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs cyclic updates of each coordinate using an explicit thresholding formula. The new cycBAR algorithm effectively avoids fitting multiple reweighted <i>ℓ</i><sub>2</sub>-penalizations and thus yields impressive speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, the computation costs of the log-pseudo likelihood and its derivatives for the PSH model grow at the rate of <i>O</i>(<i>n</i><sup>2</sup>) with the sample size <i>n</i> in current implementations. We propose a novel forward-backward scan algorithm that reduces the computation costs to <i>O</i>(<i>n</i>). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield > 1,000-fold speedups over the original BAR algorithm. Illustrations of the impressive scalability of our proposed algorithm for large competing risks data are given using both simulations and United States Renal Data System data. 
Supplementary materials for this article are available online.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"685-693"},"PeriodicalIF":2.4,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10618600.2020.1841650","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40721719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models.
IF 2.4
Chiranjit Mukherjee, Abel Rodriguez
{"title":"GPU-powered Shotgun Stochastic Search for Dirichlet process mixtures of Gaussian Graphical Models.","authors":"Chiranjit Mukherjee, Abel Rodriguez","doi":"10.1080/10618600.2015.1037883","DOIUrl":"https://doi.org/10.1080/10618600.2015.1037883","url":null,"abstract":"<p><p>Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples in which we compare our stochastic search with a Markov chain Monte Carlo algorithm in moderate dimensional data examples. These experiments show that our stochastic search largely outperforms the Markov chain Monte Carlo algorithm in terms of computing times and in terms of the quality of the posterior mode discovered. 
Finally, we analyze a gene expression dataset in which Markov chain Monte Carlo algorithms are too slow to be practically useful.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"762-788"},"PeriodicalIF":2.4,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10618600.2015.1037883","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35097915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Fast Nonparametric Density-Based Clustering of Large Data Sets Using a Stochastic Approximation Mean-Shift Algorithm.
IF 2.4
Ollivier Hyrien, Andrea Baran
{"title":"Fast Nonparametric Density-Based Clustering of Large Data Sets Using a Stochastic Approximation Mean-Shift Algorithm.","authors":"Ollivier Hyrien, Andrea Baran","doi":"10.1080/10618600.2015.1051625","DOIUrl":"https://doi.org/10.1080/10618600.2015.1051625","url":null,"abstract":"<p><p>Mean-shift is an iterative procedure often used as a nonparametric clustering algorithm that defines clusters based on the modal regions of a density function. The algorithm is conceptually appealing and makes assumptions neither about the shape of the clusters nor about their number. However, with a complexity of <i>O</i>(<i>n</i><sup>2</sup>) per iteration, it does not scale well to large data sets. We propose a novel algorithm which performs density-based clustering much more quickly than mean-shift while delivering virtually identical results. This algorithm combines subsampling and a stochastic approximation procedure to achieve a potential complexity of <i>O</i>(<i>n</i>) at each step. Its convergence is established. Its performance is evaluated using simulations and applications to image segmentation, where the algorithm was tens or hundreds of times faster than mean-shift while causing a negligible number of clustering errors. 
The algorithm can be combined with existing approaches to further accelerate clustering.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"899-916"},"PeriodicalIF":2.4,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/10618600.2015.1051625","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34975417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Adaptive Mixture Modelling Metropolis Methods for Bayesian Analysis of Non-linear State-Space Models.
Jarad Niemi, Mike West
{"title":"Adaptive Mixture Modelling Metropolis Methods for Bayesian Analysis of Non-linear State-Space Models.","authors":"Jarad Niemi, Mike West","doi":"10.1198/jcgs.2010.08117","DOIUrl":"10.1198/jcgs.2010.08117","url":null,"abstract":"<p><p>We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. 
Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.</p>","PeriodicalId":520666,"journal":{"name":"Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America","volume":" ","pages":"260-280"},"PeriodicalIF":0.0,"publicationDate":"2010-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887612/pdf/nihms190399.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"29069758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0