{"title":"Bayesian variable selection for matrix autoregressive models","authors":"Alessandro Celani, Paolo Pagnottoni, Galin Jones","doi":"10.1007/s11222-024-10402-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10402-y","url":null,"abstract":"<p>A Bayesian method is proposed for variable selection in high-dimensional matrix autoregressive models which reflects and exploits the original matrix structure of the data to (a) reduce dimensionality and (b) foster interpretability of multidimensional relationship structures. A compact form of the model is derived which facilitates the estimation procedure, and two computational methods for estimation are proposed: a Markov chain Monte Carlo algorithm and a scalable Bayesian EM algorithm. Being based on the spike-and-slab framework for fast posterior mode identification, the latter enables Bayesian data analysis of matrix-valued time series at large scales. The theoretical properties, comparative performance, and computational efficiency of the proposed model are investigated through simulated examples and an application to a panel of country economic indicators.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140098943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale correlation screening under dependence for brain functional connectivity network inference","authors":"Hanâ Lbath, Alexander Petersen, Sophie Achard","doi":"10.1007/s11222-024-10411-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10411-x","url":null,"abstract":"<p>Data produced by resting-state functional Magnetic Resonance Imaging are widely used to infer brain functional connectivity networks. Such networks correlate neural signals to connect brain regions, which consist of groups of dependent voxels. Previous work has focused on aggregating data across voxels within predefined regions. However, the presence of within-region correlations has noticeable impacts on inter-regional correlation detection, and thus on edge identification. To alleviate these impacts, we propose to leverage techniques from the large-scale correlation screening literature, and derive simple and practical characterizations of the mean number of correlation discoveries that flexibly incorporate intra-regional dependence structures. A connectivity network inference framework is then presented. First, inter-regional correlation distributions are estimated. Then, correlation thresholds that can be tailored to one’s application are constructed for each edge. Finally, the proposed framework is implemented on synthetic and real-world datasets. This novel approach for handling arbitrary intra-regional correlation is shown to limit false positives while improving true positive rates.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple-output quantile regression neural network","authors":"Ruiting Hao, Xiaorong Yang","doi":"10.1007/s11222-024-10408-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10408-6","url":null,"abstract":"<p>The quantile regression neural network (QRNN) model has received increasing attention in various fields as a way to provide conditional quantiles of responses. However, almost all the available literature on QRNN is devoted to handling the case of one-dimensional responses, which is a serious limitation when the quantiles of multivariate responses are of interest. To address this issue, we propose a novel multiple-output quantile regression neural network (MOQRNN) model in this paper to estimate the conditional quantiles of multivariate data. The MOQRNN model is constructed in the following steps. Step 1 acquires the conditional distribution of the multivariate responses by a nonparametric method. Step 2 obtains the optimal transport map that pushes the spherical uniform distribution forward to the conditional distribution through an input convex neural network (ICNN). Step 3 provides the conditional quantile contours and regions via the ICNN-based optimal transport map. In both simulation studies and a real data application, comparative analyses with an existing method demonstrate that the proposed MOQRNN model yields quantile contours that are not only smoother but also closer to their theoretical counterparts.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total effects with constrained features","authors":"","doi":"10.1007/s11222-024-10398-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10398-5","url":null,"abstract":"<h3>Abstract</h3> <p>Recent studies have emphasized the connection between machine learning feature importance measures and total order sensitivity indices (total effects, henceforth). Feature correlations and the need to avoid unrestricted permutations make the estimation of these indices challenging. Additionally, there is no established theory or approach for non-Cartesian domains. We propose four alternative strategies for computing total effects that account for both dependent and constrained features. Our first approach involves a generalized winding stairs design combined with the Knothe-Rosenblatt transformation. This approach, while applicable to a wide family of input dependencies, becomes impractical when inputs are physically constrained. Our second approach is a U-statistic that combines the Jansen estimator with a weighting factor. The U-statistic framework allows the derivation of a central limit theorem for this estimator. However, this design is computationally intensive. Our third approach uses derangements to significantly reduce the computational burden. We prove consistency and central limit theorems for these estimators as well. Our fourth approach is based on a nearest-neighbour intuition and further reduces the computational burden. We test these estimators through a series of increasingly complex computational experiments with features constrained on compact and connected domains (circle, simplex) and on non-compact, non-connected domains (Sierpiński gaskets); we provide comparisons with machine learning approaches and conclude with an application to a realistic simulator.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140035815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of regime-switching diffusions via Fourier transforms","authors":"Thomas Lux","doi":"10.1007/s11222-024-10397-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10397-6","url":null,"abstract":"<p>In this article, an algorithm for maximum-likelihood estimation of regime-switching diffusions is proposed. The proposed approach uses a Fourier transform to numerically solve the system of Fokker–Planck or forward Kolmogorov equations for the temporal evolution of the state densities. Monte Carlo simulations confirm the theoretically expected consistency of this approach for moderate sample sizes and its practical feasibility for certain regime-switching diffusions used in economics and biology with moderate numbers of states and parameters. An application to animal movement data serves as an illustration of the proposed algorithm.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140035718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional sparse single-index regression via Hilbert–Schmidt independence criterion","authors":"Xin Chen, Chang Deng, Shuaida He, Runxiong Wu, Jia Zhang","doi":"10.1007/s11222-024-10399-4","DOIUrl":"https://doi.org/10.1007/s11222-024-10399-4","url":null,"abstract":"<p>The Hilbert–Schmidt Independence Criterion (HSIC) has recently been introduced to the field of single-index models to estimate the directions. Compared with other well-established methods, the HSIC-based method requires relatively weak conditions. However, its performance has not yet been studied in the prevalent high-dimensional scenarios, where the number of covariates can be much larger than the sample size. In this article, based on HSIC, we propose to estimate the possibly sparse directions in high-dimensional single-index models through a parameter reformulation. Our approach estimates the subspace of the direction directly and performs variable selection simultaneously. Due to the non-convexity of the objective function and the complexity of the constraints, a majorize-minimize algorithm together with the linearized alternating direction method of multipliers is developed to solve the optimization problem. Since it does not involve the inverse of the covariance matrix, the algorithm can naturally handle large <i>p</i> small <i>n</i> scenarios. Through extensive simulation studies and a real data analysis, we show that our proposal is efficient and effective in high-dimensional settings. The <span>\\(\\texttt{Matlab}\\)</span> codes for this method are available online.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140005016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements on scalable stochastic Bayesian inference methods for multivariate Hawkes process","authors":"Alex Ziyu Jiang, Abel Rodriguez","doi":"10.1007/s11222-024-10392-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10392-x","url":null,"abstract":"<p>Multivariate Hawkes Processes (MHPs) are a class of point processes that can account for complex temporal dynamics among event sequences. In this work, we study the accuracy and computational efficiency of three classes of algorithms which, while widely used in the context of Bayesian inference, have rarely been applied in the context of MHPs: stochastic gradient expectation-maximization, stochastic gradient variational inference and stochastic gradient Langevin Monte Carlo. An important contribution of this paper is a novel approximation to the likelihood function that allows us to retain the computational advantages associated with conjugate settings while reducing approximation errors associated with the boundary effects. The comparisons are based on various simulated scenarios as well as an application to the study of risk dynamics in the Standard & Poor’s 500 intraday index prices among its 11 sectors.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140005135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum likelihood estimation of log-concave densities on tree space","authors":"Yuki Takazawa, Tomonari Sei","doi":"10.1007/s11222-024-10400-0","DOIUrl":"https://doi.org/10.1007/s11222-024-10400-0","url":null,"abstract":"<p>Phylogenetic trees are key data objects in biology, and methods of phylogenetic reconstruction are highly developed. The space of phylogenetic trees is a nonpositively curved metric space, and statistical methods that exploit this property to analyze samples of trees on this space have recently been developed. Meanwhile, in Euclidean space, the log-concave maximum likelihood method has emerged as a new nonparametric method for probability density estimation. In this paper, we derive a sufficient condition for the existence and uniqueness of the log-concave maximum likelihood estimator on tree space. We also propose an estimation algorithm for one and two dimensions. Since various factors affect the inferred trees, it is difficult to specify the distribution of a sample of trees. The class of log-concave densities is nonparametric, and yet the estimation can be conducted by the maximum likelihood method without selecting hyperparameters. We compare the estimation performance with a previously developed kernel density estimator numerically. In our examples where the true density is log-concave, we demonstrate that our estimator has a smaller integrated squared error when the sample size is large. We also conduct numerical experiments on clustering using the Expectation-Maximization algorithm and compare the results with k-means++ clustering using the Fréchet mean.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139947601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Do applied statisticians prefer more randomness or less? Bootstrap or Jackknife?","authors":"Yannis G. Yatracos","doi":"10.1007/s11222-024-10388-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10388-7","url":null,"abstract":"<p>Bootstrap and Jackknife estimates, <span>\\(T_{n,B}^*\\)</span> and <span>\\(T_{n,J}\\)</span> respectively, of a population parameter <span>\\(\\theta \\)</span> are both used in statistical computations; <i>n</i> is the sample size, <i>B</i> is the number of Bootstrap samples. For any <span>\\(n_0\\)</span> and <span>\\(B_0\\)</span>, Bootstrap samples do not add new information about <span>\\(\\theta \\)</span>, being observations from the original sample, and when <span>\\(B_0<\\infty \\)</span>, <span>\\(T_{n_0,B_0}^*\\)</span> also includes resampling variability, an additional source of uncertainty not affecting <span>\\(T_{n_0,J}\\)</span>. These facts are neglected in theoretical papers with results for the utopian <span>\\(T_{n,\\infty }^*\\)</span> that do not hold for <span>\\(B<\\infty \\)</span>. The consequence is that <span>\\(T_{n_0,B_0}^*\\)</span> is expected to have a larger mean squared error (MSE) than <span>\\(T_{n_0,J}\\)</span>, namely <span>\\(T_{n_0,B_0}^*\\)</span> is inadmissible. The amount of inadmissibility may be very large when population parameters, e.g. the variance, are unbounded and/or with big data. A palliating remedy is to increase <i>B</i>, the larger the better, but the MSE ordering remains unchanged for <span>\\(B<\\infty \\)</span>. This is confirmed theoretically when <span>\\(\\theta \\)</span> is the mean of a population, and is observed in the estimated total MSE for linear regression coefficients. In the latter, the chance that the estimated total MSE with <span>\\(T_{n,B}^*\\)</span> improves on that with <span>\\(T_{n,J}\\)</span> decreases to 0 as <i>B</i> increases.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139947598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forward stability and model path selection","authors":"Nicholas Kissel, Lucas Mentch","doi":"10.1007/s11222-024-10395-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10395-8","url":null,"abstract":"<p>Most scientific publications follow the familiar recipe of (i) obtain data, (ii) fit a model, and (iii) comment on the scientific relevance of the effects of particular covariates in that model. This approach, however, ignores the fact that there may exist a multitude of similarly accurate models in which the implied effects of individual covariates may be vastly different. This problem of finding an entire collection of plausible models has received relatively little attention in the statistics community, with nearly all of the proposed methodologies being narrowly tailored to a particular model class and/or requiring an exhaustive search over all possible models, making them largely infeasible in the current big data era. This work develops the idea of forward stability and proposes a novel, computationally efficient approach to finding collections of accurate models that we refer to as model path selection (MPS). MPS builds up a plausible model collection via a forward selection approach and is entirely agnostic to the model class and loss function employed. The resulting model collection can be displayed in a simple and intuitive graphical fashion, easily allowing practitioners to visualize whether some covariates can be swapped for others with minimal loss.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139927157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}