{"title":"Generalized spherical principal component analysis","authors":"Sarah Leyder, Jakob Raymaekers, Tim Verdonck","doi":"10.1007/s11222-024-10413-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10413-9","url":null,"abstract":"<p>Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"20 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An expectile computation cookbook","authors":"","doi":"10.1007/s11222-024-10403-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10403-x","url":null,"abstract":"<h3>Abstract</h3> <p>A substantial body of work in the last 15 years has shown that expectiles constitute an excellent candidate for becoming a standard tool in probabilistic and statistical modeling. Surprisingly, the question of how expectiles may be efficiently calculated has been left largely untouched. We fill this gap by, first, providing a general outlook on the computation of expectiles that relies on the knowledge of analytic expressions of the underlying distribution function and mean residual life function. We distinguish between discrete distributions, for which an exact calculation is always feasible, and continuous distributions, where a Newton–Raphson approximation algorithm can be implemented and a list of exceptional distributions whose expectiles are in analytic form can be given. When the distribution function and/or the mean residual life is difficult to compute, Monte-Carlo algorithms are introduced, based on an exact calculation of sample expectiles and on the use of control variates to improve computational efficiency. We discuss the relevance of our findings to statistical practice and provide numerical evidence of the performance of the considered methods.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"18 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140205561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable selection using axis-aligned random projections for partial least-squares regression","authors":"","doi":"10.1007/s11222-024-10417-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10417-5","url":null,"abstract":"<h3>Abstract</h3> <p>In high-dimensional data modeling, variable selection plays a crucial role in improving predictive accuracy and enhancing model interpretability through sparse representation. Unfortunately, certain variable selection methods encounter challenges such as insufficient model sparsity, high computational overhead, and difficulties in handling large-scale data. Recently, axis-aligned random projection techniques have been applied to address these issues by selecting variables. However, these techniques have seen limited application in handling complex data within the regression framework. In this study, we propose a novel method, sparse partial least squares via axis-aligned random projection, designed for the analysis of high-dimensional data. Initially, axis-aligned random projection is utilized to obtain a sparse loading vector, significantly reducing computational complexity. Subsequently, partial least squares regression is conducted within the subspace of the top-ranked significant variables. The submatrices are iteratively updated until an optimal sparse partial least squares model is achieved. Comparative analysis with some state-of-the-art high-dimensional regression methods demonstrates that the proposed method exhibits superior predictive performance. To illustrate its effectiveness, we apply the method to four cases, including one simulated dataset and three real-world datasets. The results show the proposed method’s ability to identify important variables in all four cases.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"39 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous estimation and variable selection for a non-crossing multiple quantile regression using deep neural networks","authors":"Jungmin Shin, Seunghyun Gwak, Seung Jun Shin, Sungwan Bang","doi":"10.1007/s11222-024-10418-4","DOIUrl":"https://doi.org/10.1007/s11222-024-10418-4","url":null,"abstract":"<p>In this paper, we present the DNN-NMQR estimator, an approach that utilizes a deep neural network structure to solve multiple quantile regression problems. When estimating multiple quantiles, our approach leverages the structural characteristics of DNN to enhance estimation results by encouraging shared learning across different quantiles through DNN-NMQR. Also, this method effectively addresses quantile crossing issues through the penalization method. To refine our methodology, we introduce a convolution-type quadratic smoothing function, ensuring that the objective function remains differentiable throughout. Furthermore, we provide a brief discussion on the convergence analysis of DNN-NMQR, drawing on the concept of the neural tangent kernel. For a high-dimensional case, we propose the (A)GDNN-NMQR estimator, which applies group-wise <span>(L_1)</span>-type regularization methods and enjoys the advantages of quantile estimation and variable selection simultaneously. We extensively validate all of our proposed methods through numerical experiments and real data analysis.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"22 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resampling-based confidence intervals and bands for the average treatment effect in observational studies with competing risks","authors":"Jasmin Rühl, Sarah Friedrich","doi":"10.1007/s11222-024-10420-w","DOIUrl":"https://doi.org/10.1007/s11222-024-10420-w","url":null,"abstract":"<p>The g-formula can be used to estimate the treatment effect while accounting for confounding bias in observational studies. With regard to time-to-event endpoints, possibly subject to competing risks, the construction of valid pointwise confidence intervals and time-simultaneous confidence bands for the causal risk difference is complicated, however. A convenient solution is to approximate the asymptotic distribution of the corresponding stochastic process by means of resampling approaches. In this paper, we consider three different resampling methods, namely the classical nonparametric bootstrap, the influence function equipped with a resampling approach as well as a martingale-based bootstrap version, the so-called wild bootstrap. For the latter, three sub-versions based on differing distributions of the underlying random multipliers are examined. We set up a simulation study to compare the accuracy of the different techniques, which reveals that the wild bootstrap should in general be preferred if the sample size is moderate and sufficient data on the event of interest have been accrued. For illustration, the resampling methods are further applied to data on the long-term survival in patients with early-stage Hodgkin’s disease.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"4 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A constant-per-iteration likelihood ratio test for online changepoint detection for exponential family models","authors":"Kes Ward, Gaetano Romano, Idris Eckley, Paul Fearnhead","doi":"10.1007/s11222-024-10416-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10416-6","url":null,"abstract":"<p>Online changepoint detection algorithms that are based on (generalised) likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time <i>T</i>, it involves considering <i>O</i>(<i>T</i>) possible locations for the change. Recently, the FOCuS algorithm has been introduced for detecting changes in mean in Gaussian data that decreases the per-iteration cost to <span>(O(log T))</span>. This is possible by using pruning ideas, which reduce the set of changepoint locations that need to be considered at time <i>T</i> to approximately <span>(log T)</span>. We show that if one wishes to perform the likelihood ratio test for a different one-parameter exponential family model, then exactly the same pruning rule can be used, and again one need only consider approximately <span>(log T)</span> locations at iteration <i>T</i>. Furthermore, we show how we can adaptively perform the maximisation step of the algorithm so that we need only maximise the test statistic over a small subset of these possible locations. Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant-per-iteration cost on average.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"41 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140168815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving model choice in classification: an approach based on clustering of covariance matrices","authors":"David Rodríguez-Vítores, Carlos Matrán","doi":"10.1007/s11222-024-10410-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10410-y","url":null,"abstract":"<p>This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on the consideration of clusters of the involved covariance matrices according to a criterion, such as sharing Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the bases of the Parsimonious Model. We show that such groupings of covariance matrices can be achieved through simple modifications of the CEM (Classification Expectation Maximization) algorithm. Our approach leads to propose Gaussian Mixture Models for model-based clustering and discriminant analysis, in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions verifying suitable size, shape and orientation constraints, and applying them to both simulation and real data examples.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"99 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional mixtures-of-experts","authors":"Faïcel Chamroukhi, Nhat Thien Pham, Van Hà Hoang, Geoffrey J. McLachlan","doi":"10.1007/s11222-023-10379-0","DOIUrl":"https://doi.org/10.1007/s11222-023-10379-0","url":null,"abstract":"<p>We consider the statistical analysis of heterogeneous data for prediction, in situations where the observations include functions, typically time series. We extend the modeling with mixtures-of-experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME), in which the predictors are potentially noisy observations, from entire functions. Furthermore, the data generating process of the predictor and the real response, is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models called iFME. We develop dedicated expectation–maximization algorithms for Lasso-like regularized maximum-likelihood parameter estimation strategies to fit the models. The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering the heterogeneous regression data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"114 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expectile and M-quantile regression for panel data","authors":"Ian Meneghel Danilevicz, Valdério Anselmo Reisen, Pascal Bondon","doi":"10.1007/s11222-024-10396-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10396-7","url":null,"abstract":"<p>Linear fixed effect models are a general way to fit panel or longitudinal data with a distinct intercept for each unit. Based on expectile and M-quantile approaches, we propose alternative regression estimation methods to estimate the parameters of linear fixed effect models. The estimation functions are penalized by the least absolute shrinkage and selection operator to reduce the dimensionality of the data. Some asymptotic properties of the estimators are established, and finite sample size investigations are conducted to verify the empirical performances of the estimation methods. The computational implementations of the procedures are discussed, and real economic panel data from the Organisation for Economic Cooperation and Development are analyzed to show the usefulness of the methods in a practical problem.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"17 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matrix regression heterogeneity analysis","authors":"Fengchuan Zhang, Sanguo Zhang, Shi-Ming Li, Mingyang Ren","doi":"10.1007/s11222-024-10401-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10401-z","url":null,"abstract":"<p>The development of modern science and technology has facilitated the collection of a large amount of matrix data in fields such as biomedicine. Matrix data modeling has been extensively studied, which advances from the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods mainly focus on homogeneous data, failing to handle the data heterogeneity frequently encountered in the biomedical field, where samples from the same study belong to several underlying subgroups, and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a matrix data heterogeneity analysis framework, by combining matrix bilinear sparse decomposition and penalized fusion techniques, which enables data-driven subgroup detection, including determining the number of subgroups and subgrouping membership. A rigorous theoretical analysis is conducted, including asymptotic consistency in terms of subgroup detection, the number of subgroups, and regression coefficients. Numerous numerical studies based on simulated and real data have been constructed, showcasing the superior performance of the proposed method in analyzing matrix heterogeneous data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":"57 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}