Kunhui Zhang, Abolfazl Safikhani, Alex Tank, Ali Shojaie
{"title":"Penalized estimation of threshold auto-regressive models with many components and thresholds.","authors":"Kunhui Zhang, Abolfazl Safikhani, Alex Tank, Ali Shojaie","doi":"10.1214/22-EJS1982","DOIUrl":"10.1214/22-EJS1982","url":null,"abstract":"<p><p>Thanks to their simplicity and interpretable structure, autoregressive processes are widely used to model time series data. However, many real time series data sets exhibit non-linear patterns, requiring nonlinear modeling. The threshold Auto-Regressive (TAR) process provides a family of non-linear auto-regressive time series models in which the process dynamics are specific step functions of a thresholding variable. While estimation and inference for low-dimensional TAR models have been investigated, high-dimensional TAR models have received less attention. In this article, we develop a new framework for estimating high-dimensional TAR models, and propose two different sparsity-inducing penalties. The first penalty corresponds to a natural extension of classical TAR model to high-dimensional settings, where the same threshold is enforced for all model parameters. Our second penalty develops a more flexible TAR model, where different thresholds are allowed for different auto-regressive coefficients. We show that both penalized estimation strategies can be utilized in a three-step procedure that consistently learns both the thresholds and the corresponding auto-regressive coefficients. However, our theoretical and empirical investigations show that the direct extension of the TAR model is not appropriate for high-dimensional settings and is better suited for moderate dimensions. In contrast, the more flexible extension of the TAR model leads to consistent estimation and superior empirical performance in high dimensions.</p>","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":"16 1","pages":"1891-1951"},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088520/pdf/nihms-1885625.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9851486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On sufficient variable screening using log odds ratio filter","authors":"Baoying Yang, Wenbo Wu, Xiangrong Yin","doi":"10.1214/21-ejs1951","DOIUrl":"https://doi.org/10.1214/21-ejs1951","url":null,"abstract":": For ultrahigh-dimensional data, variable screening is an impor- tant step to reduce the scale of the problem, hence, to improve the estimation accuracy and efficiency. In this paper, we propose a new dependence measure which is called the log odds ratio statistic to be used under the sufficient variable screening framework. The sufficient variable screening approach ensures the sufficiency of the selected input features in model-ing the regression function and is an enhancement of existing marginal screening methods. In addition, we propose an ensemble variable screening approach to combine the proposed fused log odds ratio filter with the fused Kolmogorov filter to achieve supreme performance by taking advantages of both filters. We establish the sure screening properties of the fused log odds ratio filter for both marginal variable screening and sufficient variable screening. Extensive simulations and a real data analysis are provided to demonstrate the usefulness of the proposed log odds ratio filter and the sufficient variable screening procedure.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47203671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monte Carlo Markov chains constrained on graphs for a target with disconnected support","authors":"R. Cerqueti, Emilio De Santis","doi":"10.1214/22-ejs2043","DOIUrl":"https://doi.org/10.1214/22-ejs2043","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44387163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data","authors":"Shaokang Ren, Qing Mai","doi":"10.1214/22-ejs2022","DOIUrl":"https://doi.org/10.1214/22-ejs2022","url":null,"abstract":": The nearest shrunken centroids classifier (NSC) is a popular high-dimensional classifier. However, it is prone to inaccurate classification when the data is heavy-tailed. In this paper, we develop a robust general- ization of NSC (RNSC) which remains effective under such circumstances. By incorporating the Huber loss both in the estimation and the calcula- tion of the score function, we reduce the impacts of heavy tails. We rigorously show the variable selection, estimation, and prediction consistency in high dimensions under weak moment conditions. Empirically, our proposal greatly outperforms NSC and many other successful classifiers when data is heavy-tailed while remaining comparable to NSC in the absence of heavy tails. The favorable performance of RNSC is also demonstrated in a real data example.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45637959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-sample test for equal distributions in separate metric space: New maximum mean discrepancy based approaches","authors":"Jin-Ting Zhang, Łukasz Smaga","doi":"10.1214/22-ejs2033","DOIUrl":"https://doi.org/10.1214/22-ejs2033","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46448002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alessia Caponera, J. Fageot, Matthieu Simeoni, V. Panaretos
{"title":"Functional estimation of anisotropic covariance and autocovariance operators on the sphere","authors":"Alessia Caponera, J. Fageot, Matthieu Simeoni, V. Panaretos","doi":"10.1214/22-ejs2064","DOIUrl":"https://doi.org/10.1214/22-ejs2064","url":null,"abstract":"We propose nonparametric estimators for the second-order central moments of possibly anisotropic spherical random fields, within a functional data analysis context. We consider a measurement framework where each random field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of random fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. The latter favours smooth covariance/autocovariance functions, where the smoothness is specified by means of suitable Sobolev-like pseudo-differential operators. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of random fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. We moreover demonstrate the computational feasibility and practical merits of our estimation procedure in a simulation setting, assuming a fixed number of samples per random field. Our numerical estimation procedure leverages the sparsity and second-order Kronecker structure of our setup to reduce the computational and memory requirements by approximately three orders of magnitude compared to a naive implementation would require. AMS 2000 subject classifications: Primary 62G08; secondary 62M.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2021-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49616151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sourav Mukherjee, K. Khare, Saptarshi Chakraborty Department of Statistics, U. Florida, D. Biostatistics, State University of New York at Buffalo
{"title":"Convergence properties of data augmentation algorithms for high-dimensional robit regression","authors":"Sourav Mukherjee, K. Khare, Saptarshi Chakraborty Department of Statistics, U. Florida, D. Biostatistics, State University of New York at Buffalo","doi":"10.1214/22-ejs2098","DOIUrl":"https://doi.org/10.1214/22-ejs2098","url":null,"abstract":"Abstract: The logistic and probit link functions are the most common choices for regression models with a binary response. However, these choices are not robust to the presence of outliers/unexpected observations. The robit link function, which is equal to the inverse CDF of the Student’s t-distribution, provides a robust alternative to the probit and logistic link functions. A multivariate normal prior for the regression coefficients is the standard choice for Bayesian inference in robit regression models. The resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the desired posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important as it provides theoretical guarantees for asymptotic validity of MCMC standard errors for desired posterior expectations/quantiles. Previous work [1] established geometric ergodicity of this robit DA Markov chain assuming (i) the sample size n dominates the number of predictors p, and (ii) an additional constraint which requires the sample size to be bounded above by a fixed constant which depends on the design matrix X. In particular, modern highdimensional settings where n < p are not considered. In this work, we show that the robit DA Markov chain is trace-class (i.e., the eigenvalues of the corresponding Markov operator are summable) for arbitrary choices of the sample size n, the number of predictors p, the design matrix X, and the prior mean and variance parameters. The trace-class property implies geometric ergodicity. Moreover, this property allows us to conclude that the sandwich robit chain (obtained by inserting an inexpensive extra step in between the two steps of the DA chain) is strictly better than the robit DA chain in an appropriate sense, and enables the use of recent methods to estimate the spectral gap of trace class DA Markov chains.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48632525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible inference of optimal individualized treatment strategy in covariate adjusted randomization with multiple covariates","authors":"Trinetri Ghosh, Yanyuan Ma, Rui Song, Pingshou Zhong","doi":"10.1214/23-ejs2127","DOIUrl":"https://doi.org/10.1214/23-ejs2127","url":null,"abstract":"To maximize clinical benefit, clinicians routinely tailor treatment to the individual characteristics of each patient, where individualized treatment rules are needed and are of significant research interest to statisticians. In the covariate-adjusted randomization clinical trial with many covariates, we model the treatment effect with an unspecified function of a single index of the covariates and leave the baseline response completely arbitrary. We devise a class of estimators to consistently estimate the treatment effect function and its associated index while bypassing the estimation of the baseline response, which is subject to the curse of dimensionality. We further develop inference tools to identify predictive covariates and isolate effective treatment region. The usefulness of the methods is demonstrated in both simulations and a clinical data example.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48902829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bounds in L1 Wasserstein distance on the normal approximation of general M-estimators","authors":"F. Bachoc, M. Fathi","doi":"10.1214/23-ejs2132","DOIUrl":"https://doi.org/10.1214/23-ejs2132","url":null,"abstract":"We derive quantitative bounds on the rate of convergence in $L^1$ Wasserstein distance of general M-estimators, with an almost sharp (up to a logarithmic term) behavior in the number of observations. We focus on situations where the estimator does not have an explicit expression as a function of the data. The general method may be applied even in situations where the observations are not independent. Our main application is a rate of convergence for cross validation estimation of covariance parameters of Gaussian processes.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2021-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42149805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}