{"title":"Tensor factorization recommender systems with dependency","authors":"Jiuchen Zhang, Yubai Yuan, Annie Qu","doi":"10.1214/22-ejs1978","DOIUrl":"https://doi.org/10.1214/22-ejs1978","url":null,"abstract":": Dependency structure in recommender systems has been widely adopted in recent years to improve prediction accuracy. In this paper, we propose an innovative tensor-based recommender system, namely, the Ten- sor Factorization with Dependency (TFD). The proposed method utilizes shared factors to characterize the dependency between different modes, in addition to pairwise additive tensor factorization to integrate information among multiple modes. One advantage of the proposed method is that it provides flexibility for different dependency structures by incorporating shared latent factors. In addition, the proposed method unifies both binary and ordinal ratings in recommender systems. We achieve scalable computation for scarce tensors with high missing rates. In theory, we show the asymptotic consistency of estimators with various loss functions for both binary and ordinal data. Our numerical studies demonstrate that the pro- posed method outperforms the existing methods, especially on prediction accuracy.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42174710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Casting vector time series: algorithms for forecasting, imputation, and signal extraction","authors":"T. McElroy","doi":"10.1214/22-ejs2068","DOIUrl":"https://doi.org/10.1214/22-ejs2068","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47473755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized estimation of threshold auto-regressive models with many components and thresholds.","authors":"Kunhui Zhang, Abolfazl Safikhani, Alex Tank, Ali Shojaie","doi":"10.1214/22-EJS1982","DOIUrl":"10.1214/22-EJS1982","url":null,"abstract":"<p><p>Thanks to their simplicity and interpretable structure, autoregressive processes are widely used to model time series data. However, many real time series data sets exhibit non-linear patterns, requiring nonlinear modeling. The threshold Auto-Regressive (TAR) process provides a family of non-linear auto-regressive time series models in which the process dynamics are specific step functions of a thresholding variable. While estimation and inference for low-dimensional TAR models have been investigated, high-dimensional TAR models have received less attention. In this article, we develop a new framework for estimating high-dimensional TAR models, and propose two different sparsity-inducing penalties. The first penalty corresponds to a natural extension of classical TAR model to high-dimensional settings, where the same threshold is enforced for all model parameters. Our second penalty develops a more flexible TAR model, where different thresholds are allowed for different auto-regressive coefficients. We show that both penalized estimation strategies can be utilized in a three-step procedure that consistently learns both the thresholds and the corresponding auto-regressive coefficients. However, our theoretical and empirical investigations show that the direct extension of the TAR model is not appropriate for high-dimensional settings and is better suited for moderate dimensions. In contrast, the more flexible extension of the TAR model leads to consistent estimation and superior empirical performance in high dimensions.</p>","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":"16 1","pages":"1891-1951"},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088520/pdf/nihms-1885625.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9851486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On sufficient variable screening using log odds ratio filter","authors":"Baoying Yang, Wenbo Wu, Xiangrong Yin","doi":"10.1214/21-ejs1951","DOIUrl":"https://doi.org/10.1214/21-ejs1951","url":null,"abstract":": For ultrahigh-dimensional data, variable screening is an impor- tant step to reduce the scale of the problem, hence, to improve the estimation accuracy and efficiency. In this paper, we propose a new dependence measure which is called the log odds ratio statistic to be used under the sufficient variable screening framework. The sufficient variable screening approach ensures the sufficiency of the selected input features in model-ing the regression function and is an enhancement of existing marginal screening methods. In addition, we propose an ensemble variable screening approach to combine the proposed fused log odds ratio filter with the fused Kolmogorov filter to achieve supreme performance by taking advantages of both filters. We establish the sure screening properties of the fused log odds ratio filter for both marginal variable screening and sufficient variable screening. Extensive simulations and a real data analysis are provided to demonstrate the usefulness of the proposed log odds ratio filter and the sufficient variable screening procedure.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47203671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monte Carlo Markov chains constrained on graphs for a target with disconnected support","authors":"R. Cerqueti, Emilio De Santis","doi":"10.1214/22-ejs2043","DOIUrl":"https://doi.org/10.1214/22-ejs2043","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44387163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The robust nearest shrunken centroids classifier for high-dimensional heavy-tailed data","authors":"Shaokang Ren, Qing Mai","doi":"10.1214/22-ejs2022","DOIUrl":"https://doi.org/10.1214/22-ejs2022","url":null,"abstract":": The nearest shrunken centroids classifier (NSC) is a popular high-dimensional classifier. However, it is prone to inaccurate classification when the data is heavy-tailed. In this paper, we develop a robust general- ization of NSC (RNSC) which remains effective under such circumstances. By incorporating the Huber loss both in the estimation and the calcula- tion of the score function, we reduce the impacts of heavy tails. We rigorously show the variable selection, estimation, and prediction consistency in high dimensions under weak moment conditions. Empirically, our proposal greatly outperforms NSC and many other successful classifiers when data is heavy-tailed while remaining comparable to NSC in the absence of heavy tails. The favorable performance of RNSC is also demonstrated in a real data example.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45637959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-sample test for equal distributions in separate metric space: New maximum mean discrepancy based approaches","authors":"Jin-Ting Zhang, Łukasz Smaga","doi":"10.1214/22-ejs2033","DOIUrl":"https://doi.org/10.1214/22-ejs2033","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46448002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional estimation of anisotropic covariance and autocovariance operators on the sphere","authors":"Alessia Caponera, J. Fageot, Matthieu Simeoni, V. Panaretos","doi":"10.1214/22-ejs2064","DOIUrl":"https://doi.org/10.1214/22-ejs2064","url":null,"abstract":"We propose nonparametric estimators for the second-order central moments of possibly anisotropic spherical random fields, within a functional data analysis context. We consider a measurement framework where each random field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of random fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. The latter favours smooth covariance/autocovariance functions, where the smoothness is specified by means of suitable Sobolev-like pseudo-differential operators. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of random fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. We moreover demonstrate the computational feasibility and practical merits of our estimation procedure in a simulation setting, assuming a fixed number of samples per random field. Our numerical estimation procedure leverages the sparsity and second-order Kronecker structure of our setup to reduce the computational and memory requirements by approximately three orders of magnitude compared to a naive implementation would require. AMS 2000 subject classifications: Primary 62G08; secondary 62M.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2021-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49616151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}