{"title":"Testing subspace restrictions in the presence of high dimensional nuisance parameters","authors":"Alessio Sancetta","doi":"10.1214/22-ejs2058","DOIUrl":"https://doi.org/10.1214/22-ejs2058","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42335792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concentration inequalities for non-causal random fields","authors":"Rémy Garnier, Raphael Langhendries","doi":"10.1214/22-ejs1992","DOIUrl":"https://doi.org/10.1214/22-ejs1992","url":null,"abstract":"Concentration inequalities are widely used for analyzing machine learning algorithms. However, current concentration inequalities cannot be applied to some of the most popular deep neural networks, notably in natural language processing. This is mostly due to the non-causal nature of such involved data, in the sense that each data point depends on other neighbor data points. In this paper, a framework for modeling non-causal random fields is provided and a Hoeffding-type concentration inequality is obtained for this framework. The proof of this result relies on a local approximation of the non-causal random field by a function of a finite number of i.i.d. random variables.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49352696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets.","authors":"Hai Shu, Zhe Qu","doi":"10.1214/22-EJS2008","DOIUrl":"https://doi.org/10.1214/22-EJS2008","url":null,"abstract":"A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding to each dataset, and an additive noise matrix. Existing decomposition methods claim that their common matrices capture the common pattern of the two datasets. However, their so-called common pattern only denotes the common latent factors but ignores the common pattern between the two coefficient matrices of these common latent factors. We propose a new unsupervised learning method, called the common and distinctive pattern analysis (CDPA), which appropriately defines the two types of data patterns by further incorporating the common and distinctive patterns of the coefficient matrices. A consistent estimation approach is developed for high-dimensional settings, and shows reasonably good finite-sample performance in simulations. Our simulation studies and real data analysis corroborate that the proposed CDPA can provide better characterization of common and distinctive patterns and thereby benefit data mining.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":"2475-2517"},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9410619/pdf/nihms-1830529.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33443550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of the variance matrix in bivariate classical measurement error models","authors":"Elif Kekeç, I. Van Keilegom","doi":"10.1214/22-ejs1996","DOIUrl":"https://doi.org/10.1214/22-ejs1996","url":null,"abstract":"The presence of measurement errors is a ubiquitously faced problem and plenty of work has been done to overcome this when a single covariate is mismeasured under a variety of conditions. However, in practice, it is possible that more than one covariate is measured with error. When measurements are taken by the same device, the errors of these measurements are likely correlated. In this paper, we present a novel approach to estimate the covariance matrix of classical additive errors in the absence of validation data or auxiliary variables when two covariates are subject to measurement error. Our method assumes these errors to be following a bivariate normal distribution. We show that the variance matrix is identifiable under certain conditions on the support of the error-free variables and propose an estimation method based on an expansion of Bernstein polynomials. To investigate the performance of the proposed estimation method, the asymptotic properties of the estimator are examined and a diverse set of simulation studies is conducted. The estimated matrix is then used by the simulation-extrapolation (SIMEX) algorithm to reduce the bias caused by measurement error in logistic regression models. Finally, the method is demonstrated using data from the Framingham Heart Study.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47033673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Poisson mean vector estimation with nonparametric maximum likelihood estimation and application to protein domain data","authors":"Hoyoung Park, Junyong Park","doi":"10.1214/22-ejs2029","DOIUrl":"https://doi.org/10.1214/22-ejs2029","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46150414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the number of communities by spectral methods","authors":"Can M. Le, E. Levina","doi":"10.1214/21-ejs1971","DOIUrl":"https://doi.org/10.1214/21-ejs1971","url":null,"abstract":"Community detection is a fundamental problem in network analysis with many methods available to estimate communities. Most of these methods assume that the number of communities is known, which is often not the case in practice. We study a simple and very fast method for estimating the number of communities based on the spectral properties of certain graph operators, such as the non-backtracking matrix and the Bethe Hessian matrix. We show that the method performs well under several models and a wide range of parameters, and is guaranteed to be consistent under several asymptotic regimes. We compare this method to several existing methods for estimating the number of communities and show that it is both more accurate and more computationally efficient. MSC2020 subject classifications: Primary 62H12; secondary 62H30.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46011126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nested covariance functions on graphs with Euclidean edges cross time","authors":"E. Porcu, X. Emery, A. Peron","doi":"10.1214/22-ejs2039","DOIUrl":"https://doi.org/10.1214/22-ejs2039","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43993257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized nonparametric likelihood-based inference for current status data model","authors":"Meiling Hao, Yuanyuan Lin, Kin-Yat Liu, Xingqiu Zhao","doi":"10.1214/21-ejs1970","DOIUrl":"https://doi.org/10.1214/21-ejs1970","url":null,"abstract":"Deriving the limiting distribution of a nonparametric estimate is rather challenging but of fundamental importance to statistical inference. For the current status data, we study a penalized nonparametric likelihood-based estimator for an unknown cumulative hazard function, and establish the pointwise asymptotic normality of the resulting nonparametric estimate. We also propose the penalized likelihood ratio tests for local and global hypotheses, derive their limiting distributions, and study the optimality of the global test. Simulation studies show that the proposed method works well compared to the classical likelihood ratio test.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46566031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust consistent estimators for ROC curves with covariates","authors":"Ana M. Bianco, G. Boente, W. González-Manteiga","doi":"10.1214/22-ejs2042","DOIUrl":"https://doi.org/10.1214/22-ejs2042","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42756461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor factorization recommender systems with dependency","authors":"Jiuchen Zhang, Yubai Yuan, Annie Qu","doi":"10.1214/22-ejs1978","DOIUrl":"https://doi.org/10.1214/22-ejs1978","url":null,"abstract":"Dependency structure in recommender systems has been widely adopted in recent years to improve prediction accuracy. In this paper, we propose an innovative tensor-based recommender system, namely, the Tensor Factorization with Dependency (TFD). The proposed method utilizes shared factors to characterize the dependency between different modes, in addition to pairwise additive tensor factorization to integrate information among multiple modes. One advantage of the proposed method is that it provides flexibility for different dependency structures by incorporating shared latent factors. In addition, the proposed method unifies both binary and ordinal ratings in recommender systems. We achieve scalable computation for scarce tensors with high missing rates. In theory, we show the asymptotic consistency of estimators with various loss functions for both binary and ordinal data. Our numerical studies demonstrate that the proposed method outperforms the existing methods, especially on prediction accuracy.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42174710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}