{"title":"A random projection method for large-scale community detection","authors":"Haobo Qi, Hansheng Wang, Xuening Zhu","doi":"10.4310/22-sii752","DOIUrl":"https://doi.org/10.4310/22-sii752","url":null,"abstract":"In this work, we consider a random projection method for a large-scale community detection task. We introduce a random Gaussian matrix that generates several projections on the column space of the network adjacency matrix. The $k$-means algorithm is then applied to the low-dimensional projected matrix. The computational complexity is much lower than that of the classic spectral clustering methods. Furthermore, the algorithm is easy to implement and accessible for privacy preservation. We theoretically establish a strong consistency result for the algorithm under the stochastic block model. Extensive numerical studies are conducted to verify the theoretical findings and illustrate the usefulness of the proposed method.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"1 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlated Wishart matrices classification via an expectation-maximization composite likelihood-based algorithm","authors":"Zhou Lan","doi":"10.4310/22-sii770","DOIUrl":"https://doi.org/10.4310/22-sii770","url":null,"abstract":"Positive-definite matrix-variate data is becoming popular in computer vision. The computer vision data descriptors in the form of Region Covariance Descriptors (RCD) are positive definite matrices, which extract the key features of the images. The RCDs are extensively used in image set classification. Some classification methods treating RCDs as Wishart distributed random matrices have been proposed. However, the majority of the current methods neglect the potential correlation among the RCDs caused by the so-called auxiliary information (e.g., subjects’ ages and nose widths). Modeling correlated Wishart matrices is difficult since the joint density function of correlated Wishart matrices is difficult to obtain. In this paper, we propose an Expectation-Maximization composite likelihood-based algorithm of Wishart matrices to tackle this issue. In numerical studies based on synthetic data and real data (Chicago face data-set), our proposed algorithm performs better than the alternative methods which do not consider the correlation caused by the so-called auxiliary information.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"27 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and covariance-assisted tensor response regression","authors":"Ning Wang, Xin Zhang","doi":"10.4310/sii.2024.v17.n2.a10","DOIUrl":"https://doi.org/10.4310/sii.2024.v17.n2.a10","url":null,"abstract":"Tensor data analysis is gaining increasing popularity in modern multivariate statistics. When analyzing real-world tensor data, many existing tensor estimation approaches are sensitive to heavy-tailed data and outliers, in addition to the apparent high-dimensionality. In this article, we develop a robust and covariance-assisted tensor response regression model based on a recently proposed tensor t‑distribution to address these issues in tensor data. This model assumes that the tensor regression coefficient has a low-rank structure that can be learned more effectively using the additional covariance information. This enables a fast and robust decomposition-based estimation method. Theoretical analysis and numerical experiments demonstrate the superior performance of our approach. By addressing the heavy-tail, high-order, and high-dimensional issues, our work contributes to robust and effective estimation methods for tensor response regression, with broad applicability in various domains.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"24 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian tensor-on-tensor regression with efficient computation","authors":"Kunbo Wang, Yanxun Xu","doi":"10.4310/23-sii786","DOIUrl":"https://doi.org/10.4310/23-sii786","url":null,"abstract":"We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov Chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"23 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139658968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Density-convoluted tensor support vector machines","authors":"Boxiang Wang, Le Zhou, Jian Yang, Qing Mai","doi":"10.4310/23-sii796","DOIUrl":"https://doi.org/10.4310/23-sii796","url":null,"abstract":"With the emergence of tensor data (also known as multi-dimensional arrays) in many modern applications such as image processing and digital marketing, tensor classification is gaining increasing attention. Although there is a rich toolbox of classification methods for vector-based data, these traditional methods may not be adequate for tensor data classification. In this paper, we propose a new classifier called density-convoluted tensor support vector machine (DCT‑SVM). This method is motivated by applying a kernel density convolution method on the SVM loss to induce a new family of classification loss functions. To establish the theoretical foundation of DCT‑SVM, the probabilistic order of magnitude for its excess risk is systematically studied. For efficiently computing DCT‑SVM, we develop a fast monotone accelerated proximal gradient descent algorithm and show the convergence of the algorithm. With simulation studies, we demonstrate the superior performance of DCT‑SVM over many popular classification methods. We further demonstrate the real potential of DCT‑SVM using a modern data application for online advertising.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"36 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Community detection in temporal citation network via a tensor-based approach","authors":"Tianchen Gao, Rui Pan, Junfei Zhang, Hansheng Wang","doi":"10.4310/22-sii751","DOIUrl":"https://doi.org/10.4310/22-sii751","url":null,"abstract":"In the era of big data, network analysis has attracted widespread attention. Detecting and tracking community evolution in temporal networks can uncover important and interesting behaviors. In this paper, we analyze a temporal citation network constructed by publications collected from 44 statistical journals between 2001 and 2018. We propose an approach named Tensor-based Directed Spectral Clustering On Ratios of Eigenvectors (TD-SCORE) which can correct for degree heterogeneity to detect the community structure of the temporal citation network. We first explore the characteristics of the temporal network via in-degree distribution and visualization of different snapshots, and we find that both the community structure and the key nodes change over time. Then, we apply the TD-SCORE method to the core network of our temporal citation network. Seven communities are identified, including variable selection, Bayesian analysis, functional data analysis, and many others. Finally, we track the evolution of the above communities and reach some conclusions.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"26 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-way overlapping clustering by Bayesian tensor decomposition","authors":"Zhuofan Wang, Fangting Zhou, Kejun He, Yang Ni","doi":"10.4310/23-sii790","DOIUrl":"https://doi.org/10.4310/23-sii790","url":null,"abstract":"The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the cluster number automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering result reveals some interesting findings about depression-related genes in the human brain, which are also consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material at https://intlpress.com/site/pub/files/supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"12 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning conditional dependence graph for concepts via matrix normal graphical model","authors":"Jizheng Lai, Jianxin Yin","doi":"10.4310/23-sii784","DOIUrl":"https://doi.org/10.4310/23-sii784","url":null,"abstract":"Conditional dependence relationships for random vectors are extensively studied and broadly applied. But it is not very clear how to construct the dependence graph for unstructured data like concept words or phrases in a text corpus, where the variables (concepts) are not jointly observed under an i.i.d. assumption. Using global embedding methods such as GloVe, we obtain the ‘structured’ representation vectors for concepts. Then we assume that all the concept vectors jointly follow a matrix normal distribution with sparse precision matrices. With the observation of the word-word co-occurrence matrix and the GloVe construction procedure, we can test this assumption empirically. The asymptotic distribution for the test statistics is derived. Another advantage of this matrix-normal distributional assumption is that the linearly additive property in word analogy tasks is natural and straightforward. Unlike knowledge graph methods, the conditional dependence graph describes the conditional dependence structure between concepts given all other concepts, which means that the concepts (nodes) linked by edges cannot be separated by other concepts. It represents an essential semantic relationship. There is no need to enumerate all related pairs as head and tail elements of a triplet as in the knowledge graph regime. The relation type in this graph is solely the conditional dependence between concepts. A penalized matrix normal graphical model (MNGM) is then employed to learn the conditional dependence graph for both the concepts and the embedding ‘dimensions’. Since the concept words are nodes in our graph with huge dimensions, we employ the MDMC optimization method to speed up the glasso algorithm. Also, the algorithm is adaptive to incremental accumulation of new concepts in the text corpus. On the other hand, we propose a sentence granularity bootstrap to get ‘independent’ repeats of samples to enhance the penalized MNGM algorithm. We name the proposed method Matrix-GloVe. In simulation studies, we check that the graph learned by Matrix-GloVe is more suitable for Graph Convolutional Networks (GCN) than a correlation graph, i.e., a graph determined from the k-NN method. We employ the proposed method in two scenarios from real data. The first scenario is concept graph learning for concepts in a textbook corpus. Under this scenario, two tasks are studied. One is comparing the vectors output by GloVe and other word2vec methods, i.e., CBOW and Skip-Gram, which are then used by the penalized MNGM. The other is link prediction among the concepts. Matrix-GloVe performs better on both tasks. In the second scenario, Matrix-GloVe is applied to a downstream method, i.e., GCN. For node classification tasks on the BBC and BBCSport datasets, both GCN with Matrix-GloVe and GCN with Matrix-GloVe plus Deepwalk outperform GCN with k-NN.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"281 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based statistical depth for matrix data","authors":"Yue Mu, Guanyu Hu, Wei Wu","doi":"10.4310/23-sii829","DOIUrl":"https://doi.org/10.4310/23-sii829","url":null,"abstract":"The field of matrix data learning has witnessed significant advancements in recent years, encompassing diverse datasets such as medical images, social networks, and personalized recommendation systems. These advancements have found widespread application in various domains, including medicine, biology, public health, engineering, finance, economics, sports analytics, and environmental sciences. While extensive research has been conducted on estimation, inference, prediction, and computation for matrix data, the ranking problem has not received adequate attention. Statistical depth, a measure providing a center-outward rank for different data types, has been introduced in the past few decades. However, its exploration has been limited due to the complexity of the second and higher order statistics. In this paper, we propose an approach to rank matrix data by employing a model-based depth framework. Our methodology involves estimating the eigen-decomposition of a 4th-order covariance tensor. To enable this process using conventional matrix operations, we specify the tensor product operator between matrices and 4th-order tensors. Furthermore, we introduce a Kronecker product form on the covariance to enhance the robustness and efficiency of the estimation process, effectively reducing the number of parameters in the model. Based on this new framework, we develop an efficient algorithm to estimate the model-based statistical depth. To validate the effectiveness of our proposed method, we conduct simulations and apply it to two real-world applications: field goal attempts of NBA players and global temperature anomalies.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"281 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139658971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian methods in tensor analysis","authors":"Shi Yiyao, Shen Weining","doi":"10.4310/23-sii802","DOIUrl":"https://doi.org/10.4310/23-sii802","url":null,"abstract":"Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have emerged as a popular direction for analyzing tensor-valued data since they provide a convenient way to introduce sparsity into the model and conduct uncertainty quantification. In this article, we provide an overview of frequentist and Bayesian methods for solving tensor completion and regression problems, with a focus on Bayesian methods. We review common Bayesian tensor approaches including model formulation, prior assignment, posterior computation, and theoretical properties. We also discuss potential future directions in this field.","PeriodicalId":51230,"journal":{"name":"Statistics and Its Interface","volume":"49 1","pages":""},"PeriodicalIF":0.8,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}