{"title":"Hierarchical structure-guided high-dimensional multi-view clustering","authors":"Jiajia Jiang , Kuangnan Fang , Shuangge Ma , Qingzhao Zhang","doi":"10.1016/j.jmva.2025.105488","DOIUrl":"10.1016/j.jmva.2025.105488","url":null,"abstract":"<div><div>Multi-view data clustering is pivotal for comprehending the heterogeneous structure of data by integrating information from diverse aspects. Nevertheless, practical challenges arise due to the differences in the granularity from different views, resulting in a hierarchical clustering structure within these distinct data types. In this work, we consider such structure information and propose a novel high-dimensional multi-view clustering approach with a hierarchical structure across views. The proposed non-convex problem is effectively tackled using the Alternating Direction Method of Multipliers algorithm, and we establish the statistical properties of the estimator. Simulation results demonstrate the effectiveness and superiority of our proposed method. In the analysis of the histopathological imaging data and gene expression data related to lung adenocarcinoma, our method unveils a hierarchical clustering structure that significantly diverges from alternative approaches.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105488"},"PeriodicalIF":1.4,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145155083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel martingale difference correlation via data splitting with applications in feature screening","authors":"Zhengyu Zhu , Jicai Liu , Riquan Zhang","doi":"10.1016/j.jmva.2025.105508","DOIUrl":"10.1016/j.jmva.2025.105508","url":null,"abstract":"<div><div>In this paper, we introduce a novel sample martingale difference correlation via data splitting to measure the departure of conditional mean independence between a response variable <span><math><mi>Y</mi></math></span> and a vector predictor <span><math><mi>X</mi></math></span>. The proposed correlation converges to zero and has an asymptotically symmetric sampling distribution around zero when <span><math><mi>Y</mi></math></span> and <span><math><mi>X</mi></math></span> are conditionally mean independent. In contrast, it converges to a positive value when <span><math><mi>Y</mi></math></span> and <span><math><mi>X</mi></math></span> are conditionally mean dependent. Leveraging these properties, we develop a new model-free feature screening method with false discovery rate (FDR) control for ultrahigh-dimensional data. We demonstrate that this screening method achieves FDR control and the sure screening property simultaneously. We also extend our approach to conditional quantile screening with FDR control. To further enhance the stability of the screening results, we implement multiple splitting techniques. We evaluate the finite sample performance of our proposed methods through simulations and real data analyses, and compare them with existing methods.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105508"},"PeriodicalIF":1.4,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On nonparametric functional data regression with incomplete observations","authors":"Majid Mojirsheibani","doi":"10.1016/j.jmva.2025.105497","DOIUrl":"10.1016/j.jmva.2025.105497","url":null,"abstract":"<div><div>In this work we consider the problem of nonparametric estimation of a regression function <span><math><mrow><mi>m</mi><mrow><mo>(</mo><mi>χ</mi><mo>)</mo></mrow><mo>=</mo><mi>E</mi><mrow><mo>(</mo><mi>Y</mi><mo>|</mo><mspace></mspace><mi>χ</mi><mo>=</mo><mi>χ</mi><mo>)</mo></mrow></mrow></math></span> with the functional covariate <span><math><mrow><mi>χ</mi></mrow></math></span> when the response <span><math><mi>Y</mi></math></span> may be missing according to a missing-not-at-random (MNAR) setup, i.e., when the underlying missing probability mechanism can depend on both <span><math><mrow><mi>χ</mi></mrow></math></span> and <span><math><mi>Y</mi></math></span>. Our proposed estimator is based on a particular representation of the regression function <span><math><mrow><mi>m</mi><mrow><mo>(</mo><mi>χ</mi><mo>)</mo></mrow></mrow></math></span> in terms of four associated conditional expectations that can be estimated nonparametrically. To assess the theoretical performance of our estimators, we study their convergence properties in general <span><math><msup><mrow><mi>L</mi></mrow><mrow><mi>p</mi></mrow></msup></math></span> norms where we also look into their rates of convergence. Our numerical results show that the proposed estimators have good finite-sample performance. We also explore the applications of our results to the problem of statistical classification with missing labels and establish a number of convergence results for new kernel-type classification rules.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105497"},"PeriodicalIF":1.4,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiency of Markov chains for Bayesian linear regression models with heavy-tailed errors","authors":"Yasuyuki Hamura","doi":"10.1016/j.jmva.2025.105506","DOIUrl":"10.1016/j.jmva.2025.105506","url":null,"abstract":"<div><div>In this paper, we consider posterior simulation for a linear regression model when the error distribution is given by a scale mixture of multivariate normals. We first show that a sampler given in the literature for the case of the conditionally conjugate normal-inverse Wishart prior continues to be geometrically ergodic even when the error density is heavier-tailed. Moreover, we prove that the ergodicity is uniform by verifying the minorization condition. In the second half of this note, we treat an improper case and, using a simple energy function, show that a data augmentation algorithm in the literature is geometrically ergodic under a significantly different condition.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105506"},"PeriodicalIF":1.4,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tree Pólya Splitting distributions for multivariate count data","authors":"Samuel Valiquette , Jean Peyhardi , Éric Marchand , Gwladys Toulemonde , Frédéric Mortier","doi":"10.1016/j.jmva.2025.105507","DOIUrl":"10.1016/j.jmva.2025.105507","url":null,"abstract":"<div><div>In this article, we develop a new class of multivariate distributions adapted for count data, called Tree Pólya Splitting. This class results from the combination of a univariate distribution and singular multivariate distributions along a fixed partition tree. Known distributions, including the Dirichlet-multinomial, the generalized Dirichlet-multinomial and the Dirichlet-tree multinomial, are particular cases within this class. As we will demonstrate, these distributions offer some flexibility, allowing for the modeling of complex dependence structures (positive, negative, or null) at the observation level. Specifically, we present theoretical properties of Tree Pólya Splitting distributions by focusing primarily on marginal distributions, factorial moments, and dependence structures (covariance and correlations). A dataset of abundance of Trichoptera is used, on one hand, as a benchmark to illustrate the theoretical properties developed in this article, and on the other hand, to demonstrate the interest of these types of models, notably by comparing them to other approaches for fitting multivariate data, such as the Poisson-lognormal model in ecology or singular multivariate distributions used in microbial analysis.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105507"},"PeriodicalIF":1.4,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified selection consistency theorem for information criterion-based rank estimators in factor analysis","authors":"Toshinari Morimoto , Hung Hung , Su-Yun Huang","doi":"10.1016/j.jmva.2025.105498","DOIUrl":"10.1016/j.jmva.2025.105498","url":null,"abstract":"<div><div>Over the years, numerous rank estimators for factor models have been proposed in the literature. This article focuses on information criterion-based rank estimators and investigates their consistency in rank selection. The gap conditions serve as necessary and sufficient conditions for rank estimators to achieve selection consistency under the general assumptions of random matrix theory. We establish a unified theorem on selection consistency, presenting the gap conditions for information criterion-based rank estimators with a unified formulation.</div><div>To validate the theorem’s assertion that rank selection consistency is solely determined by the gap conditions, we conduct extensive numerical simulations across various settings. Additionally, we undertake supplementary simulations to explore the strengths and limitations of information criterion-based estimators by comparing them with other types of rank estimators.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105498"},"PeriodicalIF":1.4,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust semi-functional censored regression","authors":"Tao Wang","doi":"10.1016/j.jmva.2025.105491","DOIUrl":"10.1016/j.jmva.2025.105491","url":null,"abstract":"<div><div>This paper develops a robust methodological framework for analyzing randomly censored responses within the semi-functional partial linear regression models, utilizing the exponential squared loss criterion. The proposed methodology capitalizes on the robustness of the exponential squared loss function against outliers and heavy-tailed error distributions, while preserving the flexibility and interpretability of semi-functional regression, which accommodates scalar and functional predictors in a unified framework. To account for the divergent convergence rates of the parametric and nonparametric components, we introduce a novel three-step estimation procedure designed to enhance computational efficiency, ensure model robustness, and achieve asymptotically optimal estimation performance. The parametric component is estimated through a quasi-Newton algorithm, for which we establish global convergence under standard regularity conditions using a Wolfe-type line search strategy. Additionally, we suggest a cross-validation criterion based on the exponential squared loss function to guide the data-driven selection of tuning parameters. The theoretical properties, including consistency and asymptotic normality of the proposed estimators, are established under mild conditions. The efficacy and robustness of the method are demonstrated through a series of simulation studies and an empirical application to Alzheimer’s disease progression, highlighting its practical applicability in addressing complex and censored data structures.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105491"},"PeriodicalIF":1.4,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145027204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matérn and Generalized Wendland correlation models that parameterize hole effect, smoothness, and support","authors":"Xavier Emery , Moreno Bevilacqua , Emilio Porcu","doi":"10.1016/j.jmva.2025.105496","DOIUrl":"10.1016/j.jmva.2025.105496","url":null,"abstract":"<div><div>A huge literature in statistics and machine learning is devoted to parametric families of correlation functions, where the correlation parameters are used to understand the properties of an associated spatial random process in terms of smoothness and global or compact support. However, most current parametric correlation functions attain only non-negative values. This work provides two new families of correlation functions that can have some negative values (aka hole effects), along with smoothness, and global or compact support. They generalize the celebrated Matérn and Generalized Wendland models, respectively, which are obtained as special cases. A link between the two new families is also established, showing that a specific reparameterization of the latter includes the former as a special limit case. Their performance in terms of estimation accuracy and goodness of best linear unbiased prediction is illustrated through synthetic and real data.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"211 ","pages":"Article 105496"},"PeriodicalIF":1.4,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying elliptically distributed observations using the Ledoit–Wolf shrinkage approach","authors":"Rasoul Lotfi , Davood Shahsavani , Mohammad Arashi","doi":"10.1016/j.jmva.2025.105495","DOIUrl":"10.1016/j.jmva.2025.105495","url":null,"abstract":"<div><div>Classifying observations by linear discriminant analysis faces two challenges. First, the observations may not follow a Gaussian distribution. Second, the covariance matrix is singular when the number of predictor variables exceeds the number of observations. In this article, we study the classification of high-dimensional elliptically distributed data in the framework of a Bayesian approach, while using Ledoit and Wolf’s shrinkage methodology to overcome the singularity of the covariance matrix. As a special case, the t-distribution is considered and its optimal shrinkage parameter is obtained. Furthermore, we evaluate the performance of the proposed estimators on synthetic and real data. Although the optimal shrinkage parameter does not necessarily provide the minimum test error rate, it can provide a solution that shows the superiority of our proposed estimator over some benchmark methods.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"210 ","pages":"Article 105495"},"PeriodicalIF":1.4,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144916714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Posterior contraction and uncertainty quantification for the multivariate spike-and-slab LASSO","authors":"Yunyi Shen , Sameer K. Deshpande","doi":"10.1016/j.jmva.2025.105493","DOIUrl":"10.1016/j.jmva.2025.105493","url":null,"abstract":"<div><div>We study the asymptotic properties of Deshpande et al. (2019)’s multivariate spike-and-slab LASSO (mSSL) procedure for simultaneous variable and covariance selection in the sparse multivariate linear regression problem. In that problem, <span><math><mi>q</mi></math></span> correlated responses are regressed onto <span><math><mi>p</mi></math></span> covariates and the mSSL works by placing separate spike-and-slab priors on the entries in the matrix of marginal covariate effects and off-diagonal elements in the upper triangle of the residual precision matrix. Under mild assumptions about these matrices, we establish the posterior contraction rate for the mSSL posterior in the asymptotic regime where both <span><math><mi>p</mi></math></span> and <span><math><mi>q</mi></math></span> diverge with <span><math><mrow><mi>n</mi><mo>.</mo></mrow></math></span> By “de-biasing” the corresponding MAP estimates, we obtain confidence intervals for each covariate effect and residual partial correlation. In extensive simulation studies, these intervals displayed close-to-nominal frequentist coverage in finite sample settings but tended to be substantially longer than those obtained using a version of the Bayesian bootstrap that randomly re-weights the prior. We further show that the de-biased intervals for individual covariate effects are asymptotically valid.</div></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"210 ","pages":"Article 105493"},"PeriodicalIF":1.4,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144866127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}