{"title":"Adaptive directional estimator of the density in Rd for independent and mixing sequences","authors":"Sinda Ammous , Jérôme Dedecker , Céline Duval","doi":"10.1016/j.jmva.2024.105332","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105332","url":null,"abstract":"<div><p>A new multivariate density estimator for stationary sequences is obtained by Fourier inversion of the thresholded empirical characteristic function. This estimator does not depend on the choice of parameters related to the smoothness of the density; it is directly adaptive. We establish oracle inequalities valid for independent, <span><math><mi>α</mi></math></span>-mixing and <span><math><mi>τ</mi></math></span>-mixing sequences, which allows us to derive optimal convergence rates, up to a logarithmic loss. On general anisotropic Sobolev classes, the estimator adapts to the regularity of the unknown density but also achieves directional adaptivity. More precisely, the estimator is able to reach the convergence rate induced by the <em>best</em> Sobolev regularity of the density of <span><math><mrow><mi>A</mi><mi>X</mi></mrow></math></span>, where <span><math><mi>A</mi></math></span> belongs to a class of invertible matrices describing all the possible directions. The estimator is easy to implement and numerically efficient. It depends on the calibration of a parameter for which we propose an innovative numerical selection procedure, using the Euler characteristic of the thresholded areas.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141290044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ordinal pattern dependence and multivariate measures of dependence","authors":"Angelika Silbernagel, Alexander Schnurr","doi":"10.1016/j.jmva.2024.105337","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105337","url":null,"abstract":"<div><p>Ordinal pattern dependence has been introduced in order to capture co-monotonic behavior between two time series. This concept has several features one would intuitively demand from a dependence measure. It was believed that ordinal pattern dependence satisfies the axioms which Grothe et al. (2014) proclaimed for a multivariate measure of dependence. In the present article we show that this is not true and that there is a mistake in the article by Betken et al. (2021). Furthermore, we show that ordinal pattern dependence satisfies a slightly modified set of axioms.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0047259X24000447/pdfft?md5=1cb9743828786dd1e4dbfb081a6f213d&pid=1-s2.0-S0047259X24000447-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141323996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric dependence between random vectors via copula-based divergence measures","authors":"Steven De Keyser, Irène Gijbels","doi":"10.1016/j.jmva.2024.105336","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105336","url":null,"abstract":"<div><p>This article proposes copula-based dependence quantification between multiple groups of random variables of possibly different sizes via the family of <span><math><mi>Φ</mi></math></span>-divergences. An axiomatic framework for this purpose is provided, after which we focus on the absolutely continuous setting assuming copula densities exist. We consider parametric and semi-parametric frameworks, discuss estimation procedures, and report on asymptotic properties of the proposed estimators. In particular, we first concentrate on a Gaussian copula approach yielding explicit and attractive dependence coefficients for specific choices of <span><math><mi>Φ</mi></math></span>, which are more amenable for estimation. Next, general parametric copula families are considered, with special attention to nested Archimedean copulas, being a natural choice for dependence modelling of random vectors. The results are illustrated by means of examples. Simulations and a real-world application on financial data are provided as well.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141239837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor recovery in high-dimensional Ising models","authors":"Tianyu Liu , Somabha Mukherjee , Rahul Biswas","doi":"10.1016/j.jmva.2024.105335","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105335","url":null,"abstract":"<div><p>The <span><math><mi>k</mi></math></span>-tensor Ising model is a multivariate exponential family on a <span><math><mi>p</mi></math></span>-dimensional binary hypercube for modeling dependent binary data, where the sufficient statistic consists of all <span><math><mi>k</mi></math></span>-fold products of the observations, and the parameter is an unknown <span><math><mi>k</mi></math></span>-fold tensor, designed to capture higher-order interactions between the binary variables. In this paper, we describe an approach based on a penalization technique that helps us recover the signed support of the tensor parameter with high probability, assuming that no entry of the true tensor is too close to zero. The method is based on an <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-regularized node-wise logistic regression, that recovers the signed neighborhood of each node with high probability. Our analysis is carried out in the high-dimensional regime, that allows the dimension <span><math><mi>p</mi></math></span> of the Ising model, as well as the interaction factor <span><math><mi>k</mi></math></span> to potentially grow to <span><math><mi>∞</mi></math></span> with the sample size <span><math><mi>n</mi></math></span>. We show that if the minimum interaction strength is not too small, then consistent recovery of the entire signed support is possible if one takes <span><math><mrow><mi>n</mi><mo>=</mo><mi>Ω</mi><mrow><mo>(</mo><msup><mrow><mrow><mo>(</mo><mi>k</mi><mo>!</mo><mo>)</mo></mrow></mrow><mrow><mn>8</mn></mrow></msup><msup><mrow><mi>d</mi></mrow><mrow><mn>3</mn></mrow></msup><mo>log</mo><mfenced><mrow><mfrac><mrow><mi>p</mi><mo>−</mo><mn>1</mn></mrow><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></mfrac></mrow></mfenced><mo>)</mo></mrow></mrow></math></span> samples, where <span><math><mi>d</mi></math></span> denotes the maximum degree of the hypernetwork in question. Our results are validated in two simulation settings, and applied on a real neurobiological dataset consisting of multi-array electro-physiological recordings from the mouse visual cortex, to model higher-order interactions between the brain regions.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141164340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distribution-on-distribution regression with Wasserstein metric: Multivariate Gaussian case","authors":"Ryo Okano , Masaaki Imaizumi","doi":"10.1016/j.jmva.2024.105334","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105334","url":null,"abstract":"<div><p>Distribution data refer to a data set in which each sample is represented as a probability distribution, a subject area that has received increasing interest in the field of statistics. Although several studies have developed distribution-to-distribution regression models for univariate variables, the multivariate scenario remains under-explored due to technical complexities. In this study, we introduce models for regression from one Gaussian distribution to another, using the Wasserstein metric. These models are constructed using the geometry of the Wasserstein space, which enables the transformation of Gaussian distributions into components of a linear matrix space. Owing to their linear regression frameworks, our models are intuitively understandable, and their implementation is simplified because of the optimal transport problem’s analytical solution between Gaussian distributions. We also explore a generalization of our models to encompass non-Gaussian scenarios. We establish the convergence rates of in-sample prediction errors for the empirical risk minimizations in our models. In comparative simulation experiments, our models demonstrate superior performance over a simpler alternative method that transforms Gaussian distributions into matrices. We present an application of our methodology using weather data for illustration purposes.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0047259X24000411/pdfft?md5=dea43975f3758fd74adfc88e822be366&pid=1-s2.0-S0047259X24000411-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141239836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse subspace clustering in diverse multiplex network model","authors":"Majid Noroozi , Marianna Pensky","doi":"10.1016/j.jmva.2024.105333","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105333","url":null,"abstract":"<div><p>The paper considers the DIverse MultiPLEx (DIMPLE) network model, where all layers of the network have the same collection of nodes and are equipped with the Stochastic Block Models. In addition, all layers can be partitioned into groups with the same community structures, although the layers in the same group may have different matrices of block connection probabilities. To the best of our knowledge, the DIMPLE model, introduced in Pensky and Wang (2021), presents the most broad SBM-equipped binary multilayer network model on the same set of nodes and, thus, generalizes a multitude of papers that study more restrictive settings. Under the DIMPLE model, the main task is to identify the groups of layers with the same community structures since the matrices of block connection probabilities act as nuisance parameters under the DIMPLE paradigm. The main contribution of the paper is achieving the strongly consistent between-layer clustering by using Sparse Subspace Clustering (SSC), the well-developed technique in computer vision. In addition, SSC allows to handle much larger networks than spectral clustering, and is perfectly suitable for application of parallel computing. Moreover, our paper is the first one to obtain precision guarantees for SSC when it is applied to binary data.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141095842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Mai–Wang stochastic decomposition for ℓp-norm symmetric survival functions on the positive orthant","authors":"Christian Genest , Johanna G. Nešlehová","doi":"10.1016/j.jmva.2024.105331","DOIUrl":"10.1016/j.jmva.2024.105331","url":null,"abstract":"<div><p>Recently, Mai and Wang (2021) investigated a class of <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mi>p</mi></mrow></msub></math></span>-norm symmetric survival functions on the positive orthant. In their paper, they claim that the generator of these functions must be <span><math><mi>d</mi></math></span>-monotone. This note explains that this is not true in general. Luckily, most of the results in Mai and Wang (2021) are not affected by this oversight.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0047259X24000381/pdfft?md5=f0a3613b1587ac23eed097d6f63a0a06&pid=1-s2.0-S0047259X24000381-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141028268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tuning-free sparse clustering via alternating hard-thresholding","authors":"Wei Dong , Chen Xu , Jinhan Xie , Niansheng Tang","doi":"10.1016/j.jmva.2024.105330","DOIUrl":"10.1016/j.jmva.2024.105330","url":null,"abstract":"<div><p>Model-based clustering is a commonly-used technique to partition heterogeneous data into homogeneous groups. When the analysis is to be conducted with a large number of features, analysts face simultaneous challenges in model interpretability, clustering accuracy, and computational efficiency. Several Bayesian and penalization methods have been proposed to select important features for model-based clustering. However, the performance of those methods relies on a careful algorithmic tuning, which can be time-consuming for high-dimensional cases. In this paper, we propose a new sparse clustering method based on alternating hard-thresholding. The new method is conceptually simple and tuning-free. With a user-specified sparsity level, it efficiently detects a set of key features by eliminating a large number of features that are less useful for clustering. Based on the selected key features, one can readily obtain an effective clustering of the original high-dimensional data under a general sparse covariance structure. Under mild conditions, we show that the new method leads to clusters with a misclassification rate consistent to the optimal rate as if the underlying true model were used. The promising performance of the new method is supported by both simulated and real data examples.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141050885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian inference of graph-based dependencies from mixed-type data","authors":"Chiara Galimberti , Stefano Peluso , Federico Castelletti","doi":"10.1016/j.jmva.2024.105323","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105323","url":null,"abstract":"<div><p>Mixed data comprise measurements of different types, with both categorical and continuous variables, and can be found in various areas, such as in life science or industrial processes. Inferring conditional independencies from the data is crucial to understand how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been extensively studied in the Gaussian and categorical settings separately; on the other hand, the literature addressing this problem in presence of mixed data is still narrow. We propose a Bayesian model for the analysis of mixed data based on the notion of Conditional Gaussian (CG) distribution. Our method is based on a canonical parameterization of the CG distribution, which allows for posterior inference of parameters indexing the (marginal) distributions of continuous and categorical variables, as well as expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning purposes, namely to infer the underlying graph of conditional independencies. When compared to alternative frequentist methods, our approach shows favorable results both in a simulation setting and in real-data applications, besides allowing for a coherent uncertainty quantification around parameter estimates.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Laplace approximation","authors":"Jeongseop Han, Youngjo Lee","doi":"10.1016/j.jmva.2024.105321","DOIUrl":"https://doi.org/10.1016/j.jmva.2024.105321","url":null,"abstract":"<div><p>The Laplace approximation has been proposed as a method for approximating the marginal likelihood of statistical models with latent variables. However, the approximate maximum likelihood estimators derived from the Laplace approximation are often biased for binary or temporally and/or spatially correlated data. Additionally, the corresponding Hessian matrix tends to underestimates the standard errors of these approximate maximum likelihood estimators. While higher-order approximations have been suggested, they are not applicable to complex models, such as correlated random effects models, and fail to provide consistent variance estimators. In this paper, we propose an enhanced Laplace approximation that provides the true maximum likelihood estimator and its consistent variance estimator. We study its relationship with the variational Bayes method. We also define a new restricted maximum likelihood estimator for estimating dispersion parameters and study their asymptotic properties. Enhanced Laplace approximation generally demonstrates how to obtain the true restricted maximum likelihood estimators and their variance estimators. Our numerical studies indicate that the enhanced Laplace approximation provides a satisfactory maximum likelihood estimator and restricted maximum likelihood estimator, as well as their variance estimators in the frequentist perspective. The maximum likelihood estimator and restricted maximum likelihood estimator can be also interpreted as the posterior mode and marginal posterior mode under flat priors, respectively. Furthermore, we present some comparisons with Bayesian procedures under different priors.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140807251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}