Blanca E. Monroy-Castillo, M. A. Jácome, Ricardo Cao
Improved distance correlation estimation
arXiv - MATH - Statistics Theory, 2024-05-03, arXiv:2405.01958
Abstract
Distance correlation is a novel class of multivariate dependence measures,
taking values between 0 and 1, and applicable to random vectors of
arbitrary dimensions, not necessarily equal. It offers several advantages over
the well-known Pearson correlation coefficient, the most important being that
distance correlation equals zero if and only if the random vectors are
independent. There are two different estimators of the distance correlation available in
the literature. The first one, proposed by Székely et al. (2007), is based on
an asymptotically unbiased estimator of the distance covariance, which turns out
to be a V-statistic. The second one builds on an unbiased estimator of the
distance covariance proposed in Székely et al. (2014), which was proved to be a
U-statistic by Székely and Huo (2016). This study evaluates their efficiency
(mean squared error) and compares computational times for both methods under
different dependence structures. Under conditions of independence or
near-independence, the V-estimator is biased, while the U-estimator frequently
cannot be computed because the estimated squared distance correlation is
negative, so its square root is undefined. To address this challenge, a convex
combination of the two estimators is proposed and studied, yielding
good results regardless of the level of dependence.
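
To make the contrast between the two estimators concrete, here is a minimal NumPy sketch, assuming the usual double-centered (V-type) and U-centered (U-type) distance-matrix formulas for the squared sample distance correlation. The function names are illustrative and are not taken from the paper, and the paper's convex-combination weights are not reproduced here.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix for n observations (rows of x)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """Double centering used by the V-type estimator."""
    row = d.mean(axis=1, keepdims=True)
    col = d.mean(axis=0, keepdims=True)
    return d - row - col + d.mean()

def u_center(d):
    """U-centering used by the U-type estimator (diagonal set to zero)."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    out = d - row - col + total
    np.fill_diagonal(out, 0.0)
    return out

def dcor_sq_v(x, y):
    """Squared distance correlation, V-type: always nonnegative."""
    A = double_center(pairwise_distances(x))
    B = double_center(pairwise_distances(y))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return (A * B).mean() / denom if denom > 0 else 0.0

def dcor_sq_u(x, y):
    """Squared distance correlation, U-type: may be negative."""
    A = u_center(pairwise_distances(x))
    B = u_center(pairwise_distances(y))
    n = A.shape[0]
    if n <= 3:
        raise ValueError("the U-type estimator needs more than 3 observations")
    scale = n * (n - 3)
    denom = np.sqrt((A * A).sum() / scale * (B * B).sum() / scale)
    return (A * B).sum() / scale / denom if denom > 0 else 0.0

# Independent samples of different dimensions (simulated, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = rng.normal(size=(30, 3))
v2 = dcor_sq_v(x, y)
u2 = dcor_sq_u(x, y)
print("V-type squared dCor:", v2)
print("U-type squared dCor:", u2)
print("U-type dCor computable (square root defined):", u2 >= 0)
```

Under independence the V-type value stays strictly positive in finite samples, which is the bias mentioned above, while the U-type value fluctuates around zero and can fall below it, in which case no square root can be taken.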