Improved distance correlation estimation
Blanca E. Monroy-Castillo, M. A. Jácome, Ricardo Cao
arXiv - MATH - Statistics Theory, 2024-05-03. DOI: arxiv-2405.01958
Abstract
Distance correlation is a novel class of multivariate dependence measures,
taking values between 0 and 1, and applicable to random vectors of
arbitrary, not necessarily equal, dimensions. It offers several advantages over
the well-known Pearson correlation coefficient, the most important being that
distance correlation equals zero if and only if the random vectors are
independent. Two different estimators of the distance correlation are available in
the literature. The first one, proposed by Székely et al. (2007), is based on
an asymptotically unbiased estimator of the distance covariance, which turns out
to be a V-statistic. The second one builds on an unbiased estimator of the
distance covariance proposed in Székely et al. (2014), which was proved to be a
U-statistic by Székely and Huo (2016). This study evaluates their efficiency
(mean squared error) and compares computational times for both methods under
different dependence structures. Under conditions of independence or
near-independence, the V-estimates are biased, while the U-estimator frequently
cannot be computed because the unbiased estimate it relies on takes negative
values. To address this problem, a convex
linear combination of the two estimators is proposed and studied, yielding
good results regardless of the level of dependence.
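
The two estimators contrasted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the standard double-centered (V-statistic) and U-centered (U-statistic) forms of the squared distance covariance, works on the squared-correlation scale so that negative U-estimates stay visible, and needs at least four observations for the U-statistic. All function names are hypothetical, and the weight `lam` in `convex_combination` is a placeholder, since the abstract does not say how the weight is chosen or on which scale the estimators are combined.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance matrix of the rows of x (1-D input is treated as n x 1)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Double centering used by the V-statistic estimator."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def u_center(d):
    """U-centering used by the unbiased (U-statistic) estimator; requires n >= 4."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    c = d - row - col + total
    np.fill_diagonal(c, 0.0)  # diagonal entries are defined to be zero
    return c


def dcor_sq_v(x, y):
    """Squared distance correlation from the V-statistic; always non-negative."""
    a, b = double_center(pairwise_dist(x)), double_center(pairwise_dist(y))
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return (a * b).mean() / denom if denom > 0 else 0.0


def dcor_sq_u(x, y):
    """Squared distance correlation from the U-statistic; may be negative."""
    a, b = u_center(pairwise_dist(x)), u_center(pairwise_dist(y))
    n = a.shape[0]
    scale = n * (n - 3)
    denom = np.sqrt((a * a).sum() / scale * (b * b).sum() / scale)
    return (a * b).sum() / scale / denom if denom > 0 else 0.0


def convex_combination(v, u, lam):
    """lam * v + (1 - lam) * u, with a hypothetical weight lam in [0, 1]."""
    return lam * v + (1.0 - lam) * u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 2))  # dimensions of x and y need not match
    y = rng.normal(size=(50, 3))
    v, u = dcor_sq_v(x, y), dcor_sq_u(x, y)
    print("V-statistic:", v, "U-statistic:", u)
    print("combined (lam = 0.5):", convex_combination(v, u, 0.5))
```

With independent samples such as those in the demo, the V-based value stays above zero by construction, whereas the U-based value is free to fall below zero, in which case its square root (the distance correlation itself) is undefined.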