Comparing Robust Versions of Distance Covariance: A Comment on the Biloop Approach
Dominic Edelmann
International Statistical Review 94(1), pp. 26-34. Published 2026-04-07 (Epub 2025-10-02). DOI: 10.1111/insr.70012
https://onlinelibrary.wiley.com/doi/10.1111/insr.70012
Citations: 0
Abstract
I commend the authors for their significant contributions to the field of distance correlation presented in this paper. This work marks the first thorough investigation of the robustness properties of distance correlation, distance covariance, and distance standard deviation in terms of influence functions and breakdown points. These robustness aspects have previously caused considerable confusion in the literature, and this work will serve as an important and clarifying reference. Additionally, the authors introduce a novel version of distance covariance based on an innovative biloop transformation. Thanks to its redescending influence function, this version of distance covariance is robust to bivariate outliers and shows promise for real-world applications.
However, the paper does not fully capture the extensive literature on robust distance covariance measures. Criticism regarding the lack of robustness in classical distance covariance can be traced back to the 2009 discussion paper of Székely and Rizzo, in which Bruno Rémillard (2009) identifies the moment assumption as a weakness of distance covariance and suggests a rank transformation as a remedy. In the same discussion, Gretton et al. (2009) give a brief historical overview and present several robust statistics of distance covariance type (Kankainen, 1995; Feuerverger, 1993).
In this comment, I offer a concise, albeit non-comprehensive, overview of robust versions of distance covariance. Moreover, I extend the simulation studies presented in the main paper, giving further insights into the properties of the newly proposed biloop distance covariance.
The literature offers at least three different approaches to defining extensions of distance covariance. Each approach gives rise to dependence measures with positive breakdown points.
The simulations presented below extend those in the main paper to shed further light on the properties of biloop distance covariance. Since the results of the normal score distance covariance were very similar to the rank distance covariance, the former method has been omitted. Instead the distance covariance $\mathrm{dCov}_d$ in Equation (2) employing the RBF distance (Equation 2) is included; this method is denoted as RBF distance covariance in the following. The hyperparameter $c$ for the biloop distance covariance was selected as in the main paper, while the bandwidth for the RBF distance covariance was set using the median heuristic (Fukumizu et al., 2009). A wider range of sample sizes ($n = 50, 100, 200, 400, 800, 1600, 3200$) than in the main paper was examined in each scenario. The bivariate data $(X_1, Y_1)^t, (X_2, Y_2)^t, \dots, (X_n, Y_n)^t$ are always independently generated. I focus on the problem of independence testing. The tests are based on $K = 500$ permutations, empirical rejection rates are determined from $N_{sim} = 1000$ simulations and a nominal level of $\alpha = 0.05$ is used. All simulations were performed using the R package dcortools available on the Comprehensive R Archive Network (CRAN).
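To make the testing setup concrete, the following is a minimal sketch of a permutation test of independence based on the sample distance covariance, written in plain Python rather than with dcortools (whose interface is not reproduced here). Several details are assumptions made for illustration and are not taken from the comment: the function names, the bounded RBF-type distance $1 - \exp(-(a-b)^2 / (2 s^2))$, the choice of $s$ as the median pairwise absolute difference, and the p-value convention $(1 + \#\{\text{permuted} \ge \text{observed}\}) / (1 + K)$.

```python
import math
import random
import statistics


def euclid(a, b):
    """Ordinary distance between two real numbers."""
    return abs(a - b)


def rbf_distance(values):
    """Bounded RBF-type distance (assumed form) with a median-heuristic bandwidth.

    The bandwidth s is the median pairwise absolute difference, which must be
    positive (no heavy ties in the data).
    """
    gaps = [abs(a - b) for i, a in enumerate(values) for b in values[i + 1:]]
    s = statistics.median(gaps)
    return lambda a, b: 1.0 - math.exp(-((a - b) ** 2) / (2.0 * s ** 2))


def pairwise(values, dist):
    """Matrix of pairwise distances dist(values[i], values[j])."""
    return [[dist(a, b) for b in values] for a in values]


def double_center(D):
    """Subtract row and column means and add back the grand mean."""
    n = len(D)
    row = [sum(r) / n for r in D]
    col = [sum(D[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[D[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(A, B, perm=None):
    """Squared sample distance covariance from double-centered matrices.

    Passing perm evaluates the statistic with the second sample permuted,
    which amounts to permuting rows and columns of B simultaneously.
    """
    n = len(A)
    idx = list(range(n)) if perm is None else perm
    return sum(A[i][j] * B[idx[i]][idx[j]]
               for i in range(n) for j in range(n)) / n ** 2


def permutation_test(x, y, dist_x, dist_y, K=500, rng=None):
    """Return (observed statistic, permutation p-value) using K permutations."""
    rng = rng or random.Random(0)
    A = double_center(pairwise(x, dist_x))
    B = double_center(pairwise(y, dist_y))
    observed = dcov2(A, B)
    exceed = 0
    for _ in range(K):
        perm = list(range(len(x)))
        rng.shuffle(perm)
        if dcov2(A, B, perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + K)


# Illustrative data (hypothetical): a noisy quadratic, non-monotone dependence.
demo_rng = random.Random(42)
x = [demo_rng.gauss(0, 1) for _ in range(50)]
y = [xi ** 2 + 0.1 * demo_rng.gauss(0, 1) for xi in x]

_, p_classic = permutation_test(x, y, euclid, euclid, K=500)
_, p_rbf = permutation_test(x, y, rbf_distance(x), rbf_distance(y), K=500)
print("classical dCov p-value:", p_classic)
print("RBF dCov p-value:", p_rbf)
```

The centered distance matrices are computed once and only the index permutation changes between replicates, which is valid because double centering commutes with a simultaneous permutation of rows and columns. An empirical rejection rate as described above would repeat such a test over $N_{sim}$ simulated data sets and record the fraction of p-values at or below $\alpha$.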
From my perspective, distance covariance methods hold great promise for the analysis of high-dimensional data. When testing a large number of bivariate dependencies, distance covariance provides a computationally efficient omnibus independence test—crucial in settings where visual inspection is impossible and where classical correlation measures may miss nonlinear relationships. Likewise, because outliers can lurk undetected in high dimensions, it is often advisable to employ a robust variant that automatically down-weights or excludes extreme observations.
Several robust variants of distance covariance have been proposed; among them, the biloop distance correlation stands out because its redescending influence function substantially diminishes the effect of any observation whose marginal distance is far from the median. As demonstrated in this comment and the main paper, this property makes the biloop method robust to outliers, even when these outliers induce a monotone dependence. In scenarios with only univariate outliers or bivariate outliers that do not induce monotone dependence, its performance closely matches that of rank-based distance covariance. A possible caveat is that, by construction, the biloop distance covariance is virtually blind to certain very strong dependence patterns (see Figure 3). Thus, while the method is promising, we must carefully weigh its robustness benefits against potential losses in power for specific dependence structures. Targeted simulations and theoretical analyses are needed to precisely quantify this trade-off.
About the journal:
International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.