The Berkelmans–Pries dependency function: A generic measure of dependence between random variables

IF 0.7 · CAS Zone 4 (Mathematics) · JCR Q3, Statistics & Probability

Journal of Applied Probability Pub Date : 2023-03-23 DOI:10.1017/jpr.2022.118

Guus Berkelmans, S. Bhulai, R. D. van der Mei, Joris Pries

Citations: 0

Abstract

Measuring and quantifying dependencies between random variables (RVs) can give critical insights into a dataset. Typical questions are: ‘Do underlying relationships exist?’, ‘Are some variables redundant?’, and ‘Is some target variable Y highly or weakly dependent on variable X?’ Despite the evident need for a general-purpose measure of dependency between RVs, most data analysts in practice use the Pearson correlation coefficient to quantify dependence, even though it is recognized that the correlation coefficient is essentially a measure of linear dependency only. Although many attempts have been made to define more generic dependency measures, there is no consensus yet on a standard, general-purpose dependency function. In fact, several ideal properties of a dependency function have been proposed, but without much argumentation. Motivated by this, we discuss and revise the list of desired properties and propose a new dependency function that meets all these requirements. This general-purpose dependency function provides data analysts with a powerful means to quantify the level of dependence between variables. To this end, we also provide Python code to determine the dependency function for use in practice.
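
The abstract's remark that the Pearson correlation coefficient captures essentially only linear dependence can be illustrated with a small sketch. This is not the authors' code and does not implement their dependency function; the helper `pearson` and the sample data are assumptions chosen for illustration. Here Y = X² is fully determined by X, yet its correlation with X is zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the common 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative data: X takes values symmetric around zero.
xs = [-2, -1, 0, 1, 2]
linear = [3 * x + 1 for x in xs]   # Y is a linear function of X
quadratic = [x * x for x in xs]    # Y is a nonlinear function of X

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

In both cases Y is a deterministic function of X, so a general-purpose dependency measure would be expected to report strong dependence, whereas the correlation coefficient does so only for the linear relationship.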

Source journal

Journal of Applied Probability (Mathematics: Statistics & Probability)

CiteScore

1.50

Self-citation rate

10.00%

Articles published

Review time

6-12 weeks

About the journal: Journal of Applied Probability is the oldest journal devoted to the publication of research in the field of applied probability. It is an international journal published by the Applied Probability Trust, and it serves as a companion publication to Advances in Applied Probability. Its wide audience includes leading researchers across the entire spectrum of applied probability, including biosciences applications, operations research, telecommunications, computer science, engineering, epidemiology, financial mathematics, the physical and social sciences, and any field where stochastic modeling is used. A submission to Applied Probability may, at the Editor-in-Chief’s discretion, appear in either the Journal of Applied Probability or Advances in Applied Probability. Typically, shorter papers appear in the Journal, with longer contributions appearing in the Advances.