Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model

CLEaR. Publication date: 2023-03-14. DOI: 10.48550/arXiv.2303.08572

Mário A. T. Figueiredo, Catarina A. Oliveira

Abstract

Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, notably additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
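The rationale stated in the abstract (conditional entropy that does not depend on the cause distribution) can be illustrated with a small numerical sketch. It assumes that a uniform channel is a conditional pmf matrix whose rows are permutations of one another; the abstract does not spell out the definition, so this is an assumption. The function names, the two example matrices, and the input distributions are hypothetical and chosen only for illustration. The sketch does not reproduce the statistical test or the closed-form UC estimate mentioned in the abstract.

```python
import numpy as np

def conditional_entropy(p_x, channel):
    """H(Y|X) in bits, for input pmf p_x and row-stochastic channel[x, y] = P(y | x)."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(channel > 0, channel, 1.0)
    row_entropy = -np.sum(channel * np.log2(safe), axis=1)
    # H(Y|X) is the p_x-weighted average of the per-row entropies.
    return float(p_x @ row_entropy)

def rows_are_permutations(channel, tol=1e-12):
    """True if every row is a permutation of the first row (assumed UC property)."""
    sorted_rows = np.sort(np.asarray(channel, dtype=float), axis=1)
    return bool(np.allclose(sorted_rows, sorted_rows[0], atol=tol))

# Hypothetical channel whose rows are permutations of (0.7, 0.2, 0.1).
uniform_channel = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.7, 0.2],
                            [0.2, 0.1, 0.7]])

# Hypothetical channel whose rows have different entropies.
other_channel = np.array([[0.9, 0.05, 0.05],
                          [0.4, 0.3, 0.3],
                          [0.2, 0.1, 0.7]])

# Two different cause distributions.
p_a = [0.8, 0.1, 0.1]
p_b = [1 / 3, 1 / 3, 1 / 3]

for name, ch in [("uniform", uniform_channel), ("other", other_channel)]:
    print(name,
          rows_are_permutations(ch),
          round(conditional_entropy(p_a, ch), 4),
          round(conditional_entropy(p_b, ch), 4))
```

For the first matrix every row has the same entropy, so the weighted average is the same for any cause distribution; for the second matrix the value changes with the cause distribution.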
