MDL for Causal Inference on Discrete Data

Kailash Budhathoki, Jilles Vreeken
Published in: 2017 IEEE International Conference on Data Mining (ICDM), November 2017
DOI: 10.1109/ICDM.2017.87
Citations: 32

Abstract

The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. This notion is very powerful, as it can detect any causal dependency that can be explained by a physical process. However, due to the halting problem, it is not computable. In this paper we propose a computable instantiation that provably maintains the key aspects of the ideal. We propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is minimax optimal with respect to the model class under consideration. This means that even in an adversarial setting the score degrades gracefully, and we are still maximally able to detect dependencies between the marginal and the conditional distribution. As a proof of concept, we propose CISC, a linear-time algorithm for causal inference by stochastic complexity for pairs of univariate discrete variables. Experiments show that CISC is highly accurate on synthetic, benchmark, and real-world data, outperforming the state of the art by a margin, and that it scales extremely well with respect to sample and domain sizes.
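
The abstract does not spell out the score, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that (i) the minimax-optimal score is the normalized maximum likelihood (NML) stochastic complexity of a multinomial, whose normalizing term C(K, n) is computed with the linear-time recurrence of Kontkanen and Myllymäki; (ii) the conditional score S(Y | X) is the sum of the multinomial stochastic complexities of Y within each group X = x; and (iii) domain sizes are the numbers of distinct observed values. All function names (multinomial_complexity, stochastic_complexity, conditional_sc, cisc_score, infer_direction) are hypothetical. The direction with the shorter total code length, S(X) + S(Y | X) versus S(Y) + S(X | Y), is preferred, mirroring the Kolmogorov-complexity criterion above.

```python
import math
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


def multinomial_complexity(K: int, n: int) -> float:
    """Parametric complexity C(K, n) of a K-valued multinomial on n samples.

    Assumed recurrence (Kontkanen and Myllymaki):
    C(1, n) = 1 and C(k + 2, n) = C(k + 1, n) + (n / k) * C(k, n).
    """
    if n == 0 or K == 1:
        return 1.0
    # C(2, n) = sum_h Binom(n, h) (h/n)^h ((n-h)/n)^(n-h); each term is a
    # binomial probability, so it is evaluated in log space to avoid overflow.
    c2 = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    c_prev, c_curr = 1.0, c2  # C(1, n) and C(2, n)
    for k in range(1, K - 1):
        # Advance from (C(k, n), C(k+1, n)) to (C(k+1, n), C(k+2, n)).
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def stochastic_complexity(counts: Sequence[int], K: int) -> float:
    """NML code length in bits: -log2 P_ML(data) + log2 C(K, n)."""
    n = sum(counts)
    neg_log_ml = -sum(c * math.log2(c / n) for c in counts if c > 0)
    return neg_log_ml + math.log2(multinomial_complexity(K, n))


def conditional_sc(cause: Sequence[Hashable], effect: Sequence[Hashable]) -> float:
    """Assumed S(effect | cause): sum of S(effect | cause = v) over observed v."""
    k_effect = len(set(effect))
    groups = defaultdict(Counter)
    for c, e in zip(cause, effect):
        groups[c][e] += 1
    return sum(stochastic_complexity(list(g.values()), k_effect) for g in groups.values())


def cisc_score(cause: Sequence[Hashable], effect: Sequence[Hashable]) -> float:
    """Total code length S(cause) + S(effect | cause) in bits."""
    marginal = stochastic_complexity(list(Counter(cause).values()), len(set(cause)))
    return marginal + conditional_sc(cause, effect)


def infer_direction(x: Sequence[Hashable], y: Sequence[Hashable], tol: float = 1e-9) -> str:
    """Prefer the direction with the shorter total description length."""
    delta = cisc_score(x, y) - cisc_score(y, x)
    if abs(delta) <= tol:
        return "undecided"
    return "X -> Y" if delta < 0 else "Y -> X"
```

Called as `infer_direction(x, y)` on two equal-length sequences of discrete values, the sketch returns "X -> Y" when S(X) + S(Y | X) is shorter, "Y -> X" in the opposite case, and "undecided" on a tie. Because log C(K, n) grows roughly like ((K - 1) / 2) log n, a log-space variant of the recurrence would be safer for very large domains and samples.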