Nonlinear Causal Structure Learning for Mixed Data

2021 IEEE International Conference on Data Mining (ICDM), Pub Date: 2021-12-01, DOI: 10.1109/ICDM51629.2021.00082

Wenjuan Wei, Lu Feng


Citations: 4

Abstract


Causal discovery from observational data is a fundamental problem. Many algorithms have been proposed for this purpose over the years, but they usually handle data of a single type, either continuous or discrete variables only. Recently, a few causal structure discovery algorithms have been developed for mixed data types and have found many applications. In this paper, we propose a structural equation model for mixed data types that allows the causal mechanisms to be nonlinear and can consequently model many real-world situations. We prove that, under certain conditions, the causal structure is identifiable from the data distribution generated by the model. Moreover, we propose a maximum likelihood estimator and develop an efficient order search algorithm, built on a novel method of order space cutting, that can handle several hundred variables. After learning the causal order, we apply kernel-based variable selection with automatic relevance determination (ARD) to recover the causal structure. Experiments on synthetic datasets demonstrate the accuracy and scalability of our approach. In particular, we apply our method to publicly available cause-effect pairs and show its superiority in identifying the causal direction of mixed-type pairs. In addition, we show that our method can sensibly recover causal relationships on a publicly available real-world dataset and on a private real-world dataset.
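To make the idea of likelihood-based direction identification for a mixed pair concrete, here is a toy sketch. It is not the authors' estimator and it does not reproduce their nonlinear mechanisms, order-space cutting, or ARD variable selection. For a continuous X and a binary Y, it fits each causal direction under an assumed, restricted model class and keeps the direction with the higher maximized log-likelihood. The model classes are illustrative assumptions: X -> Y uses a Gaussian root for X and a logistic model with a quadratic logit for Y given X; Y -> X uses a Bernoulli root for Y and X = f(Y) + Gaussian noise with constant variance. All function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(resid: np.ndarray) -> float:
    """Maximized Gaussian log-likelihood of residuals, using the MLE variance."""
    n = resid.size
    var = float(np.mean(resid ** 2))
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)


def bernoulli_loglik(y: np.ndarray) -> float:
    """Maximized log-likelihood of a binary root node with a constant rate."""
    p = float(y.mean())
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def logistic_loglik(features: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Bernoulli log-likelihood with logit z = features @ w, computed stably."""
    z = features @ w
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def fit_logistic_loglik(features, y, iters=30, ridge=1e-8):
    """Newton (IRLS) fit of a logistic model; returns the maximized log-likelihood."""
    w = np.zeros(features.shape[1])
    best = logistic_loglik(features, y, w)
    for _ in range(iters):
        p = 0.5 * (1.0 + np.tanh(0.5 * (features @ w)))  # numerically stable sigmoid
        hess = features.T @ (features * (p * (1.0 - p))[:, None])
        hess += ridge * np.eye(features.shape[1])
        step = np.linalg.solve(hess, features.T @ (y - p))
        t = 1.0
        # Backtracking: accept a (possibly shortened) step only if it does not lower the likelihood.
        while t > 1e-6:
            cand = logistic_loglik(features, y, w + t * step)
            if cand >= best:
                w, best = w + t * step, cand
                break
            t *= 0.5
    return best


def score_x_to_y(x, y):
    """X -> Y: Gaussian root X, logistic Y | X with an assumed quadratic logit."""
    feats = np.column_stack([np.ones_like(x), x, x ** 2])
    return gaussian_loglik(x - x.mean()) + fit_logistic_loglik(feats, y)


def score_y_to_x(x, y):
    """Y -> X: Bernoulli root Y, X = f(Y) + Gaussian noise with constant variance."""
    # With a binary parent, the MLE of f is the per-class mean of X.
    f_y = np.where(y == 1.0, x[y == 1.0].mean(), x[y == 0.0].mean())
    return bernoulli_loglik(y) + gaussian_loglik(x - f_y)


# Simulated mixed pair: continuous cause X, binary effect Y with a nonlinear logit.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p_true = 0.5 * (1.0 + np.tanh(0.5 * 6.0 * (x ** 2 - 1.0)))  # sigmoid(6 * (x^2 - 1))
y = (rng.uniform(size=n) < p_true).astype(float)

s_xy = score_x_to_y(x, y)
s_yx = score_y_to_x(x, y)
direction = "X -> Y" if s_xy > s_yx else "Y -> X"
print(f"loglik X->Y: {s_xy:.1f}  loglik Y->X: {s_yx:.1f}  inferred: {direction}")
```

For this seeded example the script should report X -> Y. Because the simulated logit depends on x only through x², both class means of X are close to zero, so the reverse additive-noise model cannot capture the dependence; this is a deliberately easy case. The reverse model has one fewer free parameter (four versus five), a difference far smaller than the likelihood gap here, so no complexity penalty is applied; with flexible regressors such as kernel methods, a penalty or held-out likelihood would typically be needed to keep the comparison fair.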

