Frequent Causal Pattern Mining: A Computationally Efficient Framework for Estimating Bias-Corrected Effects

Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data, pp. 1981-1990. Pub Date: 2019-12-01. Epub Date: 2020-02-24. DOI: 10.1109/bigdata47090.2019.9005977

Pranjul Yadav, Pedro J Caraballo, Michael Steinbach, Vipin Kumar, M Regina Castro, Gyorgy Simon


Citations: 2


Frequent Causal Pattern Mining: A Computationally Efficient Framework For Estimating Bias-Corrected Effects.

Our aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating comprehensive treatment of these conditions. Finding the optimal set of drugs for a combination of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care requires causal rather than associative patterns, which renders association rule mining unsuitable. To address this issue, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data, correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist and, in each such subpopulation, all intervention combinations such that dropping any intervention from the combination reduces the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extends the quantification of the effect of a single intervention to a set of concurrent interventions. Closed intervention sets also allow for a pruning strategy that is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm. To implement our ideas, we introduce and compare five methods of estimating causal effect from observational data, rigorously evaluate them on synthetic data, and mathematically prove (when possible) why they work. We also evaluated our causal rule mining framework on the Electronic Health Records (EHR) data of a large cohort of 152,000 patients from Mayo Clinic and showed that the extracted patterns are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).
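
The search goal stated in the abstract (keep an intervention combination only if dropping any single intervention lowers its efficacy) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy cohort, the names `make_rows`, `naive_effect`, and `irreducible_combos`, and above all the effect measure, which is an unadjusted risk difference. It is not one of the paper's five estimators, it performs no bias correction, and it enumerates every combination rather than using closed intervention sets or any pruning.

```python
from itertools import combinations

def make_rows(conditions, interventions, events, total):
    """Build `total` toy patient records; the first `events` have the adverse outcome."""
    return [(frozenset(conditions), frozenset(interventions), int(i < events))
            for i in range(total)]

# Invented toy cohort (hypothetical numbers, not taken from the paper).
RECORDS = (
    make_rows({"hyperlipidemia"}, set(), 4, 10)                  # untreated: risk 0.4
    + make_rows({"hyperlipidemia"}, {"statin"}, 3, 10)           # risk 0.3
    + make_rows({"hyperlipidemia"}, {"diet"}, 4, 10)             # risk 0.4
    + make_rows({"hyperlipidemia"}, {"statin", "diet"}, 1, 10)   # risk 0.1
)

def risk(rows):
    """Fraction of records with the adverse outcome."""
    return sum(outcome for _, _, outcome in rows) / len(rows)

def naive_effect(records, subpop, combo):
    """Unadjusted risk difference (untreated minus treated) within a subpopulation.

    Treated patients receive exactly `combo`; controls receive no intervention.
    The empty combination is defined to have effect 0.
    """
    if not combo:
        return 0.0
    in_subpop = [r for r in records if subpop <= r[0]]
    treated = [r for r in in_subpop if r[1] == combo]
    control = [r for r in in_subpop if not r[1]]
    if not treated or not control:
        return 0.0
    return risk(control) - risk(treated)

def irreducible_combos(records, subpop, interventions):
    """Keep combos with positive effect that shrinks whenever one intervention is dropped."""
    kept = []
    for size in range(1, len(interventions) + 1):
        for combo in map(frozenset, combinations(sorted(interventions), size)):
            full = naive_effect(records, subpop, combo)
            if full > 0 and all(
                naive_effect(records, subpop, combo - {i}) < full for i in combo
            ):
                kept.append(combo)
    return kept

result = irreducible_combos(RECORDS, frozenset({"hyperlipidemia"}), {"statin", "diet"})
print([sorted(c) for c in result])
```

In this toy cohort the diet-only combination is rejected because its naive effect is zero, while the statin-only and statin-plus-diet combinations are kept. On real observational data an unadjusted difference like this would be confounded, which is the problem the framework's bias-corrected estimators are meant to address.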
