Causal Discovery of Medical Test Parameters Based on Improved PC Algorithm
Xueyao Qiu, Fangqing Gu, Yiqun Zhang
2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS), 2022-10-28
DOI: 10.1109/DOCS55193.2022.9967738 (https://doi.org/10.1109/DOCS55193.2022.9967738)
Citations: 0
Abstract
Causal discovery from observational data is challenging, particularly when precise causal relationships are required. Existing methods can be roughly divided into constraint-based and score-based approaches. A common independence test in the PC algorithm, a constraint-based method, is Fisher’s exact test, which can only handle complete data sets. Missing data, however, is common in many application domains, including healthcare data analysis. When a data set contains missing values, the independence relations among the observed data may differ from those of the corresponding full data generated by the underlying causal process, so simply applying the Fisher’s exact test-based PC algorithm to such data may give unsatisfactory results. Medical test parameters are often used to reflect a patient’s physical condition, and understanding the causal relationships among them can help manage patients more efficiently; in most cases, however, these parameters have missing values. This paper therefore proposes an algorithm that first applies a testwise-deletion Fisher-z independence test to data sets with missing values, then fills in missing data by generating virtual data for the conditional independence (CI) tests, and finally resolves conflicts between unshielded colliders by orienting the conflicting edges as bi-directed. The K2 and Bayesian-Dirichlet equivalent uniform (BDeu) scoring functions are then used to score both the causal structure discovered by the standard PC algorithm and the one found by the PC algorithm based on the missing-value Fisher-z test with bi-directed orientation. Experimental results show that the causal structure discovered by the proposed algorithm yields a more precise causal analysis.
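
The testwise-deletion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fisher_z_testwise`, its parameters, and the use of NaN to mark missing entries are assumptions made for illustration, and the sketch covers only the deletion-based Fisher-z test, not the virtual-data generation or the collider conflict rule. It assumes roughly Gaussian, continuous variables.

```python
import math
import numpy as np


def fisher_z_testwise(data, x, y, cond=(), alpha=0.05):
    """Fisher-z conditional independence test with testwise deletion.

    data : 2-D float array, rows are samples, NaN marks a missing value
           (an assumption of this sketch).
    x, y : column indices of the two variables being tested.
    cond : column indices of the conditioning set.
    Returns (p_value, n_used); alpha is only used by callers that want
    a decision, e.g. `p_value > alpha` means "independence not rejected".
    """
    cols = [x, y, *cond]
    sub = data[:, cols]
    # Testwise deletion: drop a row only if it is missing a value in one
    # of the variables involved in *this* test, not anywhere in the row.
    sub = sub[~np.isnan(sub).any(axis=1)]
    n_used = sub.shape[0]
    k = len(cond)
    if n_used - k - 3 <= 0:
        raise ValueError("too few complete rows for this test")

    # Partial correlation of x and y given cond, read off the inverse
    # of the correlation matrix of the involved variables.
    corr = np.corrcoef(sub, rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep the log below finite

    # Fisher z-transform and two-sided normal p-value.
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    stat = math.sqrt(n_used - k - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2.0))
    return p_value, n_used


if __name__ == "__main__":
    # Synthetic chain X -> Z -> Y with 10% of entries removed at random.
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)
    z = x + 0.5 * rng.normal(size=n)
    y = z + 0.5 * rng.normal(size=n)
    data = np.column_stack([x, y, z])
    data[rng.random(data.shape) < 0.1] = np.nan

    print("X vs Y       :", fisher_z_testwise(data, 0, 1))
    print("X vs Y given Z:", fisher_z_testwise(data, 0, 1, cond=(2,)))
```

Because rows are dropped per test rather than once for the whole data set, each test keeps as many samples as its own variables allow. The abstract's point is that independence relations in the retained rows may still differ from those in the full data, which is what the proposed virtual-data step is meant to address.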