Optimizing VarLiNGAM for Scalable and Efficient Time Series Causal Discovery
Ziyang Jiao, Ce Guo, Wayne Luk
arXiv - STAT - Computation, 2024-09-09
https://doi.org/arxiv-2409.05500
Abstract
Causal discovery aims to identify causal relationships in data, a task
that has become increasingly demanding because of the computational cost of
traditional methods such as VarLiNGAM, which combines a Vector Autoregressive
(VAR) model with a Linear Non-Gaussian Acyclic Model (LiNGAM) for time series
data. This study is dedicated to optimising causal discovery specifically for time
series data, which is common in practical applications. Time series causal
discovery is particularly challenging due to the need to account for temporal
dependencies and potential time lag effects. By designing a specialised dataset
generator and reducing the computational complexity of the VarLiNGAM model from
\( O(m^3 \cdot n) \) to \( O(m^3 + m^2 \cdot n) \), this study significantly
improves the feasibility of processing large datasets. The proposed methods
have been validated on advanced computational platforms and tested across
simulated, real-world, and large-scale datasets, showcasing enhanced efficiency
and performance. The optimised algorithm achieved a 7- to 13-fold speedup
over the original algorithm and a roughly 4.5-fold speedup over
the GPU-accelerated version on large-scale datasets with feature sizes between
200 and 400. Our methods aim to push the boundaries of current causal discovery
capabilities, making them more robust, scalable, and applicable to real-world
scenarios, thus facilitating breakthroughs in various fields such as healthcare
and finance.
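
To make the VAR-plus-LiNGAM combination concrete, the following is a minimal sketch of only the autoregressive stage, not the authors' optimised implementation. It assumes a lag-1 model, zero-mean data, and ordinary least squares; the function names `simulate_var1` and `fit_var1`, the coefficient matrix `A_true`, and the Laplace noise are all illustrative choices that do not come from the paper. In a full VarLiNGAM pipeline, the residuals returned here would then be passed to a LiNGAM estimator to recover instantaneous causal effects.

```python
import numpy as np


def simulate_var1(n, A, rng):
    """Simulate n samples of x_t = A @ x_{t-1} + e_t with Laplace (non-Gaussian) noise."""
    m = A.shape[0]
    x = np.zeros((n, m))
    for t in range(1, n):
        x[t] = A @ x[t - 1] + rng.laplace(size=m)
    return x


def fit_var1(x):
    """Estimate the lag-1 coefficient matrix by least squares.

    Returns (A_hat, residuals), where residuals[t] = x[t+1] - A_hat @ x[t].
    """
    past, present = x[:-1], x[1:]
    # Solve past @ coef ~= present in the least-squares sense; coef equals A_hat transposed.
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    residuals = present - past @ coef
    return coef.T, residuals


# Hypothetical 3-variable example with a stable (lower-triangular) lag matrix.
rng = np.random.default_rng(0)
A_true = np.array([
    [0.5, 0.0, 0.0],
    [0.3, 0.4, 0.0],
    [0.0, 0.2, 0.3],
])
x = simulate_var1(5000, A_true, rng)
A_hat, residuals = fit_var1(x)
print(np.round(A_hat, 2))
```

The least-squares step touches each of the n samples once per pair of features, which is one place where an \( m^2 \cdot n \) term naturally arises; how the paper itself reaches \( O(m^3 + m^2 \cdot n) \) overall is not described in the abstract.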