Optimizing VarLiNGAM for Scalable and Efficient Time Series Causal Discovery

arXiv - STAT - Computation, Pub Date: 2024-09-09, DOI: arxiv-2409.05500

Ziyang Jiao, Ce Guo, Wayne Luk


Citations: 0

Abstract

Causal discovery aims to identify causal relationships in data, a task made increasingly difficult by the computational demands of traditional methods such as VarLiNGAM, which combines a Vector Autoregressive (VAR) model with a Linear Non-Gaussian Acyclic Model (LiNGAM) for time series data. This study focuses on optimising causal discovery for time series data, which are common in practical applications. Time series causal discovery is particularly challenging because it must account for temporal dependencies and potential time-lag effects. By designing a specialised dataset generator and reducing the computational complexity of the VarLiNGAM model from \( O(m^3 \cdot n) \) to \( O(m^3 + m^2 \cdot n) \), this study makes processing large datasets substantially more feasible. The proposed methods were validated on advanced computational platforms and tested on simulated, real-world, and large-scale datasets, showing improved efficiency and performance. The optimised algorithm achieved a 7 to 13 times speedup over the original algorithm and a speedup of around 4.5 times over the GPU-accelerated version on large-scale datasets with feature sizes between 200 and 400. Our methods aim to push the boundaries of current causal discovery capabilities, making them more robust, scalable, and applicable to real-world scenarios, and thereby to facilitate breakthroughs in fields such as healthcare and finance.
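To make the VAR half of VarLiNGAM concrete, the sketch below fits a lag-one vector autoregression by ordinary least squares and returns the residuals, which are what a LiNGAM-style estimator would then analyse for instantaneous effects. It is an illustration only and does not reproduce the optimisation described in the abstract. The function names `fit_var1` and `simulate_var1`, the lag order of one, the Laplace noise, and the example coefficient matrix are all assumptions made for this example, and the LiNGAM step itself is omitted.

```python
import numpy as np


def fit_var1(X):
    """Least-squares fit of x_t = c + M1 @ x_{t-1} + e_t (hypothetical helper).

    X has shape (n_samples, n_features). Returns (c, M1, residuals).
    """
    past, present = X[:-1], X[1:]
    # Design matrix: a column of ones for the intercept, then the lagged values.
    design = np.hstack([np.ones((past.shape[0], 1)), past])
    coef, *_ = np.linalg.lstsq(design, present, rcond=None)
    # Row 0 of coef is the intercept; the remaining rows are M1 transposed.
    c, M1 = coef[0], coef[1:].T
    residuals = present - design @ coef
    return c, M1, residuals


def simulate_var1(M1, n_samples, seed=0):
    """Generate a toy VAR(1) series with non-Gaussian noise (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m = M1.shape[0]
    X = np.zeros((n_samples, m))
    for t in range(1, n_samples):
        # Laplace noise is non-Gaussian, as the LiNGAM family of models assumes.
        X[t] = M1 @ X[t - 1] + rng.laplace(size=m)
    return X


# Assumed example coefficients; eigenvalues 0.5, 0.4, 0.3 keep the process stable.
M1_true = np.array([[0.5, 0.0, 0.0],
                    [0.3, 0.4, 0.0],
                    [0.0, 0.2, 0.3]])
X = simulate_var1(M1_true, n_samples=5000)
c_hat, M1_hat, resid = fit_var1(X)
print(np.round(M1_hat, 2))
```

In a full VarLiNGAM pipeline, `resid` would be passed to a LiNGAM estimator to recover the instantaneous causal structure, and the lagged effects would then be adjusted using that structure.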
