Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data

M. Balasubramanian, Trevor D. Ruiz, B. Cook, Prabhat, Sharmodeep Bhattacharyya, Aviral Shrivastava, K. Bouchard
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 264-273. Published 2020-05-01. DOI: 10.1109/IPDPS47924.2020.00036 (https://doi.org/10.1109/IPDPS47924.2020.00036)
Citation count: 1

Abstract

Advanced recording and measurement devices across scientific fields are producing high-dimensional time series data. Vector autoregressive (VAR) models are well suited for inferring Granger-causal networks from such data, but accurate inference at scale remains a central challenge. We recently introduced Union of Intersections (UoI), a flexible and scalable statistical machine learning framework that enables feature selection with low false-positive and low false-negative rates together with low-bias, low-variance estimation, improving both interpretability and predictive accuracy. In this paper, we scale the UoI framework for VAR models (algorithm UoI_VAR) to infer network connectivity from large time series data sets (terabytes). To achieve this, we optimize the distributed convex optimization and introduce novel strategies that reduce data read and data distribution times. We study the strong and weak scaling of the algorithm on a Xeon Phi-based supercomputer (100,000 cores). These advances enable us to estimate the largest VAR model we are aware of (1000 nodes, corresponding to 1M parameters) and to apply the algorithm to large time series data from neurophysiology (192 neurons) and finance (470 companies).
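As a rough illustration of how VAR coefficients relate to a Granger-causal network, the sketch below reads directed edges off a set of lag matrices and counts model parameters. It is not the UoI_VAR algorithm; it assumes the coefficients have already been estimated by some method. The function names (`var_param_count`, `granger_edges`, `predict_next`), the toy matrix `A1`, and the indexing convention `A[target][source]` are hypothetical choices made for this example. The 1M parameter figure quoted in the abstract matches 1000 nodes only under the assumption of a first-order model (one 1000 x 1000 lag matrix).

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def var_param_count(num_nodes: int, order: int = 1) -> int:
    """Number of autoregressive coefficients in a VAR(order) model: order * p^2."""
    return order * num_nodes * num_nodes


def granger_edges(lag_matrices: List[Matrix], tol: float = 0.0,
                  include_self: bool = False) -> Set[Tuple[int, int]]:
    """Return directed edges (source, target) implied by the lag matrices.

    Assumed convention: lag_matrices[j][target][source] is the weight with
    which node `source` at lag j + 1 enters the equation for node `target`.
    An edge is reported when any lag has a coefficient with magnitude > tol.
    """
    edges: Set[Tuple[int, int]] = set()
    for A in lag_matrices:
        for target, row in enumerate(A):
            for source, coef in enumerate(row):
                if source == target and not include_self:
                    continue
                if abs(coef) > tol:
                    edges.add((source, target))
    return edges


def predict_next(lag_matrices: List[Matrix], history: List[List[float]]) -> List[float]:
    """Noise-free one-step VAR prediction: sum over lags j of A_j applied to x_{t-j}.

    history[-1] is the most recent observation, history[-2] the one before, etc.
    """
    p = len(history[-1])
    pred = [0.0] * p
    for j, A in enumerate(lag_matrices):
        x_lag = history[-1 - j]
        for target in range(p):
            pred[target] += sum(A[target][s] * x_lag[s] for s in range(p))
    return pred


# Toy 3-node VAR(1): only node 0 influences another node (node 1).
A1 = [
    [0.5, 0.0, 0.0],
    [0.25, 0.5, 0.0],
    [0.0, 0.0, 0.25],
]

edges = granger_edges([A1])
next_state = predict_next([A1], [[1.0, 2.0, 4.0]])
n_params = var_param_count(1000, order=1)
```

In this toy case the only off-diagonal nonzero coefficient is `A1[1][0]`, so the inferred network has a single edge from node 0 to node 1. In practice the hard part is estimating sparse lag matrices reliably from noisy data, which is the problem the UoI framework addresses.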