Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI:10.1109/IPDPS47924.2020.00036

M. Balasubramanian, Trevor D. Ruiz, B. Cook, Prabhat, Sharmodeep Bhattacharyya, Aviral Shrivastava, K. Bouchard


Citations: 1

Abstract

The development of advanced recording and measurement devices in scientific fields is producing high-dimensional time series data. Vector autoregressive (VAR) models are well suited for inferring Granger-causal networks from high-dimensional time series data sets, but accurate inference at scale remains a central challenge. We have recently introduced a flexible and scalable statistical machine learning framework, Union of Intersections (UoI), which enables low false-positive and low false-negative feature selection along with low-bias and low-variance estimation, enhancing interpretation and predictive accuracy. In this paper, we scale the UoI framework for VAR models (algorithm UoI_VAR) to infer network connectivity from large time series data sets (terabytes). To achieve this, we optimize distributed convex optimization and introduce novel strategies for improved data read and data distribution times. We study the strong and weak scaling of the algorithm on a Xeon Phi-based supercomputer (100,000 cores). These advances enable us to estimate the largest known VAR model (1000 nodes, corresponding to 1M parameters) and apply it to large time series data from neurophysiology (192 neurons) and finance (470 companies).
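To make the link between a VAR model and a Granger-causal network concrete, here is a minimal sketch in plain Python. It is not the UoI_VAR algorithm or the authors' code. It assumes a VAR with no intercept terms, in which each lag contributes one coefficient matrix over the nodes, and it reads a directed edge from node j to node i whenever the coefficient of node j in the equation for node i is nonzero. The function names, the `tol` threshold, and the toy matrix are all hypothetical. Under that assumption, a first-order model on 1000 nodes has 1000 x 1000 lag coefficients, which is consistent with the 1M parameters quoted above.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def var_parameter_count(num_nodes: int, order: int = 1) -> int:
    """Count lag coefficients in a VAR(order) model, assuming no intercepts."""
    return order * num_nodes * num_nodes


def granger_edges(
    lag_matrices: List[Matrix],
    tol: float = 0.0,
    include_self: bool = False,
) -> List[Tuple[int, int]]:
    """Return directed edges (source, target) implied by nonzero coefficients.

    lag_matrices[k][i][j] is the coefficient of node j at lag k + 1 in the
    equation for node i. Any entry whose magnitude exceeds tol gives j -> i.
    """
    edges = set()
    for lag_matrix in lag_matrices:
        for target, row in enumerate(lag_matrix):
            for source, coef in enumerate(row):
                if source == target and not include_self:
                    continue  # skip self-dependence unless requested
                if abs(coef) > tol:
                    edges.add((source, target))
    return sorted(edges)


# Hypothetical 3-node, first-order coefficient matrix (illustration only).
A1 = [
    [0.5, 0.0, 0.0],
    [0.3, 0.4, 0.0],
    [0.0, -0.2, 0.6],
]

print(var_parameter_count(1000))  # lag coefficients for 1000 nodes, order 1
print(granger_edges([A1]))        # directed edges as (source, target) pairs
```

In this reading, sparse and accurate selection of nonzero coefficients matters because every false positive becomes a spurious edge in the inferred network and every false negative removes a real one.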

