Scalable Hashing-Based Network Discovery

2017 IEEE International Conference on Data Mining (ICDM), Pub Date: 2017-11-01, DOI: 10.1109/ICDM.2017.50

Tara Safavi, C. Sripada, Danai Koutra

Citations: 19

Abstract

Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. In these domains, networks are often constructed out of multiple time series by computing measures of association or similarity between pairs of series. The nodes in a discovered graph correspond to time series, which are linked via edges weighted by the association scores of their endpoints. After graph construction, the network may be thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to avoid the costly step of building a fully-connected graph before sparsification, we propose a fast network discovery approach based on probabilistic hashing of randomly selected time series subsequences. Evaluation on real data shows that our methods construct graphs nearly 15 times as fast as baseline methods, while achieving both network structure and accuracy comparable to baselines in task-based evaluation.
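The contrast drawn in the abstract, scoring all pairs and then thresholding versus hashing subsequences to find likely neighbors first, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Pearson correlation stands in for the unspecified association measure, and sign random projection over a randomly chosen window is swapped in as a generic locality-sensitive hash rather than the hash family used in the paper. The function names and the parameters `window`, `n_bits`, and `n_tables` are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def baseline_network(series, threshold):
    """Quadratic baseline: score every pair, then keep edges at or above the threshold."""
    edges = {}
    for i, j in combinations(range(len(series)), 2):
        w = pearson(series[i], series[j])
        if w >= threshold:
            edges[(i, j)] = w
    return edges


def hashed_network(series, threshold, window=16, n_bits=8, n_tables=4, seed=0):
    """Hashing sketch: only pairs that share a bucket in some table are scored.

    Assumption: sign random projection on a random window is a stand-in hash,
    not the scheme proposed in the paper.
    """
    rng = random.Random(seed)
    length = len(series[0])
    candidates = set()
    for _ in range(n_tables):
        # Pick one random subsequence position and one set of hyperplanes per table.
        start = rng.randrange(length - window + 1)
        planes = [[rng.gauss(0, 1) for _ in range(window)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, s in enumerate(series):
            sub = s[start:start + window]
            mean = sum(sub) / window
            centered = [v - mean for v in sub]
            signature = tuple(
                sum(p * v for p, v in zip(plane, centered)) >= 0 for plane in planes
            )
            buckets[signature].append(idx)
        # Series sharing a signature become candidate edges.
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    # Score only the candidate pairs instead of all pairs.
    edges = {}
    for i, j in candidates:
        w = pearson(series[i], series[j])
        if w >= threshold:
            edges[(i, j)] = w
    return edges
```

Because both functions apply the same score and threshold, the hashed graph is always a subset of the baseline graph; any savings come from scoring fewer pairs, which only happens when buckets stay small. Raising `n_bits` makes buckets smaller (fewer candidates, more missed edges), while raising `n_tables` recovers missed edges at the cost of more hashing work.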
