Scalable Hashing-Based Network Discovery
Tara Safavi, C. Sripada, Danai Koutra
2017 IEEE International Conference on Data Mining (ICDM), 2017-11-01
DOI: https://doi.org/10.1109/ICDM.2017.50
Citations: 19
Abstract
Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, and economics. In these domains, networks are often constructed out of multiple time series by computing measures of association or similarity between pairs of series. The nodes in a discovered graph correspond to time series, which are linked via edges weighted by the association scores of their endpoints. After graph construction, the network may be thresholded so that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to avoid the costly step of building a fully connected graph before sparsification, we propose a fast network discovery approach based on probabilistic hashing of randomly selected time series subsequences. Evaluation on real data shows that our methods construct graphs nearly 15 times as fast as baseline methods, while achieving both network structure and accuracy comparable to baselines in task-based evaluation.
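The contrast the abstract draws can be made concrete with a small sketch. The first function below is the all-pairs baseline as described: score every pair of series, then threshold. The second is only a generic locality-sensitive-hashing illustration, not the paper's hash family, which the abstract does not specify. It assumes Pearson correlation as the association measure, a hypothetical above-the-mean binarization, and bucketing on one randomly chosen window per hash table; the names `baseline_graph`, `hashed_graph`, `binarize`, `window`, and `tables` are all invented for this example.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def baseline_graph(series, threshold):
    """All-pairs baseline: score every pair, keep edges at or above the threshold."""
    edges = {}
    for i, j in combinations(range(len(series)), 2):
        w = pearson(series[i], series[j])
        if w >= threshold:
            edges[(i, j)] = w
    return edges


def binarize(x):
    """Assumed preprocessing: 1 where a value exceeds the series mean, else 0."""
    m = mean(x)
    return tuple(1 if v > m else 0 for v in x)


def hashed_graph(series, threshold, window=8, tables=10, seed=0):
    """Illustrative LSH-style variant: only pairs that share a bucket get scored.

    Assumes all series have the same length and that window <= that length.
    """
    rng = random.Random(seed)
    bits = [binarize(x) for x in series]
    length = len(series[0])
    candidates = set()
    for _ in range(tables):
        # One random contiguous window per table; its bit pattern is the hash key.
        start = rng.randrange(length - window + 1)
        buckets = {}
        for idx, b in enumerate(bits):
            buckets.setdefault(b[start:start + window], []).append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    # Exact scoring is limited to candidate pairs rather than all pairs.
    edges = {}
    for i, j in candidates:
        w = pearson(series[i], series[j])
        if w >= threshold:
            edges[(i, j)] = w
    return edges


if __name__ == "__main__":
    data = [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
        [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    ]
    print(baseline_graph(data, 0.9))
    print(hashed_graph(data, 0.9))
```

In this sketch the hashed variant can miss edges the baseline would find, since a pair that never shares a bucket is never scored. More tables or shorter windows raise recall at the cost of scoring more candidate pairs.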