Accurate and Fast Estimation of Temporal Motifs using Path Sampling

arXiv - CS - Social and Information Networks, Pub Date: 2024-09-13, DOI: arxiv-2409.08975

Yunjie Pan, Omkar Bhalerao, C. Seshadhri, Nishil Talati

Abstract

Counting the number of small subgraphs, called motifs, is a fundamental problem in social network analysis and graph mining. Many real-world networks are directed and temporal, where edges have timestamps. Motif counting in directed, temporal graphs is especially challenging because there is a plethora of different kinds of patterns. Temporal motif counts reveal much richer information, and there is a need for scalable algorithms for motif counting. A major challenge in counting is that there can be trillions of temporal motif matches even in a graph with only millions of vertices. Both the motifs and the input graphs can have multiple edges between two vertices, leading to a combinatorial explosion. Counting temporal motifs involving just four vertices is not feasible with current state-of-the-art algorithms. We design an algorithm, TEACUPS, that addresses this problem using a novel technique of temporal path sampling. We combine a path sampling method with carefully designed temporal data structures to propose an efficient approximate algorithm for temporal motif counting. TEACUPS is an unbiased estimator with provable concentration behavior, which can be used to bound the estimation error. For a Bitcoin graph with hundreds of millions of edges, TEACUPS runs in less than 1 minute, while the exact counting algorithm takes more than a day. We empirically demonstrate the accuracy of TEACUPS on large datasets, showing an average speedup of 30× (up to 2000×) compared to existing GPU-based exact counting methods while preserving high count estimation accuracy.
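The abstract only names the technique, so what follows is a minimal Python sketch of the general path-sampling idea, not the TEACUPS implementation. It assumes a deliberately simple target: temporal 3-edge paths a→b→c→d on four distinct vertices, where each edge must follow the previous one within delta time units. This per-gap window is a simplification chosen here and is not necessarily the paper's window definition. Per-vertex, timestamp-sorted edge lists searched with `bisect` stand in for the "temporal data structures", and all function names are hypothetical. The sampler picks a middle edge with probability proportional to the number of candidate paths through it, then extends left and right uniformly. Every candidate triple is therefore drawn with probability 1/W, so W times the fraction of samples with four distinct vertices is an unbiased estimate of the count.

```python
import bisect
import random
from collections import defaultdict

# Hypothetical sketch: a temporal edge is a tuple (src, dst, timestamp).


def build_index(edges):
    """Map each vertex to its incoming and outgoing edges sorted by timestamp.

    Each entry is (pairs, times): pairs is a list of (t, neighbor) and times is
    the parallel list of timestamps, so time windows can be found by bisection.
    """
    inc, out = defaultdict(list), defaultdict(list)
    for u, v, t in edges:
        out[u].append((t, v))
        inc[v].append((t, u))

    def finish(table):
        result = {}
        for vertex, pairs in table.items():
            pairs.sort(key=lambda p: p[0])
            result[vertex] = (pairs, [t for t, _ in pairs])
        return result

    return finish(inc), finish(out)


def candidate_ranges(edge, inc, out, delta):
    """Index ranges of valid first and last edges around middle edge (b, c, t2)."""
    b, c, t2 = edge
    in_times = inc[b][1] if b in inc else []
    out_times = out[c][1] if c in out else []
    # First edge (a -> b, t1) with t2 - delta <= t1 < t2.
    left = (bisect.bisect_left(in_times, t2 - delta), bisect.bisect_left(in_times, t2))
    # Last edge (c -> d, t3) with t2 < t3 <= t2 + delta.
    right = (bisect.bisect_right(out_times, t2), bisect.bisect_right(out_times, t2 + delta))
    return left, right


def estimate_temporal_paths(edges, delta, samples, seed=0):
    """Estimate the number of temporal paths a->b->c->d on four distinct vertices.

    Returns (W, estimate): W is the exact number of candidate edge triples that
    satisfy the time windows, before the distinct-vertex check.
    """
    inc, out = build_index(edges)
    ranges = [candidate_ranges(e, inc, out, delta) for e in edges]
    # Weight of a middle edge = number of candidate triples that use it.
    weights = [(l1 - l0) * (r1 - r0) for (l0, l1), (r0, r1) in ranges]
    total = sum(weights)
    if total == 0:
        return 0, 0.0
    rng = random.Random(seed)
    hits = 0
    for k in rng.choices(range(len(edges)), weights=weights, k=samples):
        b, c, _ = edges[k]
        (l0, l1), (r0, r1) = ranges[k]
        # Uniform extension: each candidate triple is drawn with probability 1/W.
        a = inc[b][0][rng.randrange(l0, l1)][1]
        d = out[c][0][rng.randrange(r0, r1)][1]
        if len({a, b, c, d}) == 4:
            hits += 1
    return total, total * hits / samples


def exact_temporal_paths(edges, delta):
    """Brute-force count of the same quantity, for checking small inputs."""
    count = 0
    for b, c, t2 in edges:
        firsts = [a for a, v, t1 in edges if v == b and t2 - delta <= t1 < t2]
        lasts = [d for u, d, t3 in edges if u == c and t2 < t3 <= t2 + delta]
        count += sum(len({a, b, c, d}) == 4 for a in firsts for d in lasts)
    return count
```

On the toy graph `[(0, 1, 1), (1, 2, 2), (2, 3, 3), (2, 0, 4)]` with `delta=10`, W is 2 (the triples 0→1→2→3 and 0→1→2→0). Only the first has four distinct vertices, so the estimate concentrates near 1. The brute-force counter exists only to check the sampler on small graphs. It rescans the whole edge list for every middle edge, which is the kind of cost that sampling is meant to avoid.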