Accurate and Fast Estimation of Temporal Motifs using Path Sampling
Yunjie Pan, Omkar Bhalerao, C. Seshadhri, Nishil Talati
arXiv - CS - Social and Information Networks, 2024-09-13. DOI: arxiv-2409.08975
Abstract
Counting the number of small subgraphs, called motifs, is a fundamental
problem in social network analysis and graph mining. Many real-world networks
are directed and temporal, where edges have timestamps. Motif counting in
directed, temporal graphs is especially challenging because there is a
plethora of different kinds of patterns. Temporal motif counts reveal much
richer information, and there is a need for scalable algorithms for motif
counting. A major challenge in counting is that there can be trillions of
temporal motif matches even in a graph with only millions of vertices. Both the motifs
and the input graphs can have multiple edges between two vertices, leading to a
combinatorial explosion problem. Counting temporal motifs involving just four
vertices is not feasible with current state-of-the-art algorithms. We design an
algorithm, TEACUPS, that addresses this problem using a novel technique of
temporal path sampling. We combine a path sampling method with carefully
designed temporal data structures to obtain an efficient approximate algorithm
for temporal motif counting. TEACUPS is an unbiased
estimator with provable concentration behavior, which can be used to bound the
estimation error. For a Bitcoin graph with hundreds of millions of edges,
TEACUPS runs in less than 1 minute, while the exact counting algorithm takes
more than a day. We empirically demonstrate the accuracy of TEACUPS on large
datasets, showing an average speedup of 30$\times$ (up to 2000$\times$)
compared to existing GPU-based exact counting methods while preserving high
count estimation accuracy.
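
To make the idea of an unbiased sampling-based estimator concrete, here is a minimal Python sketch. It is not the TEACUPS algorithm, whose details the abstract does not give. It assumes one simple motif (a cyclic temporal triangle u->v, v->w, w->u with strictly increasing timestamps inside a window `delta`), samples two-edge temporal walks uniformly, and scales the number of closing edges by the total number of walks `W`. Every name here (`build_index`, `count_in_window`, `exact_triangles`, `estimate_triangles`, `delta`) is hypothetical and chosen only for illustration.

```python
import bisect
import random
from collections import defaultdict


def build_index(edges):
    """Index temporal edges (u, v, t): sorted times per (u, v) pair and per source."""
    by_pair = defaultdict(list)   # (u, v) -> sorted timestamps
    out = defaultdict(list)       # u -> sorted list of (t, v)
    for u, v, t in edges:
        by_pair[(u, v)].append(t)
        out[u].append((t, v))
    for ts in by_pair.values():
        ts.sort()
    for lst in out.values():
        lst.sort()
    return by_pair, out


def count_in_window(sorted_ts, lo, hi):
    """Number of timestamps t with lo < t <= hi."""
    return bisect.bisect_right(sorted_ts, hi) - bisect.bisect_right(sorted_ts, lo)


def exact_triangles(edges, delta):
    """Brute-force reference count: t1 < t2 < t3 <= t1 + delta."""
    total = 0
    for u, v, t1 in edges:
        if u == v:
            continue
        for a, w, t2 in edges:
            if a != v or w in (u, v) or not (t1 < t2 <= t1 + delta):
                continue
            for b, c, t3 in edges:
                if b == w and c == u and t2 < t3 <= t1 + delta:
                    total += 1
    return total


def estimate_triangles(edges, delta, num_samples, seed=0):
    """Illustrative estimator: W * (average closing-edge count of a sampled walk)."""
    rng = random.Random(seed)
    by_pair, out = build_index(edges)
    out_ts = {u: [t for t, _ in lst] for u, lst in out.items()}
    # weight of e1 = number of edges leaving v with a timestamp in (t1, t1 + delta]
    weights = [count_in_window(out_ts.get(v, []), t1, t1 + delta)
               for _, v, t1 in edges]
    W = sum(weights)  # total number of two-edge temporal walks
    if W == 0:
        return 0.0
    total = 0
    # Picking e1 proportionally to its weight, then e2 uniformly in its window,
    # selects each two-edge walk with probability exactly 1 / W.
    for i in rng.choices(range(len(edges)), weights=weights, k=num_samples):
        u, v, t1 = edges[i]
        ts = out_ts[v]
        lo = bisect.bisect_right(ts, t1)
        hi = bisect.bisect_right(ts, t1 + delta)
        t2, w = out[v][rng.randrange(lo, hi)]
        if u != v and w != u and w != v:
            # count the edges w -> u that close the triangle inside the window
            total += count_in_window(by_pair.get((w, u), []), t2, t1 + delta)
    return W * total / num_samples
```

Because each walk is drawn with probability 1/W, the expected value of `W * closing_count` equals the sum of closing counts over all walks, which is the exact triangle count. The estimate is therefore unbiased under these assumptions, and averaging more samples reduces its variance. On a small graph, `estimate_triangles(edges, delta, num_samples)` can be compared directly against `exact_triangles(edges, delta)`.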