Efficiently Sampling and Estimating Hypergraphs By Hybrid Random Walk
Lingling Zhang, Zhiwei Zhang, Guoren Wang, Ye Yuan
2023 IEEE 39th International Conference on Data Engineering (ICDE), published 2023-04-01
DOI: 10.1109/ICDE55515.2023.00102 (https://doi.org/10.1109/ICDE55515.2023.00102)
Abstract
Hypergraphs provide a powerful tool for representing group interactions in complex networks. Analyzing the statistical properties of hypergraphs by sampling is an increasingly fundamental research problem in data processing. However, state-of-the-art sampling methods either focus on pairwise graphs or are insensitive to the structures formed by vertices and hyperedges, which yields estimates with low accuracy and efficiency. To efficiently characterize the properties of both vertices and hyperedges, this paper first proposes a theoretical Markov Chain Monte Carlo (MCMC) model based on a hybrid random walk, with carefully designed mixture states and transition matrix. To simplify the implementation of this model, we develop an algorithm composed of vertex and hyperedge transitions, which avoids the cost of constructing mixture states in practice, together with an estimation method that produces accurate estimates. Furthermore, we employ a non-backtracking strategy in the vertex transitions to accelerate the convergence of the hybrid random walk, and we propose skipping already-sampled vertices in the hyperedge transitions to avoid being trapped in a local subgraph, which improves accuracy and reduces query cost. Extensive experiments on real-world datasets confirm that the proposed methods are more accurate and more efficient than existing sophisticated sampling methods.
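To make the idea of a walk that alternates between hyperedges and vertices more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypergraph stored as a list of vertex lists, uniform random choices at every step, and a simple "avoid the previous vertex when possible" rule as a stand-in for non-backtracking. The names `build_incidence` and `hybrid_walk` are hypothetical, and the paper's mixture states, transition matrix, vertex-skipping rule, and estimator are not modeled here.

```python
import random
from collections import defaultdict


def build_incidence(hyperedges):
    """Map each vertex to the ids of the hyperedges that contain it."""
    incident = defaultdict(list)
    for eid, members in enumerate(hyperedges):
        for v in members:
            incident[v].append(eid)
    return dict(incident)


def hybrid_walk(hyperedges, start, steps, seed=0):
    """Walk vertex -> incident hyperedge -> member vertex for `steps` rounds.

    Illustrative assumption: every choice is uniform. This is not claimed
    to match the transition probabilities designed in the paper.
    """
    rng = random.Random(seed)
    incident = build_incidence(hyperedges)
    current, previous = start, None
    visited_vertices, visited_edges = [start], []
    for _ in range(steps):
        # Pick a hyperedge that contains the current vertex.
        eid = rng.choice(incident[current])
        # Pick another member of that hyperedge, avoiding the vertex we
        # just came from when an alternative exists (non-backtracking).
        candidates = [v for v in hyperedges[eid] if v not in (current, previous)]
        if not candidates:
            # No alternative: allow stepping back to the previous vertex.
            candidates = [v for v in hyperedges[eid] if v != current]
        if not candidates:
            # Singleton hyperedge: the walk stays where it is.
            candidates = [current]
        previous, current = current, rng.choice(candidates)
        visited_edges.append(eid)
        visited_vertices.append(current)
    return visited_vertices, visited_edges


if __name__ == "__main__":
    toy = [[0, 1, 2], [2, 3], [3, 4, 0]]  # three hyperedges over five vertices
    vertices, edges = hybrid_walk(toy, start=0, steps=10)
    print(vertices)
    print(edges)
```

The walk records both the vertices and the hyperedges it passes through, so a single trajectory yields samples of both kinds. Turning those samples into unbiased estimates would require reweighting by the walk's stationary distribution, which this sketch does not attempt.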