Efficiently Sampling and Estimating Hypergraphs By Hybrid Random Walk

Lingling Zhang, Zhiwei Zhang, Guoren Wang, Ye Yuan
Published in: 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023-04-01. DOI: 10.1109/ICDE55515.2023.00102 (https://doi.org/10.1109/ICDE55515.2023.00102)

Abstract

Hypergraphs are a powerful tool for representing group interactions in complex networks. Estimating the statistical properties of hypergraphs by sampling is an increasingly fundamental research problem in data processing. However, state-of-the-art sampling methods either focus on pairwise graphs or are insensitive to the structures formed by vertices and hyperedges, which leads to estimates with low accuracy and efficiency. To characterize the properties of both vertices and hyperedges efficiently, this paper first proposes a Markov Chain Monte Carlo (MCMC) model based on a hybrid random walk, with carefully designed mixture states and transition matrix. To simplify the implementation of this model, we develop an algorithm composed of vertex transitions and hyperedge transitions, which avoids the cost of constructing mixture states in practice, together with an estimation method that yields accurate estimates. Furthermore, we employ a non-backtracking strategy in the vertex transitions to accelerate the convergence of the hybrid random walk, and we propose skipping already-sampled vertices in the hyperedge transitions to avoid being trapped in a local subgraph, which improves accuracy and reduces query cost. Extensive experiments on real-world datasets confirm that the proposed methods achieve higher accuracy and efficiency than existing sophisticated sampling methods.
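The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of a walk that alternates vertex transitions and hyperedge transitions. It is not the paper's algorithm. It omits the mixture states, the non-backtracking strategy, and the skipping of sampled vertices. All names (`build_incidence`, `hypergraph_walk`, `estimate_avg_degree`) and the toy hypergraph are hypothetical. The sketch assumes a plain two-stage walk: from a vertex, pick an incident hyperedge uniformly, then pick a member vertex of that hyperedge uniformly. For a connected hypergraph, this simple chain has a stationary distribution proportional to vertex degree (the number of incident hyperedges), so the average degree can be estimated by re-weighting each sample by the inverse of its degree.

```python
import random
from collections import defaultdict

# Hypothetical helper names; this is an illustrative baseline walk,
# not the hybrid random walk proposed in the paper.

def build_incidence(hyperedges):
    """Map each vertex to the list of hyperedge ids that contain it."""
    incidence = defaultdict(list)
    for edge_id, members in hyperedges.items():
        for v in members:
            incidence[v].append(edge_id)
    return dict(incidence)

def hypergraph_walk(hyperedges, incidence, start, steps, rng):
    """Yield the vertices visited by a two-stage walk on the hypergraph."""
    v = start
    for _ in range(steps):
        e = rng.choice(incidence[v])   # vertex transition: uniform incident hyperedge
        v = rng.choice(hyperedges[e])  # hyperedge transition: uniform member vertex
        yield v

def estimate_avg_degree(samples, incidence):
    """Re-weighted estimator, assuming samples are drawn proportionally to degree."""
    inverse_degree_sum = sum(1.0 / len(incidence[v]) for v in samples)
    return len(samples) / inverse_degree_sum

# Toy hypergraph (assumed for illustration): every vertex lies in two hyperedges.
hyperedges = {"e0": [0, 1, 2], "e1": [0, 2, 3], "e2": [1, 3]}
incidence = build_incidence(hyperedges)
rng = random.Random(0)
samples = list(hypergraph_walk(hyperedges, incidence, start=0, steps=1000, rng=rng))
print(estimate_avg_degree(samples, incidence))
```

In this toy hypergraph every vertex has degree 2, so the estimator returns 2.0 whatever path the walk takes. On a hypergraph with varied degrees, the estimate depends on how well the walk has mixed, which is the cost the abstract's refinements aim to reduce.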