Hypergraph motifs and their extensions beyond binary

Geon Lee, Seokbum Yoon, Jihoon Ko, Hyunju Kim, Kijung Shin
{"title":"Hypergraph motifs and their extensions beyond binary","authors":"Geon Lee, Seokbum Yoon, Jihoon Ko, Hyunju Kim, Kijung Shin","doi":"10.1007/s00778-023-00827-8","DOIUrl":null,"url":null,"abstract":"<p>Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions in a systematic manner: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify domains from which hypergraphs are? We first define <i>hypergraph motifs</i> (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the <i>characteristic profile</i> (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs ’ occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those of randomized hypergraphs. In addition, we demonstrate that CPs capture local structural patterns unique to each domain, thus comparing CPs of hypergraphs addresses Q2 and Q3. The concept of CP is naturally extended to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is to propose <span>MoCHy</span>, a family of parallel algorithms for counting h-motifs ’ occurrences in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version <span>MoCHy-A</span><span>\\(^{+}\\)</span> is up to <span>\\(25\\times \\)</span> more accurate and <span>\\(32\\times \\)</span> faster than the basic approximate and exact versions, respectively. Furthermore, we explore <i>ternary hypergraph motifs</i> that extends h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all previously mentioned applications.</p>","PeriodicalId":501532,"journal":{"name":"The VLDB Journal","volume":"24 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The VLDB Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00778-023-00827-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions in a systematic manner: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify the domains from which hypergraphs originate? We first define hypergraph motifs (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those in randomized hypergraphs. In addition, we demonstrate that CPs capture local structural patterns unique to each domain, and thus comparing the CPs of hypergraphs addresses Q2 and Q3. The concept of the CP is naturally extended to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is to propose MoCHy, a family of parallel algorithms for counting h-motifs' occurrences in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version MoCHy-A\(^{+}\) is up to \(25\times\) more accurate and \(32\times\) faster than the basic approximate and exact versions, respectively. Furthermore, we explore ternary hypergraph motifs, which extend h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all of the previously mentioned applications.
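To make the key definitions concrete, the sketch below shows how the overlap pattern of a hyperedge triple, the significance of an h-motif, and the characteristic profile could be computed on toy data. It is a minimal sketch under stated assumptions: the seven-region emptiness encoding of three hyperedges, the relative-significance formula (real vs. randomized counts), and the L2 normalization of the CP follow common conventions in the motif literature rather than details spelled out in this abstract, and all function names are hypothetical rather than part of the MoCHy implementation.

```python
from itertools import permutations

def hmotif_pattern(ei, ej, ek):
    """Order-independent overlap pattern of three connected hyperedges.

    Assumption: an h-motif is characterized by the emptiness of the seven
    regions of the Venn diagram of the three hyperedges, canonicalized over
    the 3! orderings (the abstract only calls it an "overlapping pattern").
    """
    def regions(a, b, c):
        return (
            bool(a - b - c), bool(b - c - a), bool(c - a - b),        # exclusive parts
            bool((a & b) - c), bool((b & c) - a), bool((c & a) - b),  # pairwise-only parts
            bool(a & b & c),                                          # common part
        )
    return min(regions(*p) for p in permutations((ei, ej, ek)))

def significance(count_real, count_rand, eps=1e-9):
    """Occurrences in the real hypergraph relative to a randomized one."""
    return (count_real - count_rand) / (count_real + count_rand + eps)

def characteristic_profile(real_counts, rand_counts):
    """Vector of per-h-motif significances, L2-normalized (assumed normalization)."""
    deltas = [significance(r, q) for r, q in zip(real_counts, rand_counts)]
    norm = sum(d * d for d in deltas) ** 0.5 or 1.0
    return [d / norm for d in deltas]

# Toy usage: three overlapping hyperedges given as vertex sets.
e1, e2, e3 = {1, 2, 3}, {2, 3, 4}, {3, 5}
print(hmotif_pattern(e1, e2, e3))                 # canonical 7-bit overlap pattern of this triple
print(characteristic_profile([120, 30], [80, 45]))  # CP over two hypothetical h-motif counts
```

In the full method, such per-triple patterns would be aggregated over all connected hyperedge triples (exactly or by sampling, as MoCHy does), and the resulting count vector over h-motif patterns would feed the significance and CP computations above.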
