Geon Lee, Seokbum Yoon, Jihoon Ko, Hyunju Kim, Kijung Shin
The VLDB Journal. Published 2023-12-26. DOI: https://doi.org/10.1007/s00778-023-00827-8
Hypergraph motifs and their extensions beyond binary
Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions in a systematic manner: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify the domains from which hypergraphs originate? We first define hypergraph motifs (h-motifs), which describe the overlapping patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its number of occurrences relative to that in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that the occurrences of h-motifs in 11 real-world hypergraphs from 5 domains are clearly distinguished from those in randomized hypergraphs. In addition, we demonstrate that CPs capture local structural patterns unique to each domain, so comparing the CPs of hypergraphs addresses Q2 and Q3. The concept of CP extends naturally to represent the connectivity pattern of each node or hyperedge as a vector, which proves useful in node classification and hyperedge prediction. Our algorithmic contribution is MoCHy, a family of parallel algorithms for counting the occurrences of h-motifs in a hypergraph. We theoretically analyze their speed and accuracy and show empirically that the advanced approximate version MoCHy-A\(^{+}\) is up to \(25\times\) more accurate than the basic approximate version and up to \(32\times\) faster than the exact version. Furthermore, we explore ternary hypergraph motifs, which extend h-motifs by taking into account not only the presence but also the cardinality of intersections among hyperedges. This extension proves beneficial for all previously mentioned applications.
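The pipeline described above (overlap patterns of three connected hyperedges, then significance against randomized counts, then a normalized profile) can be illustrated with a minimal Python sketch. It is a brute-force enumeration over all triples, not MoCHy, and every function name is hypothetical. Two points are assumptions rather than statements from the abstract: the overlap pattern is encoded as the emptiness of the seven Venn-diagram regions of the three hyperedges, and the significance is taken as the difference over the sum of real and randomized counts plus a small constant `eps`, followed by unit-length normalization, in the style of network-motif significance profiles. The randomized counts are assumed to be supplied by the caller.

```python
from collections import Counter
from itertools import combinations, permutations
from math import sqrt


def venn_pattern(a, b, c):
    """Return a 7-tuple of 0/1 flags: is each Venn region of (a, b, c) non-empty?"""
    regions = (
        a - b - c, b - c - a, c - a - b,        # nodes in exactly one hyperedge
        (a & b) - c, (b & c) - a, (c & a) - b,  # nodes in exactly two hyperedges
        a & b & c,                              # nodes in all three hyperedges
    )
    return tuple(int(bool(r)) for r in regions)


def canonical_pattern(a, b, c):
    """Make the pattern independent of the order in which hyperedges are given."""
    return min(venn_pattern(*p) for p in permutations((a, b, c)))


def is_connected(a, b, c):
    """Three hyperedges are connected if at least two of the three pairs overlap."""
    return sum(bool(x & y) for x, y in ((a, b), (b, c), (c, a))) >= 2


def count_patterns(hyperedges):
    """Brute-force count of overlap patterns over all connected hyperedge triples."""
    edges = [frozenset(e) for e in hyperedges]
    counts = Counter()
    for a, b, c in combinations(edges, 3):
        if is_connected(a, b, c):
            counts[canonical_pattern(a, b, c)] += 1
    return counts


def characteristic_profile(real, rand, eps=1.0):
    """Normalized significance per pattern (formula and eps are assumptions)."""
    keys = sorted(set(real) | set(rand))
    delta = [
        (real.get(k, 0) - rand.get(k, 0)) / (real.get(k, 0) + rand.get(k, 0) + eps)
        for k in keys
    ]
    norm = sqrt(sum(d * d for d in delta))
    return {k: (d / norm if norm else 0.0) for k, d in zip(keys, delta)}


# Toy hypergraph: the first three hyperedges form a chain, the fourth is isolated.
hyperedges = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {7, 8}]
real_counts = count_patterns(hyperedges)
```

Enumerating all triples costs time cubic in the number of hyperedges, so this sketch is only suitable for toy inputs; it is meant to make the definitions concrete, not to stand in for the parallel counting algorithms the abstract refers to.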