Renchi Yang, Jieming Shi, X. Xiao, Yin Yang, S. Bhowmick, Juncheng Liu
{"title":"No PANE, No Gain","authors":"Renchi Yang, Jieming Shi, X. Xiao, Yin Yang, S. Bhowmick, Juncheng Liu","doi":"10.1145/3542700.3542711","DOIUrl":null,"url":null,"abstract":"Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v 2 G to a compact vector Xv, which can be used in downstream machine learning tasks in a variety of applications. Existing ANE solutions do not scale to massive graphs due to prohibitive computation costs or generation of low-quality embeddings. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs in a single server that achieves state-of-the-art result quality on multiple benchmark datasets for two common prediction tasks: link prediction and node classification. Under the hood, PANE takes inspiration from well-established data management techniques to scale up ANE in a single server. Specifically, it exploits a carefully formulated problem based on a novel random walk model, a highly efficient solver, and non-trivial parallelization by utilizing modern multi-core CPUs. Extensive experiments demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"151 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGMOD Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3542700.3542711","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v 2 G to a compact vector Xv, which can be used in downstream machine learning tasks in a variety of applications. Existing ANE solutions do not scale to massive graphs due to prohibitive computation costs or generation of low-quality embeddings. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs in a single server that achieves state-of-the-art result quality on multiple benchmark datasets for two common prediction tasks: link prediction and node classification. Under the hood, PANE takes inspiration from well-established data management techniques to scale up ANE in a single server. Specifically, it exploits a carefully formulated problem based on a novel random walk model, a highly efficient solver, and non-trivial parallelization by utilizing modern multi-core CPUs. Extensive experiments demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.