No PANE, No Gain

ACM SIGMOD Record. Pub Date: 2022-05-31. DOI: 10.1145/3542700.3542711

Renchi Yang, Jieming Shi, X. Xiao, Yin Yang, S. Bhowmick, Juncheng Liu

Citations: 2

Abstract

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector X_v, which can be used in downstream machine learning tasks in a variety of applications. Existing ANE solutions do not scale to massive graphs due to prohibitive computation costs or generation of low-quality embeddings. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs in a single server that achieves state-of-the-art result quality on multiple benchmark datasets for two common prediction tasks: link prediction and node classification. Under the hood, PANE takes inspiration from well-established data management techniques to scale up ANE in a single server. Specifically, it exploits a carefully formulated problem based on a novel random walk model, a highly efficient solver, and non-trivial parallelization by utilizing modern multi-core CPUs. Extensive experiments demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.
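To make the idea of mapping each node to a compact vector concrete, here is a minimal sketch of a generic random-walk-based attributed embedding. It is not PANE's algorithm: the abstract does not spell out the random walk model or the solver, so the restart probability `alpha`, the truncation length `steps`, the function names `rwr_affinity` and `embed`, and the use of a plain truncated SVD are all assumptions made for illustration.

```python
import numpy as np

def rwr_affinity(A, R, alpha=0.5, steps=10):
    """Node-to-attribute affinity under a truncated random walk with restart.

    A: (n, n) adjacency matrix; R: (n, d) non-negative node-attribute matrix.
    Assumed walk: at each step it stops with probability alpha and picks an
    attribute of the current node in proportion to its weight; otherwise it
    moves to a uniformly weighted out-neighbor.
    """
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    # Row-normalize both matrices; rows that sum to zero stay all-zero.
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
    rs = R.sum(axis=1, keepdims=True)
    Rr = np.divide(R, rs, out=np.zeros_like(R), where=rs > 0)

    F = np.zeros_like(Rr)
    walk = Rr.copy()  # holds P^t @ Rr at iteration t
    for t in range(steps + 1):
        F += alpha * (1.0 - alpha) ** t * walk
        walk = P @ walk
    return F

def embed(F, k):
    """Factor F into rank-k node and attribute embeddings via truncated SVD."""
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    root = np.sqrt(s[:k])
    return U[:, :k] * root, Vt[:k].T * root

# Toy input (hypothetical): a 4-node path graph with 3 attributes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
R = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

F = rwr_affinity(A, R)
X, Y = embed(F, k=2)  # X[v] plays the role of the node vector X_v
```

In this sketch, row v of `X` is the embedding of node v and `X @ Y.T` approximates the affinity matrix `F`. The dense SVD is only workable on toy inputs; the scalability claimed in the abstract would depend on the paper's own solver and parallelization, which are not reproduced here.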



