NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization

The World Wide Web Conference. Pub Date: 2019-05-13. DOI: 10.1145/3308558.3313446

J. Qiu, Yuxiao Dong, Hao Ma, Jun Yu Li, Chi Wang, Kuansan Wang, Jie Tang


Citations: 143

Abstract

We study the problem of large-scale network embedding, which aims to learn latent representations for network mining applications. Previous research shows that 1) popular network embedding benchmarks, such as DeepWalk, are in essence implicitly factorizing a matrix with a closed form, and 2) the explicit factorization of such a matrix generates more powerful embeddings than existing methods. However, directly constructing and factorizing this matrix, which is dense, is prohibitively expensive in terms of both time and space, so the approach does not scale to large networks. In this work, we present the algorithm of large-scale network embedding as sparse matrix factorization (NetSMF). NetSMF leverages theories from spectral sparsification to efficiently sparsify the aforementioned dense matrix, enabling significantly improved efficiency in embedding learning. The sparsified matrix is spectrally close to the original dense one with a theoretically bounded approximation error, which helps maintain the representation power of the learned embeddings. We conduct experiments on networks of various scales and types. Results show that among both popular benchmarks and factorization-based methods, NetSMF is the only method that achieves both high efficiency and effectiveness. We show that NetSMF requires only 24 hours to generate effective embeddings for a large-scale academic collaboration network with tens of millions of nodes, while it would cost DeepWalk months and is computationally infeasible for the dense matrix factorization solution. The source code of NetSMF is publicly available.
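To make the cost argument in the abstract concrete, the following is a minimal sketch of the dense baseline it describes, not of NetSMF itself. It assumes the closed form usually attributed to the earlier analysis of DeepWalk, log(vol(G)/(bT) · Σ_{r=1..T} (D⁻¹A)^r · D⁻¹), with the logarithm truncated at zero; the abstract does not state this formula, so treat it as an assumption. The function names, the toy graph, and the parameter values (`T`, `b`, `dim`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def dense_deepwalk_matrix(A, T=10, b=1.0):
    """Build the dense closed-form matrix (assumed form, see lead-in).

    A is a symmetric adjacency matrix with no isolated nodes,
    T is the random-walk window size, b the number of negative samples.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)              # node degrees
    vol = deg.sum()                  # volume of the graph
    P = A / deg[:, None]             # random-walk transition matrix D^{-1} A
    S = np.zeros_like(A)
    P_r = np.eye(A.shape[0])
    for _ in range(T):               # accumulate P^1 + P^2 + ... + P^T
        P_r = P_r @ P
        S += P_r
    M = (vol / (b * T)) * S / deg[None, :]   # right-multiply by D^{-1}
    # Element-wise truncated logarithm: log(max(M, 1)) keeps entries >= 0.
    return np.log(np.maximum(M, 1.0))


def embed(M, dim):
    """Rank-`dim` embedding from a full SVD: U_d * sqrt(Sigma_d)."""
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

M = dense_deepwalk_matrix(A, T=3, b=1.0)
emb = embed(M, dim=2)
print(M.shape, emb.shape)
```

Even in this toy form the scaling problem is visible: the matrix powers fill in quickly, so `M` needs memory quadratic in the number of nodes, and the full SVD is more expensive still. That is the bottleneck the abstract says NetSMF addresses by factorizing a sparse, spectrally close approximation instead.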




Source journal

The World Wide Web Conference

Self-citation rate

0.00%

Publication count