Scalable Representation Learning for Dynamic Heterogeneous Information Networks via Metagraphs

ACM Transactions on Information Systems (TOIS) Pub Date: 2022-03-10 DOI: 10.1145/3485189

Yang Fang, Xiang Zhao, Peixin Huang, W. Xiao, M. de Rijke


Citations: 7

Abstract

Content representation is a fundamental task in information retrieval. Representation learning is aimed at capturing features of an information object in a low-dimensional space. Most research on representation learning for heterogeneous information networks (HINs) focuses on static HINs. In practice, however, networks are dynamic and subject to constant change. In this article, we propose a novel and scalable representation learning model, M-DHIN, to explore the evolution of a dynamic HIN. We regard a dynamic HIN as a series of snapshots with different time stamps. We first use a static embedding method to learn the initial embeddings of a dynamic HIN at the first time stamp. We describe the features of the initial HIN via metagraphs, which retain more structural and semantic information than traditional path-oriented static models. We also adopt a complex embedding scheme to better distinguish between symmetric and asymmetric metagraphs. Unlike traditional models that process an entire network at each time stamp, we build a so-called change dataset that only includes nodes involved in a triadic closure or opening process, as well as newly added or deleted nodes. Then, we utilize the above metagraph-based mechanism to train on the change dataset. As a result of this setup, M-DHIN is scalable to large dynamic HINs, since it needs to model the entire HIN only once and afterwards processes only the changed parts over time. Existing dynamic embedding models only express the existing snapshots and cannot predict the future network structure. To equip M-DHIN with this ability, we introduce an LSTM-based deep autoencoder model that processes the evolution of the graph via an LSTM encoder and outputs the predicted graph. Finally, we evaluate the proposed model, M-DHIN, on real-life datasets and demonstrate that it significantly and consistently outperforms state-of-the-art models.
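The change dataset described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each snapshot is given as a list of undirected edges, ignores node and edge types, and approximates a "triadic closure or opening" as a triangle that appears in or disappears from one snapshot relative to the previous one. The function names `adjacency`, `closed_triads`, and `change_set` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closed_triads(adj):
    """Return every triangle in the graph as a frozenset of three nodes."""
    triads = set()
    for u, nbrs in adj.items():
        # Two neighbors of u that are also linked to each other close a triangle.
        for v, w in combinations(nbrs, 2):
            if w in adj.get(v, ()):
                triads.add(frozenset((u, v, w)))
    return triads


def change_set(edges_prev, edges_curr):
    """Nodes to retrain between two snapshots (assumed, simplified definition).

    Includes newly added nodes, deleted nodes, and nodes belonging to a
    triangle that formed (closure) or broke (opening) between the snapshots.
    Nodes are only known through their edges, so isolated nodes are ignored.
    """
    adj_prev, adj_curr = adjacency(edges_prev), adjacency(edges_curr)
    nodes_prev, nodes_curr = set(adj_prev), set(adj_curr)

    added = nodes_curr - nodes_prev
    deleted = nodes_prev - nodes_curr

    tri_prev, tri_curr = closed_triads(adj_prev), closed_triads(adj_curr)
    closure = tri_curr - tri_prev  # triangles that formed
    opening = tri_prev - tri_curr  # triangles that broke
    triadic = set().union(*closure, *opening)

    return added | deleted | triadic


# Usage: the edge a-c closes the triad (a, b, c), and node e is new.
prev = [("a", "b"), ("b", "c"), ("c", "d")]
curr = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
print(sorted(change_set(prev, curr)))
```

In this toy example node `d` is left out of the change set even though its edges changed, because it is neither new nor part of a triangle that formed or broke. That restriction is what keeps the per-snapshot workload small under the assumptions above; the paper's actual criteria may differ in detail.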



