Random Walk-Based Top-k Tag Generation in Bipartite Networks of Entity-Term Type
Mingxi Zhang, Guanying Su, Wei Wang
2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), published 2019-11-01
DOI: 10.1109/ICTAI.2019.00026 (https://doi.org/10.1109/ICTAI.2019.00026)
Citation count: 1
Abstract
Tag generation aims to find relevant tags for a given entity and has numerous applications, such as classification, information retrieval, and recommender systems. In practice, real-world data is sparse and often lacks sufficient descriptions of entities, which can lead to incomplete results. Random walk with restart (RWR) can uncover hidden relationships between nodes by exploiting indirect connections. However, traditional RWR computation operates on the whole structure of the given network and maintains a matrix storing the relevance between every pair of nodes, so it becomes inefficient as the network grows large. In this paper, we propose a top-k tag generation algorithm, named DRWR, for efficiently generating tags from an entity-term network. Terms are treated as candidate tags, and the terms most relevant to a given entity are returned as its tags. The relevance computation between an entity and the terms is divided into two stages: an off-line stage and an on-line stage. In the off-line stage, the relevances between terms are computed over a term-term network built from the whole structure of the entity-term network. In the on-line stage, the relevance between the entity and each term is computed from the relevances between terms. To support fast on-line query processing, we develop a pruning algorithm that skips operations on term-term relevances smaller than a threshold. Extensive experiments on real datasets demonstrate the efficiency and effectiveness of the proposed approach.
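The two-stage idea described in the abstract can be illustrated with a small sketch. The abstract does not give the DRWR formulation, so everything below is an assumption made for illustration: the term-term network is built from term co-occurrence (A^T A with the diagonal zeroed), RWR is computed by plain power iteration, the on-line step uses the entity's normalized term vector as the restart distribution, and the names `rwr`, `offline_term_relevance`, and `top_k_tags`, the restart probability `c`, and the threshold value are all hypothetical. The on-line step relies on the linearity of RWR: the result for a mixed restart distribution equals the weighted sum of the per-term results.

```python
import numpy as np

def rwr(P, restart, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart via power iteration.
    P: column-stochastic transition matrix; restart: restart distribution;
    c: restart probability (an assumed value, not taken from the paper)."""
    r = restart.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * (P @ r) + c * restart
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def offline_term_relevance(A, c=0.15):
    """Off-line stage (assumed form): build a term-term co-occurrence
    network from the entity-term matrix A and run RWR from every term.
    Column j of the result holds the relevance of all terms to term j."""
    W = A.T @ A                      # term-term co-occurrence counts
    np.fill_diagonal(W, 0)           # drop self-loops
    P = W / W.sum(axis=0, keepdims=True)   # requires no isolated term
    n = P.shape[0]
    R = np.column_stack([rwr(P, np.eye(n)[j], c) for j in range(n)])
    return P, R

def top_k_tags(A, R, entity, k=3, threshold=1e-3):
    """On-line stage (assumed form): combine precomputed term-term
    relevances, ignoring entries below the pruning threshold."""
    q = A[entity] / A[entity].sum()            # entity's term distribution
    R_pruned = np.where(R >= threshold, R, 0.0)
    scores = R_pruned @ q
    order = np.argsort(-scores)[:k]
    return [(int(t), float(scores[t])) for t in order]

# Toy entity-term network: 4 entities (rows), 5 terms (columns).
A = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
P, R = offline_term_relevance(A)
tags = top_k_tags(A, R, entity=0, k=3)
```

In this toy network, entity 0 is linked directly only to terms 0 and 1, yet term 2 can still receive a nonzero score through the term-term network, which is the kind of indirect connection RWR is meant to exploit. A dense mask is used for pruning here only for brevity; a practical implementation would more likely store the off-line relevances sparsely so that pruned entries are never touched.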