Knowledge Base Enhanced Topic Modeling

Dandan Song, Jingwen Gao, Jinhui Pang, L. Liao, Lifei Qin
Published in: 2020 IEEE International Conference on Knowledge Graph (ICKG), 2020-08-01
DOI: 10.1109/ICBK50248.2020.00061 (https://doi.org/10.1109/ICBK50248.2020.00061)
Citations: 4

Abstract

Topic models such as Latent Dirichlet Allocation (LDA) learn hidden topics successfully and have been widely applied in text mining. Many recently developed augmented topic modeling methods utilize metadata information. However, the performance of topic models is still not comparable to that of humans. We think one key reason is that humans have background knowledge, which is essential for topic understanding. Inspired by this, we propose a knowledge base enhanced topic model. We take knowledge bases as good representations of human knowledge, with huge collections of entities and their relations. We assume that documents with related entities tend to have similar topic distributions. Based on this assumption, we compute document similarity via the linked entities and then use it as a constraint for LDA. More specifically, we embed entities in a low-dimensional space via DeepWalk and use Entity Movers Distance to measure the similarities between documents efficiently and effectively. Experiments on two real-world datasets show that our method improves LDA on document classification without requiring any supervision information.
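
The abstract does not give the exact definition of Entity Movers Distance, so the following is only a rough sketch of the idea of comparing documents through their linked entities. It assumes the distance is analogous to Word Mover's Distance, with each document spreading uniform mass over its entities, and it computes the relaxed nearest-neighbour lower bound of that transport cost rather than the exact optimum. The function name `relaxed_entity_movers_distance` and the two-dimensional toy embeddings are hypothetical stand-ins; in the described method the vectors would come from DeepWalk.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_entity_movers_distance(doc_a, doc_b, embeddings):
    """Lower bound on a mover's distance between two lists of entities.

    Assumption: every entity in a document carries equal mass, and each
    entity is allowed to send all of its mass to its nearest entity in
    the other document. This relaxes the full transport problem.
    """
    if not doc_a or not doc_b:
        raise ValueError("both documents need at least one linked entity")

    def one_sided(src, dst):
        # Average distance from each source entity to its nearest target.
        total = 0.0
        for s in src:
            total += min(euclidean(embeddings[s], embeddings[d]) for d in dst)
        return total / len(src)

    # Taking the larger of the two one-sided bounds keeps the measure symmetric.
    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


# Hypothetical toy embeddings; real ones would be learned from the knowledge base graph.
embeddings = {
    "Paris": (0.0, 0.0),
    "France": (0.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}

doc1 = ["Paris", "France"]
doc2 = ["Paris", "France"]
doc3 = ["Berlin", "Germany"]

d_same = relaxed_entity_movers_distance(doc1, doc2, embeddings)
d_diff = relaxed_entity_movers_distance(doc1, doc3, embeddings)
print(d_same, d_diff)
```

Documents that share entities get a distance of zero, while documents whose entities lie far apart in the embedding space get a larger value. Pairwise distances of this kind could then serve as the similarity information that constrains LDA, although how the constraint enters the model is not specified in the abstract.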