Dandan Song, Jingwen Gao, Jinhui Pang, L. Liao, Lifei Qin
{"title":"Knowledge Base Enhanced Topic Modeling","authors":"Dandan Song, Jingwen Gao, Jinhui Pang, L. Liao, Lifei Qin","doi":"10.1109/ICBK50248.2020.00061","DOIUrl":null,"url":null,"abstract":"Topic models, such as Latent Dirichlet Allocation (LDA), are successful in learning hidden topics and has been widely applied in text mining. There are many recently developed augmented topic modeling methods to utilize metadata information. However, the effect of topic models is still not comparable to humans. We think one key point is that humans have background knowledge, which is essential for topic understanding. Inspired by this, we propose a knowledge base enhanced topic model in this paper. We take knowledge bases as good presentations of human knowledge, with huge collections of entities and their relations. We assume that documents with related entities tend to have similar topic distributions. Based on this assumption, we compute document similarity information via the linked entities and then use it as a constraint for LDA. More specifically, we embed entities in a low-dimensional space via DeepWalk and use Entity Movers Distance to efficiently and effectively measure the similarities between documents. The results of experiments over two real-world datasets show that our method boosts the LDA model on the document classification while no supervision information is needed.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Knowledge Graph (ICKG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBK50248.2020.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Topic models, such as Latent Dirichlet Allocation (LDA), are successful in learning hidden topics and has been widely applied in text mining. There are many recently developed augmented topic modeling methods to utilize metadata information. However, the effect of topic models is still not comparable to humans. We think one key point is that humans have background knowledge, which is essential for topic understanding. Inspired by this, we propose a knowledge base enhanced topic model in this paper. We take knowledge bases as good presentations of human knowledge, with huge collections of entities and their relations. We assume that documents with related entities tend to have similar topic distributions. Based on this assumption, we compute document similarity information via the linked entities and then use it as a constraint for LDA. More specifically, we embed entities in a low-dimensional space via DeepWalk and use Entity Movers Distance to efficiently and effectively measure the similarities between documents. The results of experiments over two real-world datasets show that our method boosts the LDA model on the document classification while no supervision information is needed.