Topic model with constrainted word burstiness intensities

Shaoze Lei, Jianwen Zhang, Shifeng Weng, Changshui Zhang
The 2011 International Joint Conference on Neural Networks (IJCNN), published 2011-10-03
DOI: 10.1109/IJCNN.2011.6033201 (https://doi.org/10.1109/IJCNN.2011.6033201)
Citation count: 1

Abstract

Word burstiness, the phenomenon that a word which occurs once in a document is likely to occur again, has recently attracted interest in text analysis. Dirichlet Compound Multinomial Latent Dirichlet Allocation (DCMLDA) introduces this burstiness mechanism into Latent Dirichlet Allocation (LDA). However, DCMLDA places no restriction on the word burstiness intensity of each topic. Consequently, as shown in this paper, the burstiness intensities of words in major topics become extremely low, which impairs the topics' ability to represent distinct semantic meanings. To obtain topics that represent the semantic meanings of documents well, we introduce constraints on the topics' word burstiness intensities. Experiments demonstrate that DCMLDA with constrained word burstiness intensities outperforms the original unconstrained model. In addition, these constraints help to reveal the relationship between two key properties inherited from DCM and LDA, respectively. These two properties strongly influence the combined model's performance, and the relationship revealed in this paper offers important guidance for further study of topic models.
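
The burstiness mechanism that DCM contributes can be illustrated with the standard Polya-urn predictive rule of a Dirichlet compound multinomial: after a word has been seen n_w times among n tokens, its probability of appearing next is (alpha_w + n_w) / (sum(alpha) + n). The sketch below is only an illustration of that rule, not the paper's model or its constraints. The vocabulary, the `base_probs` values, and the helper names are hypothetical, and treating the sum of the Dirichlet parameters (`scale`) as an inverse proxy for burstiness intensity is an assumption made here for illustration.

```python
def dcm_predictive(alpha, counts, word):
    """Probability that the next token is `word` under a DCM (Polya urn),
    given the word counts already observed in the document."""
    total_alpha = sum(alpha.values())
    total_count = sum(counts.values())
    return (alpha[word] + counts.get(word, 0)) / (total_alpha + total_count)


def scaled_alpha(base_probs, scale):
    """Dirichlet parameters with mean `base_probs` and total mass `scale`.
    Assumption for this sketch: a smaller `scale` means stronger burstiness."""
    return {w: scale * p for w, p in base_probs.items()}


# Hypothetical three-word vocabulary with fixed mean word probabilities.
base_probs = {"topic": 0.5, "model": 0.3, "burst": 0.2}

for scale in (100.0, 1.0, 0.1):
    alpha = scaled_alpha(base_probs, scale)
    before = dcm_predictive(alpha, {}, "burst")            # empty document
    after = dcm_predictive(alpha, {"burst": 1}, "burst")   # one occurrence seen
    print(f"scale={scale}: before={before:.3f}, after one occurrence={after:.3f}")
```

With a large `scale` the prediction barely moves after one occurrence, which is close to an ordinary multinomial. With a small `scale` a single occurrence sharply raises the chance of a repeat, which is the burstiness effect the abstract describes.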