A Self-Aggregated Hierarchical Topic Model for Short Texts

Yue Niu, Hongjie Zhang
Machine Learning, IOT and Blockchain Technologies & Trends. DOI: 10.5121/csit.2021.111212 (https://doi.org/10.5121/csit.2021.111212)

Abstract

With the growth of the internet, short texts such as tweets from Twitter, news titles from RSS feeds, and comments from Amazon have become prevalent. Many tasks need to retrieve the information hidden in such texts, and ontology learning methods have been proposed to retrieve it in structured form. A topic hierarchy is a typical ontology that consists of concepts and the taxonomic relations between them. Current hierarchical topic models are not designed specifically for short texts: they use word co-occurrence to construct concepts and general-special word relations to construct the topic taxonomy. In short texts, however, word co-occurrence is sparse and general-special word relations are lacking. To overcome these two problems and provide an interpretable result, we designed a hierarchical topic model that aggregates short texts into long documents and then constructs topics and relations. Because long documents add semantic information, our model can avoid the sparsity of word co-occurrence. In experiments, we measured the quality of concepts with a topic coherence metric on four real-world short-text corpora. The results showed that our topic hierarchy is more interpretable than those produced by other methods.
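
The sparsity argument can be illustrated with a small sketch. This is not the model described in the abstract: the pseudo-document assignments below are hypothetical and supplied by hand, whereas the paper's model learns the aggregation itself. The function names and the toy texts are likewise assumptions made for illustration. The sketch only shows that concatenating short texts exposes word pairs that never co-occur within any single short text.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(docs):
    """Count unordered word pairs that appear together within a document."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


def aggregate(docs, assignments):
    """Concatenate short texts that share a pseudo-document id.

    The assignments are a hand-made assumption here; a self-aggregating
    topic model would infer them instead.
    """
    grouped = {}
    for doc, group_id in zip(docs, assignments):
        grouped.setdefault(group_id, []).append(doc)
    return [" ".join(texts) for _, texts in sorted(grouped.items())]


# Toy short texts (invented for illustration).
short_texts = [
    "apple releases phone",
    "phone battery review",
    "team wins match",
    "match final score",
]
assignments = [0, 0, 1, 1]  # hypothetical pseudo-document ids

before = cooccurrence_pairs(short_texts)
long_docs = aggregate(short_texts, assignments)
after = cooccurrence_pairs(long_docs)

print("distinct pairs before aggregation:", len(before))
print("distinct pairs after aggregation:", len(after))
print("new pair example:", ("apple", "battery") in after)
```

In this toy setting the pair ("apple", "battery") is only observable after aggregation, which is the kind of additional co-occurrence evidence the abstract attributes to long documents.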