Finding High-Level Topics and Tweet Labeling Using Topic Models

Sameendra Samarawickrama, S. Karunasekera, A. Harwood
Published in: 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), 14 December 2015
DOI: 10.1109/ICPADS.2015.38 (https://doi.org/10.1109/ICPADS.2015.38)
Citations: 3

Abstract

Making sense of Twitter data streams is challenging because of the extremely high volume of data. One way to address this challenge is to treat these streams as containing a set of high-level topics. In this research we address the following problem: given a collection of tweets about K high-level topics, how can we find topic words that describe these topics, and how can we label each tweet with one of the K topics using a topic modeling approach? Prior research has shown that applying topic modeling algorithms directly to tweets does not lead to good results. Hence, one approach is to group related tweets together to form a single “pseudo-document”, which is more informative than a single tweet. In this paper we evaluate different grouping schemes found in the literature and propose a new grouping scheme that uses named entities and word collocations. Results show that our proposed scheme performs better than the existing approaches, to some extent, in all test cases, for both finding high-level topics and tweet labeling.
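
The two steps named in the abstract, grouping tweets into pseudo-documents and labeling each tweet with one topic, can be illustrated with a minimal sketch. It is not the scheme proposed in the paper: hashtags stand in as the grouping key purely for simplicity (the paper's scheme uses named entities and word collocations), and the per-topic word weights are hand-written placeholders for what a topic model trained on the pseudo-documents would produce. The function names `pool_tweets` and `label_tweet` are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9']+")


def pool_tweets(tweets):
    """Group tweets that share a key into one pseudo-document per key.

    Assumption: the grouping key is a hashtag (not the paper's scheme).
    A tweet without any hashtag stays in a pool of its own.
    """
    pools = defaultdict(list)
    for i, text in enumerate(tweets):
        keys = {tag.lower() for tag in HASHTAG.findall(text)}
        if not keys:
            keys = {f"_single_{i}"}
        for key in keys:
            pools[key].append(text)
    # Concatenate each pool into a single, longer pseudo-document.
    return {key: " ".join(texts) for key, texts in pools.items()}


def label_tweet(text, topic_words):
    """Return the topic whose word weights sum highest over the tweet's tokens.

    topic_words maps a topic name to {word: weight}. This is a simplified
    scoring rule for illustration, not inference in a trained topic model.
    Returns None when there are no topics or no word of the tweet matches.
    """
    tokens = TOKEN.findall(text.lower())
    scores = {
        topic: sum(weights.get(tok, 0.0) for tok in tokens)
        for topic, weights in topic_words.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tweets = [
    "Great goal tonight #WorldCup",
    "Penalty drama #worldcup",
    "New phone launch today",
]
pseudo_docs = pool_tweets(tweets)

# Hand-written placeholder weights for K = 2 topics (assumption).
topic_words = {
    "sports": {"goal": 0.4, "penalty": 0.3},
    "tech": {"phone": 0.5, "launch": 0.2},
}
labels = [label_tweet(t, topic_words) for t in tweets]
```

In a fuller pipeline, the values of `pseudo_docs` would be the training documents for the topic model, and the learned topic-word distributions would replace `topic_words` before each original tweet is labeled.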