Extracting Topical Information of Tweets Using Hashtags

Z. Alp, Ş. Öğüdücü
{"title":"Extracting Topical Information of Tweets Using Hashtags","authors":"Z. Alp, Ş. Öğüdücü","doi":"10.1109/ICMLA.2015.73","DOIUrl":null,"url":null,"abstract":"Twitter is one of the largest micro blogging web sites where users share news, their opinions, moods, recommendations by posting text messages, and it is mostly used like a news media. Since the data being shared via Twitter is vast, many researches are focusing on extracting meaningful information with the help of information retrieval systems. Retrieving meaningful information from social media applications became important for several tasks such as sentiment analysis, detecting anomalies, and recommendation systems. Topic modeling is one of the mostly studied and hard problems in information retrieval area, and it is even more challenging to model topics when the documents are too short such as tweets. In this paper, we focus on developing an effective and efficient method to overcome this challenge of tweets being too short for topic modeling. We compare different topic modeling schemes, one of which is not studied before, based on Latent Dirichlet Allocation (LDA) that merges tweets in order to improve LDA performance. We also demonstrate our experimental results with unbiased data collection and evaluation methodologies.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2015.73","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Twitter is one of the largest microblogging websites, where users share news, opinions, moods, and recommendations by posting short text messages, and it is largely used as a news medium. Since the data shared via Twitter is vast, much research focuses on extracting meaningful information with the help of information retrieval systems. Retrieving meaningful information from social media applications has become important for several tasks such as sentiment analysis, anomaly detection, and recommendation systems. Topic modeling is one of the most studied and hardest problems in information retrieval, and it is even more challenging to model topics when the documents are very short, such as tweets. In this paper, we focus on developing an effective and efficient method to overcome the challenge that tweets are too short for topic modeling. We compare different topic modeling schemes based on Latent Dirichlet Allocation (LDA), one of which has not been studied before, that merge tweets in order to improve LDA performance. We also demonstrate our experimental results with unbiased data collection and evaluation methodologies.
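The paper itself does not include an implementation, but the core idea of merging tweets by hashtag into longer pseudo-documents before running LDA can be illustrated with a minimal sketch. The sketch below uses scikit-learn; the sample tweets, the regex-based hashtag extraction, and the hyperparameters (such as n_components) are illustrative assumptions rather than the authors' exact pipeline.

```python
# A minimal sketch of hashtag-based tweet pooling followed by LDA.
# The pooling scheme and hyperparameters are assumptions for illustration.
import re
from collections import defaultdict

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "Great win for the team tonight! #football",
    "Transfer rumours heating up again #football #news",
    "New phone announced with a bigger battery #tech",
    "Benchmarks for the latest GPU look impressive #tech",
]

# Pool tweets that share a hashtag into one pseudo-document, so that
# each LDA "document" is long enough to give more stable topic estimates.
pools = defaultdict(list)
for tweet in tweets:
    for tag in re.findall(r"#(\w+)", tweet.lower()):
        pools[tag].append(tweet)
documents = [" ".join(texts) for texts in pools.values()]

# Bag-of-words representation of the pooled pseudo-documents.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Fit LDA on the pooled documents; n_components=2 is an assumed setting.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words of each inferred topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")
```

Compared with treating every tweet as its own document, pooling by hashtag gives LDA denser word co-occurrence statistics per document, which is the effect the paper's merging schemes aim to exploit.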