A hashtag recommendation system for twitter data streams.

Q1 Mathematics

Computational Social Networks. Pub Date: 2016-01-01. Epub Date: 2016-05-31. DOI: 10.1186/s40649-016-0028-9

Eriko Otsuka, Scott A Wallace, David Chiu

Computational Social Networks, Volume 3, Issue 1, Article 3.

Citations: 25

Abstract

Background: Twitter has evolved into a powerful communication and information-sharing tool used by millions of people around the world to post what is happening now. A hashtag, a keyword prefixed with a hash symbol (#), is a Twitter feature for organizing tweets and enabling effective search across a massive volume of data. In this paper, we propose an automatic hashtag recommendation system that helps users find new hashtags related to their interests on demand.

Methods: For hashtag ranking, we propose the Hashtag Frequency-Inverse Hashtag Ubiquity (HF-IHU) ranking scheme, a variation of the well-known TF-IDF scheme that accounts for hashtag relevance as well as data sparseness, one of the key challenges in analyzing microblog data. Our system is built on top of Hadoop, a leading platform for distributed computing, and uses MapReduce to provide scalable performance. Experiments on a large Twitter data set demonstrate that our method yields hashtags relevant to a user's interests and that its recommendations are more stable and reliable than tag rankings based on tweet content similarity.
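The abstract does not state the HF-IHU formula, so the following is only a minimal sketch of the general idea it describes: a TF-IDF-style score in which a frequency term (how often a hashtag co-occurs with a word of the query tweet) is multiplied by an inverse-ubiquity term (hashtags that co-occur with many distinct words are down-weighted, as common terms are in IDF). The scoring formula, the definition of `ubiquity`, and the function names `map_tweet`, `reduce_counts`, and `rank_hashtags` are hypothetical assumptions, not the authors' method. The map and reduce steps only mimic the shape of a MapReduce job inside one Python process; no Hadoop API is used.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Iterator


def map_tweet(tweet: str) -> Iterator[tuple[tuple[str, str], int]]:
    """Map step: emit ((word, hashtag), 1) for each word/hashtag pair in a tweet."""
    tokens = set(tweet.lower().split())
    hashtags = {t for t in tokens if t.startswith("#") and len(t) > 1}
    words = {t for t in tokens if not t.startswith("#")}
    for w in words:
        for h in hashtags:
            yield (w, h), 1


def reduce_counts(pairs: Iterable[tuple[tuple[str, str], int]]) -> Counter:
    """Reduce step: sum the emitted counts for each (word, hashtag) key."""
    counts: Counter = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def rank_hashtags(tweets: list[str], query: str, k: int = 10) -> list[tuple[str, float]]:
    """Rank hashtags for a query tweet with a hypothetical TF-IDF-style weight."""
    # Single-process stand-in for a MapReduce job: map every tweet, then reduce.
    cooc = reduce_counts(pair for t in tweets for pair in map_tweet(t))

    by_word: dict[str, dict[str, int]] = defaultdict(dict)
    ubiquity: Counter = Counter()  # assumption: number of distinct words a hashtag co-occurs with
    for (w, h), c in cooc.items():
        by_word[w][h] = c
        ubiquity[h] += 1
    n_words = len(by_word)

    scores: dict[str, float] = defaultdict(float)
    for w in set(query.lower().split()):
        for h, c in by_word.get(w, {}).items():
            # frequency term x inverse-ubiquity term, analogous to tf x idf
            scores[h] += c * math.log(n_words / ubiquity[h])
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For example, on a four-tweet corpus in which `goal` appears in two `#football` tweets and one `#worldcup` tweet, the query `what a goal` ranks `#football` above `#worldcup`. A real deployment would run the map and reduce functions as distributed Hadoop tasks rather than in memory.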

Results and conclusions: Our results show that HF-IHU can achieve over 30% hashtag recall when asked to identify the top 10 relevant hashtags for a particular tweet. Furthermore, on recall of the top 200 hashtags, our method outperforms kNN, k-popularity, and Naïve Bayes by 69%, 54%, and 17%, respectively.
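Recall at a cutoff k is commonly computed per tweet as the share of the tweet's held-out hashtags that appear among the top-k recommendations, then averaged over the test tweets. The sketch below assumes that definition; the abstract does not spell out the paper's exact evaluation protocol, and `recall_at_k` and `mean_recall_at_k` are illustrative names.

```python
def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant hashtags that appear among the top-k recommendations."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant hashtags")
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (recommended, relevant) pairs, one pair per test tweet."""
    if not runs:
        raise ValueError("need at least one test tweet")
    scores = [recall_at_k(rec, rel, k) for rec, rel in runs]
    return sum(scores) / len(scores)
```

With recommendations `["#a", "#b", "#c"]` and held-out hashtags `{"#a", "#c"}`, recall@2 is 0.5 and recall@3 is 1.0, which shows why recall rises as the cutoff grows from 10 to 200.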


Source Journal

Computational Social Networks (Mathematics: Modeling and Simulation)

Self-citation rate

0.00%

Articles published

Review time

13 weeks

Journal description: Computational Social Networks showcases refereed papers dealing with all mathematical, computational and applied aspects of social computing. The objective of this journal is to advance and promote the theoretical foundation, mathematical aspects, and applications of social computing. The journal welcomes submissions that focus on common principles, algorithms and tools that govern network structures/topologies, network functionalities, security and privacy, network behaviors, information diffusion and influence, and social recommendation systems applicable to all types of social networks and social media. Topics include (but are not limited to) the following:

- Social network design and architecture
- Mathematical modeling and analysis
- Real-world complex networks
- Information retrieval in social contexts, political analysts
- Network structure analysis
- Network dynamics optimization
- Complex network robustness and vulnerability
- Information diffusion models and analysis
- Security and privacy
- Searching in complex networks
- Efficient algorithms
- Network behaviors
- Trust and reputation
- Social Influence
- Social Recommendation
- Social media analysis
- Big data analysis on online social networks

The journal also publishes reviews of appropriate books as well as special issues on hot topics.