On Quantifying Predictability in Online Social Media Cascades Using Entropy

Naimisha Kolli, N. Balakrishnan, K. Ramakrishnan
Published in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017-07-31
DOI: 10.1145/3110025.3110071 (https://doi.org/10.1145/3110025.3110071)
Citation count: 4

Abstract

Predicting cascade volumes in social media communication is important for advancing the use of social media in viral marketing, in assessing the impact of political campaigns, and in homeland security. Several techniques for estimating cascade volumes have been reported in the literature. These algorithms use a variety of information, such as content, structural, and temporal features, depending on availability. Because the information supplied to the algorithms varies so widely, the reported prediction accuracy differs from one algorithm to another. Entropy-based measures tailored to differing levels of information availability have been applied successfully to prediction problems in many fields, including network traffic, human mobility, radio spectrum state dynamics, and atmospheric science. In this paper we adopt several entropy-based measures to quantify the predictability of cascade volumes in online social media communications. The limit derived from these entropy measures is also used to explain the differences in accuracy among some of the cascade volume prediction algorithms reported in the literature. To illustrate and demonstrate the utility of the entropy-based predictability limits, we use two data sets: the MemeTracker dataset and the Twitter Hashtags dataset. The results clearly demonstrate the utility of entropy-based measures for quantifying predictability in online social media cascades. We also show that temporal relevancy is a dominant contributing factor in cascade predictability, and that additional features, such as knowledge of a small number of large media sites and blogs, can significantly influence prediction performance.
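The abstract does not say which entropy measures or which limit the paper uses. As a rough illustration only, the sketch below follows one approach common in the predictability literature (for example, in human mobility studies): compute the Shannon entropy of a discretized volume sequence, then solve Fano's inequality for the highest accuracy any predictor could reach. The function names, the binning of volumes into discrete states, and the toy sequence are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_predictability(entropy_bits, n_states, tol=1e-9):
    """Upper bound p on prediction accuracy from Fano's inequality.

    Solves  S = H_b(p) + (1 - p) * log2(N - 1)  for p in [1/N, 1],
    where H_b is the binary entropy and N is the number of states.
    (Assumed formulation; the abstract does not state the paper's exact limit.)
    """
    if n_states < 2:
        return 1.0  # a single possible state is trivially predictable
    # Entropy cannot exceed log2(N) for N states; clamp to keep the root bracketed.
    s = min(max(entropy_bits, 0.0), math.log2(n_states))

    def fano(p):
        hb = 0.0
        if 0.0 < p < 1.0:
            hb = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return hb + (1 - p) * math.log2(n_states - 1)

    # fano() decreases from log2(N) at p = 1/N to 0 at p = 1, so bisection works.
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical cascade volumes already binned into 4 levels (0 = low, 3 = high).
    binned_volumes = [0, 0, 1, 0, 2, 3, 2, 1, 0, 0, 1, 0]
    s = shannon_entropy(binned_volumes)
    print("entropy (bits):", s)
    print("predictability bound:", max_predictability(s, n_states=4))
```

Shannon entropy of the symbol frequencies ignores temporal order. Since the abstract identifies temporal relevancy as a dominant factor, an order-aware estimator (for example, one based on Lempel-Ziv compression) would presumably give a lower entropy and therefore a higher bound for the same sequence.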