Learning the information diffusion probabilities by using variance regularized EM algorithm

2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014). Publication date: 2014-08-17. DOI: 10.1109/ASONAM.2014.6921596

Hai-Guang Li, Tianyu Cao, Zhao Li

Citations: 4

Abstract

In this paper we address the problem of learning information diffusion probabilities when diffusion data are scarce. Observing information diffusion on the popular social networking site Twitter, we find that the evidence of information diffusion is extremely sparse: fewer than one percent of tweets are retweeted, and retweets are considered the most important form of information diffusion evidence on Twitter. Previous approaches to predicting information diffusion probabilities fail in such scenarios because of overfitting. To overcome this problem, we first propose using the variance of the diffusion probabilities as a measure of model complexity for the independent cascade model. We then propose two regularization schemes to reduce model complexity. The first scheme regularizes the variance of the diffusion probabilities directly. The second regularizes the mean absolute deviation of the logarithm of the diffusion probabilities. We derive an approximate solution for the first scheme and an analytical solution for the second. We conduct experiments by simulating information diffusion on six social network datasets. Experimental results show that the variance regularization scheme outperforms the baseline by a noticeable margin, and that the mean absolute deviation regularization scheme also outperforms the baseline.
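The abstract names the ingredients of the method but not their formulas, so the Python sketch below shows one plausible reading under stated assumptions. It contains (1) a discrete-time independent cascade (IC) simulator, in which each newly activated node u gets a single chance to activate each inactive child v with probability κ(u, v), of the kind one could use to generate synthetic cascades; (2) the IC log-likelihood of observed cascades in the discrete-time form popularized by Saito et al. (2008), which is assumed, not confirmed, to be the base model here; and (3) the two complexity penalties, the variance of the edge probabilities and the mean absolute deviation of their logarithms (taken around the mean, an assumption). It only evaluates a penalized objective `log L(κ) - lam * penalty(κ)`; it does not reproduce the authors' regularized EM updates or their approximate and analytical solutions. All function names, the toy graph and `lam` are hypothetical.

```python
import math
import random
from collections.abc import Callable
from statistics import mean, pvariance

# Hypothetical type alias: a directed edge (u, v) of the social graph.
Edge = tuple[int, int]


def simulate_ic(kappa: dict[Edge, float], seeds: set[int], rng: random.Random) -> list[set[int]]:
    """Simulate one discrete-time independent cascade.

    Returns a list of node sets: element t holds the nodes first activated at step t.
    """
    out_edges: dict[int, list[int]] = {}
    for u, v in kappa:
        out_edges.setdefault(u, []).append(v)
    active = set(seeds)
    steps = [set(seeds)]
    frontier = set(seeds)
    while frontier:
        new_nodes: set[int] = set()
        for u in frontier:
            for v in out_edges.get(u, []):
                # A newly active node gets one chance to activate each still-inactive child.
                if v not in active and v not in new_nodes and rng.random() < kappa[(u, v)]:
                    new_nodes.add(v)
        if new_nodes:
            steps.append(new_nodes)
        active |= new_nodes
        frontier = new_nodes
    return steps


def ic_log_likelihood(kappa: dict[Edge, float], cascades: list[list[set[int]]]) -> float:
    """Log-likelihood of observed cascades under the discrete-time IC model.

    Assumes 0 < kappa < 1 and that every node activated at step t+1 has a parent activated at t.
    """
    children: dict[int, list[int]] = {}
    parents: dict[int, list[int]] = {}
    for u, v in kappa:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    total = 0.0
    for cascade in cascades:
        active: set[int] = set()
        for t, newly in enumerate(cascade):
            active |= newly
            nxt = cascade[t + 1] if t + 1 < len(cascade) else set()
            # Success term: at least one node activated at t reached each node activated at t+1.
            for w in nxt:
                miss = math.prod(1.0 - kappa[(u, w)] for u in parents.get(w, []) if u in newly)
                total += math.log(1.0 - miss)
            # Failure term: nodes activated at t failed on children still inactive at t+1.
            for u in newly:
                for w in children.get(u, []):
                    if w not in active and w not in nxt:
                        total += math.log(1.0 - kappa[(u, w)])
    return total


def variance_penalty(kappa: dict[Edge, float]) -> float:
    """Scheme 1 complexity measure: population variance of the edge diffusion probabilities."""
    return pvariance(list(kappa.values()))


def log_mad_penalty(kappa: dict[Edge, float]) -> float:
    """Scheme 2 (assumed form): mean absolute deviation of log-probabilities around their mean."""
    logs = [math.log(p) for p in kappa.values()]
    centre = mean(logs)
    return mean([abs(x - centre) for x in logs])


def penalized_objective(
    kappa: dict[Edge, float],
    cascades: list[list[set[int]]],
    lam: float,
    penalty: Callable[[dict[Edge, float]], float],
) -> float:
    """Quantity to maximize: log-likelihood minus lam times a complexity penalty."""
    return ic_log_likelihood(kappa, cascades) - lam * penalty(kappa)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy graph and "true" probabilities; not data from the paper.
    true_kappa = {(0, 1): 0.3, (0, 2): 0.2, (1, 3): 0.4, (2, 3): 0.1, (3, 4): 0.25}
    cascades = [simulate_ic(true_kappa, {0}, rng) for _ in range(200)]
    flat = {e: 0.25 for e in true_kappa}  # a candidate estimate with zero variance
    for name, pen in [("variance", variance_penalty), ("log-MAD", log_mad_penalty)]:
        print(name, penalized_objective(true_kappa, cascades, 10.0, pen),
              penalized_objective(flat, cascades, 10.0, pen))
```

The printout compares how the generating probabilities and a flat, zero-variance candidate trade likelihood against penalty for the chosen hypothetical `lam`; a larger `lam` pushes the comparison toward low-variance estimates, which is the overfitting trade-off the abstract describes.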
