Socialized Language Model Smoothing via Bi-directional Influence Propagation on Social Networks

Proceedings of the 25th International Conference on World Wide Web. Pub Date: 2016-04-11. DOI: 10.1145/2872427.2874811

Rui Yan, Cheng-te Li, Hsun-Ping Hsieh, P. Hu, Xiaohua Hu, Tingting He


Citations: 4

Abstract

Online social networks are now among the most visited websites in the world, measured by page views (PV), and they have changed how information is discovered and distributed. Millions of registered users generate an enormous amount of content every day. These networks have become "giants" whose data can seemingly support almost any research task. However, as we have pointed out previously, the giants still have an "Achilles heel": extreme sparsity. Compared with the very large collection as a whole, individual documents such as microblog posts appear too sparse to make a difference in many research scenarios, even though the posts are in fact distinct from one another. In this paper we propose to tackle this Achilles heel by smoothing the language model via influence propagation. Extending our previous work on the sparsity issue, we augment socialized language model smoothing with bi-directional influence learned from propagation. Intuitively, it is inadequate to treat the influence propagated between an information source and its target as undirected. Hence, we formulate a bi-directional socialized factor graph model that uses both the textual correlations between document pairs and the socialized augmentation networks behind the documents, such as user relationships and social interactions. These factors are modeled as attributes of, and dependencies among, documents and their corresponding users, and are then distinguished by direction. We propose an effective algorithm to learn the directed factor graph model. Finally, we propagate term counts to smooth documents according to the estimated influence. We run experiments on two distinct datasets, Twitter and Weibo. The results validate the effectiveness of the proposed model: by incorporating direction information into socialized language model smoothing, our approach improves over several alternative methods on both intrinsic and extrinsic evaluations, measured by perplexity, nDCG, and MAP.
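The final step described in the abstract, propagating term counts to smooth a sparse document according to directed influence, can be illustrated with a small sketch. This is not the paper's factor graph model: the influence weights below are set by hand rather than learned, and all names (`smooth_counts`, `language_model`, `perplexity`, the toy documents, the parameter `mu`) are hypothetical choices made for illustration only.

```python
from collections import Counter
import math


def smooth_counts(docs, influence, target):
    """Add influence-weighted term counts from other documents to `target`.

    docs: dict mapping doc id -> Counter of term counts.
    influence: dict mapping (source, target) -> weight. The pairs are
        directed, so influence[(a, b)] may differ from influence[(b, a)].
    """
    smoothed = Counter(docs[target])
    for (src, dst), weight in influence.items():
        # Only influence flowing INTO the target contributes counts.
        if dst == target and src != target:
            for term, count in docs[src].items():
                smoothed[term] += weight * count
    return smoothed


def language_model(counts, vocab, mu=0.01):
    """Unigram model with additive smoothing (an assumption of this sketch),
    so terms absent from `counts` keep a small nonzero probability."""
    total = sum(counts.values()) + mu * len(vocab)
    return {term: (counts.get(term, 0.0) + mu) / total for term in vocab}


def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy microblog posts as bags of words (hypothetical data).
docs = {
    "d1": Counter({"world": 1, "cup": 1}),
    "d2": Counter({"world": 2, "cup": 1, "final": 2}),
    "d3": Counter({"cup": 1, "goal": 3}),
}
# Hand-set directed influence: d2 influences d1 more than d1 influences d2.
influence = {("d2", "d1"): 0.5, ("d1", "d2"): 0.1, ("d3", "d1"): 0.2}

vocab = sorted(set().union(*docs.values()))
smoothed_d1 = smooth_counts(docs, influence, "d1")

held_out = ["world", "final"]
ppl_raw = perplexity(language_model(docs["d1"], vocab), held_out)
ppl_smoothed = perplexity(language_model(smoothed_d1, vocab), held_out)
```

In this toy setting the sparse post `d1` never mentions "final", so its unsmoothed model assigns that term almost no probability. After borrowing counts from `d2`, which influences it, the smoothed model covers the term and the held-out perplexity drops. Because the weights are directed, swapping the roles of source and target would yield a different smoothed document.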

