Understanding Diffusion Processes: Inference and Theory

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. Pub Date: 2016-02-08. DOI: 10.1145/2835776.2855084

Xinran He



Abstract

With the increasing popularity of social media and social networking sites, analyzing social networks offers great potential to shed light on human social structure and creates significant marketing opportunities. Social network analysis usually starts with extracting or learning the social network and its associated parameters. Unlike other analytical tasks, this step is highly non-trivial because of the amorphous nature of social ties and the challenges posed by noisy and incomplete observations. My research focuses on improving the accuracy of network inference and on analyzing the consequences when the extracted network is noisy or erroneous. More precisely, I propose to study the following two questions, with a special focus on diffusion behaviors: (1) how to use special properties of social networks to improve the accuracy of the extracted network under noisy and missing data; (2) how to characterize the impact of noise in the inferred network and carry out robust analysis and optimization.

The first step in social influence analysis is usually to infer the diffusion network. Assuming a probabilistic model of influence and a model of how the timing of individuals' adoption decisions correlates, one can use observed adoption data to estimate the strengths of influence between pairs of individuals. However, existing approaches to Network Inference rely on the common assumption that the observations used to train the models are complete, whereas missing observations are commonplace in practice because of time or technical limitations in data collection. Therefore, I propose to study the impact of incomplete observations and to design efficient methods that compensate for noise or incompleteness in the observed data. I propose to exploit the fact that social networks have more specific structure than arbitrary graphs: a joint estimation of the graph generation model and the actual network structure is likely to improve the estimation accuracy significantly. Moreover, incorporating the content information of the cascade also has the potential to improve inference accuracy. Therefore, I propose to combine the Correlated Topic Model [1] and the Hawkes Process [5, 4, 6] into a unified model that uses content information [2].

Because of noise or missing data in the observations, even in the best case the inferred network structure and link strengths will only approximate the truth; in other words, noise will be pervasive in inferred social networks. I propose to focus on the algorithmic question of Influence Maximization [3] in the context of noisy social network data. More specifically, I propose to consider the following questions. Given an instance of an Influence Model and a level of mis-estimation: (1) decide whether the objective function on this instance varies smoothly with perturbations to the parameters; (2) if the dependence is smooth, find a robustly near-optimal solution.
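To make the robustness question concrete, the following is a minimal sketch and not the author's method. It assumes the Independent Cascade model as the Influence Model, a hypothetical six-node toy graph, a plain greedy seed-selection heuristic with Monte Carlo spread estimates, and a uniform shift `delta` of every edge probability as the form of mis-estimation. None of these choices is specified in the abstract. The sketch picks seeds on the estimated graph and then re-evaluates the same seeds on perturbed graphs to see how much the objective moves.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, nodes, k, runs=2000):
    """Greedily add the node with the largest estimated spread, k times."""
    chosen = []
    for _ in range(k):
        candidates = [n for n in nodes if n not in chosen]
        best = max(candidates,
                   key=lambda n: expected_spread(graph, chosen + [n], runs))
        chosen.append(best)
    return chosen


def perturb(graph, delta, sign):
    """Shift every edge probability by sign * delta, clipped to [0, 1]."""
    return {u: {v: min(1.0, max(0.0, p + sign * delta)) for v, p in nbrs.items()}
            for u, nbrs in graph.items()}


# Hypothetical toy network: graph[u][v] is the (estimated) influence probability u -> v.
graph = {
    "a": {"b": 0.4, "c": 0.4},
    "b": {"d": 0.3},
    "c": {"d": 0.3, "e": 0.2},
    "d": {"f": 0.5},
    "e": {"f": 0.1},
    "f": {},
}
nodes = sorted(graph)

# Choose seeds on the estimated graph, then evaluate them under mis-estimation.
seeds = greedy_seeds(graph, nodes, k=2)
base = expected_spread(graph, seeds)
low = expected_spread(perturb(graph, 0.1, -1), seeds)
high = expected_spread(perturb(graph, 0.1, +1), seeds)
print("seeds:", seeds)
print("spread under -delta, estimate, +delta:", low, base, high)
```

In this sketch, a small gap between `low` and `high` relative to `base` would suggest that the objective is insensitive to the assumed level of mis-estimation on this instance, while a large gap would suggest that seeds chosen on the estimated graph may not be reliable.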

