Understanding Diffusion Processes: Inference and Theory
Xinran He
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016-02-08
DOI: 10.1145/2835776.2855084 (https://doi.org/10.1145/2835776.2855084)
Citations: 0
Abstract
With the increasing popularity of social media and social networking sites, analyzing social networks offers great potential to shed light on human social structure and creates marketing opportunities. Usually, social network analysis starts with extracting or learning the social network and its associated parameters. Unlike in other analytical tasks, this step is highly non-trivial because of the amorphous nature of social ties and the challenges posed by noisy and incomplete observations. My research focuses on improving the accuracy of network inference and on analyzing the consequences when the extracted network is noisy or erroneous. To be more precise, I propose to study the following two questions, with a special focus on diffusion behaviors: (1) How to utilize special properties of social networks to improve the accuracy of the extracted network under noisy and missing data; (2) How to characterize the impact of noise in the inferred network and carry out robust analysis and optimization. Usually, the first step towards social influence analysis is to infer the diffusion network. Assuming a probabilistic model of influence and a model of how the timing of individuals’ adoption decisions correlates, one can use the observed adoption data to estimate the strengths of influence between pairs of individuals. However, existing approaches to Network Inference rely on the common assumption that the observations used to train the models are complete, whereas missing observations are commonplace in practice due to time or technical limitations in data collection. Therefore, I propose to study the impact of incomplete observations and to design efficient methods that compensate for noise or incompleteness in the observed data. I propose to exploit the fact that social networks have more specific structure than arbitrary graphs. A joint estimation of the graph generation model and the actual network structure is likely to significantly improve the estimation accuracy.
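As an illustration only, and not the method proposed in this work, the estimation step can be sketched under a simple set of assumptions: a discrete-time independent cascade model, in which a node activated at step t gets exactly one chance to activate each neighbor at step t + 1, and a naive frequency count as the estimator. The function name `estimate_edge_probabilities`, the cascade format, and the toy data are hypothetical. The estimator is known to be crude: it credits every parent that was active one step earlier, so it over-estimates when several parents fire together.

```python
from collections import defaultdict


def estimate_edge_probabilities(cascades, candidate_edges):
    """Naive frequency estimate of independent-cascade edge probabilities.

    cascades: list of dicts mapping node -> discrete activation time.
              A node missing from a dict was not observed to activate.
    candidate_edges: iterable of (u, v) pairs to score.
    """
    edges = list(candidate_edges)
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for times in cascades:
        for u, v in edges:
            if u not in times:
                continue  # u never activated, so it made no attempt on v
            t_u = times[u]
            t_v = times.get(v)
            if t_v is not None and t_v <= t_u:
                continue  # v was already active, so u could not influence it
            attempts[(u, v)] += 1
            if t_v == t_u + 1:
                successes[(u, v)] += 1  # v activated right after u's attempt
    return {e: successes[e] / attempts[e] for e in edges if attempts[e] > 0}


# Hypothetical toy data: node "a" activates "b" in 3 of 4 cascades.
complete = [{"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 0}]
# Same cascades, but one activation of "b" was not recorded.
incomplete = [{"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 0}, {"a": 0}]

print(estimate_edge_probabilities(complete, [("a", "b")]))
print(estimate_edge_probabilities(incomplete, [("a", "b")]))
```

In this toy example the estimate for the edge from "a" to "b" drops from 0.75 to 0.5 when a single activation record is lost, because the estimator treats an unobserved activation as a failed attempt. This is the kind of bias that incomplete observations introduce when completeness is assumed.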
Moreover, incorporating the content information of the cascade also has the potential to improve the inference accuracy. Therefore, I propose to combine the Correlated Topic Model [1] and Hawkes Process [5, 4, 6] into a unified model to utilize content information [2]. Due to noise or missing data in the observations, even in the best case, one would expect the inferred network structure and link strengths to be only an approximation to the truth; in other words, noise will be pervasive in inferred social networks. I propose to focus on the algorithmic question of Influence Maximization [3] in the context of noisy social network data. More specifically, I propose to consider the following questions. Given an instance of an Influence Model with a level of mis-estimation: (1) Does the objective function on this instance vary smoothly with perturbations to the parameters? (2) If the dependence is smooth, how can one find a robustly near-optimal solution?
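The two questions above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the approach developed in this work: the influence model is taken to be independent cascade, the objective is the expected number of activated nodes, seed selection uses the standard greedy hill-climbing heuristic, and mis-estimation is modeled as a uniform shift of every edge probability. The helper names (`expected_spread`, `greedy_seeds`, `perturb`) and the four-node graph are hypothetical. The spread is computed exactly by enumerating live-edge graphs, which is feasible only for very small graphs.

```python
from itertools import product


def expected_spread(edges, seeds):
    """Exact expected number of activated nodes under independent cascade.

    edges: dict mapping (u, v) -> activation probability.
    Enumerates all 2^m live-edge graphs, so use only on tiny instances.
    """
    edge_list = list(edges)
    total = 0.0
    for live in product([False, True], repeat=len(edge_list)):
        prob = 1.0
        adjacency = {}
        for (u, v), is_live in zip(edge_list, live):
            p = edges[(u, v)]
            prob *= p if is_live else 1.0 - p
            if is_live:
                adjacency.setdefault(u, []).append(v)
        # Collect the nodes reachable from the seeds in this live-edge graph.
        reached, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nxt in adjacency.get(node, []):
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
        total += prob * len(reached)
    return total


def greedy_seeds(edges, nodes, k):
    """Greedy hill climbing: add the node with the largest spread k times."""
    seeds = []
    for _ in range(k):
        best = max((n for n in nodes if n not in seeds),
                   key=lambda n: expected_spread(edges, seeds + [n]))
        seeds.append(best)
    return seeds


def perturb(edges, delta):
    """Shift every edge probability by delta, clipped to [0, 1]."""
    return {e: min(1.0, max(0.0, p + delta)) for e, p in edges.items()}


# Hypothetical inferred network and a perturbed copy of it.
edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("d", "c"): 0.2}
nodes = ["a", "b", "c", "d"]
seeds = greedy_seeds(edges, nodes, 1)
print(seeds, expected_spread(edges, seeds))
print(expected_spread(perturb(edges, 0.1), seeds))
```

On this toy instance the greedy choice is the single seed "a", with expected spread 1.75 under the nominal parameters and 1.96 when every probability is raised by 0.1. Comparing the objective, and the chosen seed set, across a range of such perturbations is one simple way to probe whether a given instance is sensitive to mis-estimation.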