Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin
{"title":"具有真实特征的电子邮件网络建模的概率方法","authors":"Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin","doi":"10.1109/ICCCN.2014.6911760","DOIUrl":null,"url":null,"abstract":"Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.","PeriodicalId":404048,"journal":{"name":"2014 23rd International Conference on Computer Communication and Networks (ICCCN)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A probabilistic approach towards modeling email network with realistic features\",\"authors\":\"Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin\",\"doi\":\"10.1109/ICCCN.2014.6911760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.\",\"PeriodicalId\":404048,\"journal\":{\"name\":\"2014 23rd International Conference on Computer Communication and Networks (ICCCN)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 23rd International Conference on Computer Communication and Networks (ICCCN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCCN.2014.6911760\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 23rd International Conference on Computer Communication and Networks (ICCCN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCN.2014.6911760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A probabilistic approach towards modeling email network with realistic features
Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.