具有真实特征的电子邮件网络建模的概率方法

Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin
{"title":"具有真实特征的电子邮件网络建模的概率方法","authors":"Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin","doi":"10.1109/ICCCN.2014.6911760","DOIUrl":null,"url":null,"abstract":"Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.","PeriodicalId":404048,"journal":{"name":"2014 23rd International Conference on Computer Communication and Networks (ICCCN)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A probabilistic approach towards modeling email network with realistic features\",\"authors\":\"Quangang Li, Jinqiao Shi, Tingwen Liu, Li Guo, Zhiguang Qin\",\"doi\":\"10.1109/ICCCN.2014.6911760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.\",\"PeriodicalId\":404048,\"journal\":{\"name\":\"2014 23rd International Conference on Computer Communication and Networks (ICCCN)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 23rd International Conference on Computer Communication and Networks (ICCCN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCCN.2014.6911760\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 23rd International Conference on Computer Communication and Networks (ICCCN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCN.2014.6911760","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

电子邮件在我们的日常生活中扮演着非常重要的角色。在电子邮件网络方面已经进行了大量的工作。这些研究大多需要真实的电子邮件网络数据集和可靠的模型来分析用户信息,了解网络演化机制。然而,由于缺乏真正的大规模电子邮件数据集,许多研究工作受到限制。虽然电子邮件通信无处不在,但满足不同研究目的的大规模可用电子邮件数据集却很少。由于隐私政策和权限限制,在短时间内收集到真正的大规模电子邮件数据集是非常困难的。各种社交网络模型通常用于创建合成的电子邮件网络。然而,这些模型侧重于对网络的几个结构属性进行建模,而没有考虑用户的行为模式。它们不适合生成大规模的真实合成电子邮件网络数据集。为此,我们提出了一个概率模型,通过该模型,我们可以用一个小的捕获电子邮件日志构建大规模的合成电子邮件数据集。更重要的是,生成的合成数据集匹配真实的电子邮件网络属性和个人通信模式。此外,它具有线性复杂性,可以很容易地并联。在安然数据集上的实验结果证明了该模型的上述优点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A probabilistic approach towards modeling email network with realistic features
Email plays a very important role in our daily life. Much work have been put into practice on email network. Those studies mostly require real email network datasets and reliable models to analyze user information and understand the mechanisms of network evolution. However, much research work is constrained by the absence of real large-scale email datasets. Although email communication is ubiquitous, there are very few large-scale available email datasets satisfied different research purposes. Due to privacy policy and restricted permissions, it is arduous to collect a real large-scale email dataset in a short time. Various social network models are usually used to create synthetic email networks. However, these models focus on modeling several structural properties of network without considering user behaviour patterns. They are not appropriate to generate large-scale realistic synthetic email network datasets. Towards this end, we propose a probabilistic model by which we can construct large-scale synthetic email datasets with a small captured email log. What is more important is that the generated synthetic dataset matches real email network properties and individual communication patterns. Moreover, it has linear complexity, and can be paralleled easily. Experimental results on Enron dataset demonstrate the above benefits of our model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信