基于随机漫步的在线社交网络虚假账户检测

2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) Pub Date : 2017-06-01 DOI:10.1109/DSN.2017.55

Jinyuan Jia, Binghui Wang, N. Gong

{"title":"基于随机漫步的在线社交网络虚假账户检测","authors":"Jinyuan Jia, Binghui Wang, N. Gong","doi":"10.1109/DSN.2017.55","DOIUrl":null,"url":null,"abstract":"Online social networks are known to be vulnerable to the so-called Sybil attack, in which an attacker maintains massive fake accounts (also called Sybils) and uses them to perform various malicious activities. Therefore, Sybil detection is a fundamental security research problem in online social networks. Random walk based methods, which leverage the structure of an online social network to distribute reputation scores for users, have been demonstrated to be promising in certain real-world online social networks. In particular, random walk based methods have three desired features: they can have theoretically guaranteed performance for online social networks that have the fast-mixing property, they are accurate when the social network has strong homophily property, and they can be scalable to large-scale online social networks. However, existing random walk based methods suffer from several key limitations: 1) they can only leverage either labeled benign users or labeled Sybils, but not both, 2) they have limited detection accuracy for weak-homophily social networks, and 3) they are not robust to label noise in the training dataset. In this work, we propose a new random walk based Sybil detection method called SybilWalk. SybilWalk addresses the limitations of existing random walk based methods while maintaining their desired features. We perform both theoretical and empirical evaluations to compare SybilWalk with previous random walk based methods. Theoretically, for online social networks with the fast-mixing property, SybilWalk has a tighter asymptotical bound on the number of Sybils that are falsely accepted into the social network than all existing random walk based methods. Empirically, we compare SybilWalk with previous random walk based methods using both social networks with synthesized Sybils and a large-scale Twitter dataset with real Sybils. Our empirical results demonstrate that 1) SybilWalk is substantially more accurate than existing random walk based methods for weakhomophily social networks, 2) SybilWalk is substantially more robust to label noise than existing random walk based methods, and 3) SybilWalk is as scalable as the most efficient existing random walk based methods. In particular, on the Twitter dataset, SybilWalk achieves a false positive rate of 1.3% and a false negative rate of 17.3%.","PeriodicalId":426928,"journal":{"name":"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"107","resultStr":"{\"title\":\"Random Walk Based Fake Account Detection in Online Social Networks\",\"authors\":\"Jinyuan Jia, Binghui Wang, N. Gong\",\"doi\":\"10.1109/DSN.2017.55\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Online social networks are known to be vulnerable to the so-called Sybil attack, in which an attacker maintains massive fake accounts (also called Sybils) and uses them to perform various malicious activities. Therefore, Sybil detection is a fundamental security research problem in online social networks. Random walk based methods, which leverage the structure of an online social network to distribute reputation scores for users, have been demonstrated to be promising in certain real-world online social networks. In particular, random walk based methods have three desired features: they can have theoretically guaranteed performance for online social networks that have the fast-mixing property, they are accurate when the social network has strong homophily property, and they can be scalable to large-scale online social networks. However, existing random walk based methods suffer from several key limitations: 1) they can only leverage either labeled benign users or labeled Sybils, but not both, 2) they have limited detection accuracy for weak-homophily social networks, and 3) they are not robust to label noise in the training dataset. In this work, we propose a new random walk based Sybil detection method called SybilWalk. SybilWalk addresses the limitations of existing random walk based methods while maintaining their desired features. We perform both theoretical and empirical evaluations to compare SybilWalk with previous random walk based methods. Theoretically, for online social networks with the fast-mixing property, SybilWalk has a tighter asymptotical bound on the number of Sybils that are falsely accepted into the social network than all existing random walk based methods. Empirically, we compare SybilWalk with previous random walk based methods using both social networks with synthesized Sybils and a large-scale Twitter dataset with real Sybils. Our empirical results demonstrate that 1) SybilWalk is substantially more accurate than existing random walk based methods for weakhomophily social networks, 2) SybilWalk is substantially more robust to label noise than existing random walk based methods, and 3) SybilWalk is as scalable as the most efficient existing random walk based methods. In particular, on the Twitter dataset, SybilWalk achieves a false positive rate of 1.3% and a false negative rate of 17.3%.\",\"PeriodicalId\":426928,\"journal\":{\"name\":\"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"107\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN.2017.55\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2017.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 107

摘要

众所周知，在线社交网络很容易受到所谓的Sybil攻击，攻击者会维护大量虚假账户(也称为Sybil)，并利用这些账户进行各种恶意活动。因此，Sybil检测是在线社交网络安全研究的一个基本问题。基于随机漫步的方法利用在线社交网络的结构为用户分配声誉分数，已被证明在某些现实世界的在线社交网络中很有前途。特别是，基于随机漫步的方法具有三个理想的特征:对于具有快速混合特性的在线社交网络具有理论上保证的性能;对于具有强同质性的社交网络具有准确的性能;对于大规模的在线社交网络具有可扩展性。然而，现有的基于随机漫步的方法存在几个关键的局限性:1)它们只能利用标记的良性用户或标记的Sybils，但不能两者都利用;2)它们对弱同态社交网络的检测精度有限;3)它们对训练数据集中的标记噪声不具有鲁棒性。在这项工作中，我们提出了一种新的基于随机行走的Sybil检测方法，称为SybilWalk。SybilWalk解决了现有基于随机行走方法的局限性，同时保持了它们所需的功能。我们进行了理论和实证评估，将SybilWalk与以前基于随机行走的方法进行比较。理论上，对于具有快速混合特性的在线社交网络，SybilWalk比所有现有的基于随机行走的方法对被错误地接受到社交网络中的SybilWalk的数量有更严格的渐近界。在经验上，我们将SybilWalk与之前基于随机行走的方法进行了比较，使用了具有合成Sybils的社交网络和具有真实Sybils的大规模Twitter数据集。我们的实证结果表明:1)对于弱同态社交网络，SybilWalk比现有的基于随机行走的方法准确得多;2)SybilWalk对标记噪声的鲁棒性比现有的基于随机行走的方法强;3)SybilWalk与现有最有效的基于随机行走的方法一样具有可扩展性。特别是，在Twitter数据集上，SybilWalk的假阳性率为1.3%，假阴性率为17.3%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Random Walk Based Fake Account Detection in Online Social Networks

Online social networks are known to be vulnerable to the so-called Sybil attack, in which an attacker maintains massive fake accounts (also called Sybils) and uses them to perform various malicious activities. Therefore, Sybil detection is a fundamental security research problem in online social networks. Random walk based methods, which leverage the structure of an online social network to distribute reputation scores for users, have been demonstrated to be promising in certain real-world online social networks. In particular, random walk based methods have three desired features: they can have theoretically guaranteed performance for online social networks that have the fast-mixing property, they are accurate when the social network has strong homophily property, and they can be scalable to large-scale online social networks. However, existing random walk based methods suffer from several key limitations: 1) they can only leverage either labeled benign users or labeled Sybils, but not both, 2) they have limited detection accuracy for weak-homophily social networks, and 3) they are not robust to label noise in the training dataset. In this work, we propose a new random walk based Sybil detection method called SybilWalk. SybilWalk addresses the limitations of existing random walk based methods while maintaining their desired features. We perform both theoretical and empirical evaluations to compare SybilWalk with previous random walk based methods. Theoretically, for online social networks with the fast-mixing property, SybilWalk has a tighter asymptotical bound on the number of Sybils that are falsely accepted into the social network than all existing random walk based methods. Empirically, we compare SybilWalk with previous random walk based methods using both social networks with synthesized Sybils and a large-scale Twitter dataset with real Sybils. Our empirical results demonstrate that 1) SybilWalk is substantially more accurate than existing random walk based methods for weakhomophily social networks, 2) SybilWalk is substantially more robust to label noise than existing random walk based methods, and 3) SybilWalk is as scalable as the most efficient existing random walk based methods. In particular, on the Twitter dataset, SybilWalk achieves a false positive rate of 1.3% and a false negative rate of 17.3%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

自引率

0.00%

发文量