Detecting Clusters of Fake Accounts in Online Social Networks

Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. Pub Date: 2015-10-16. DOI: 10.1145/2808769.2808779

Cao Xiao, D. Freeman, Theodore Hwa


Citations: 169

Abstract

Fake accounts are a preferred means for malicious users of online social networks to send spam, commit fraud, or otherwise abuse the system. A single malicious actor may create dozens to thousands of fake accounts in order to scale their operation and reach as many legitimate members as possible. Detecting and acting on these accounts as quickly as possible is imperative to protect legitimate members and maintain the trustworthiness of the network. However, any individual fake account may appear legitimate on first inspection, for example by having a real-sounding name or a believable profile. In this work we describe a scalable approach to finding groups of fake accounts registered by the same actor. The main technique is a supervised machine learning pipeline that classifies an entire cluster of accounts as malicious or legitimate. The key features used in the model are statistics on fields of user-generated text such as name, email address, company, or university; these include both frequencies of patterns within the cluster (e.g., do all of the emails share a common letter/digit pattern?) and comparisons of text frequencies across the entire user base (e.g., are all of the names rare?). We apply our framework to analyze account data on LinkedIn grouped by registration IP address and registration date. Our model achieved AUC 0.98 on a held-out test set and AUC 0.95 on out-of-sample testing data. The model has been deployed in production and has identified more than 250,000 fake accounts since deployment.
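
As a rough illustration of the cluster-level features described in the abstract, the following Python sketch groups accounts by registration IP address and date, then computes one within-cluster pattern statistic and one user-base frequency statistic. It is a minimal sketch under stated assumptions, not the authors' implementation: the account field names (`ip`, `date`, `name`, `email`), the helper names, and the specific letter/digit encoding are all hypothetical, and the supervised classifier that would consume these features is omitted.

```python
from collections import Counter, defaultdict


def char_pattern(text):
    """Collapse a string into runs of character classes.

    L = letter, D = digit, O = other. Example: 'abc123' -> 'LD'.
    (Hypothetical encoding; the abstract does not specify one.)
    """
    out = []
    for ch in text:
        if ch.isalpha():
            cls = "L"
        elif ch.isdigit():
            cls = "D"
        else:
            cls = "O"
        # Only record a class when it differs from the previous run.
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def group_by_ip_and_date(accounts):
    """Cluster accounts by (registration IP, registration date)."""
    clusters = defaultdict(list)
    for acct in accounts:
        clusters[(acct["ip"], acct["date"])].append(acct)
    return dict(clusters)


def cluster_features(cluster, global_name_counts, total_accounts):
    """Compute illustrative features for one cluster of accounts."""
    # Within-cluster statistic: share of accounts whose email local part
    # follows the cluster's most common letter/digit pattern.
    patterns = Counter(
        char_pattern(acct["email"].split("@")[0]) for acct in cluster
    )
    top_share = patterns.most_common(1)[0][1] / len(cluster)

    # User-base statistic: mean relative frequency of each name across
    # all accounts (low values suggest the names are rare).
    name_freqs = [
        global_name_counts.get(acct["name"], 0) / total_accounts
        for acct in cluster
    ]
    return {
        "size": len(cluster),
        "top_email_pattern_share": top_share,
        "mean_name_frequency": sum(name_freqs) / len(name_freqs),
    }
```

In a pipeline of the kind the abstract describes, one such feature vector per cluster, paired with a malicious/legitimate label, would be passed to a supervised classifier, so the unit of prediction is the cluster rather than the individual account.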

