Poster: CUD: crowdsourcing for URL spam detection

Conference on Computer and Communications Security : proceedings of the ... conference on computer and communications security. ACM Conference on Computer and Communications Security Pub Date : 2011-10-17 DOI:10.1145/2046707.2093493

Jun Hu, Hongyu Gao, Zhichun Li, Yan Chen


Citations: 3

Abstract

The prevalence of spam URLs in Internet services such as email, social networks, blogs, and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malware, all of which harm ordinary users. Existing URL blacklist approaches offer limited protection. Although recent machine-learning-based URL classification approaches demonstrate good accuracy and reasonable throughput, they are built on observations from existing spam URLs and struggle to detect new spam URLs when attackers employ new strategies. In this paper, we present CUD (Crowdsourcing for URL spam detection) as a supplement to existing detection tools. CUD leverages human intelligence for URL classification through crowdsourcing. CUD crawls user comments about spam URLs that already exist on the Internet and applies sentiment analysis from natural language processing to analyze those comments automatically and detect spam URLs. Since CUD does not use features directly associated with the URLs and their landing pages, it is more robust when attackers change their strategies. Through evaluation, we find that up to 70% of URLs have user comments online. CUD achieves a true positive rate of 86.8% with a false positive rate of 0.9%. Moreover, about 75% of the spam URLs CUD detects are missed by other approaches. Therefore, CUD can serve as a good complement to other approaches.
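The comment-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which sentiment analysis technique CUD uses, so the code below substitutes a toy word-list scorer. The word lists, the threshold, and the function names `comment_score`, `is_spam_url`, and `rates` are all hypothetical, and the crawling step is omitted (comments are assumed to be already collected). The last function shows how a true positive rate and a false positive rate, the two metrics quoted above, are computed from labeled predictions.

```python
from typing import Iterable

# Hypothetical toy lexicon; the abstract does not specify CUD's actual model.
NEGATIVE_TERMS = {"spam", "scam", "phishing", "malware", "virus", "fraud", "fake"}
POSITIVE_TERMS = {"safe", "legit", "trusted", "useful", "official"}


def comment_score(comment: str) -> int:
    """Return (number of negative terms) - (number of positive terms)."""
    words = [w.strip(".,!?;:\"'()").lower() for w in comment.split()]
    neg = sum(w in NEGATIVE_TERMS for w in words)
    pos = sum(w in POSITIVE_TERMS for w in words)
    return neg - pos


def is_spam_url(comments: Iterable[str], threshold: int = 1) -> bool:
    """Flag a URL as spam when the summed comment score reaches the threshold."""
    return sum(comment_score(c) for c in comments) >= threshold


def rates(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) for labeled predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    positives = sum(actual)
    negatives = len(actual) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Made-up comments for two placeholder URLs, purely for illustration.
    comments_by_url = {
        "http://example.test/a": ["This is a scam, total phishing!", "fake site"],
        "http://example.test/b": ["Looks legit and safe to me."],
    }
    for url, comments in comments_by_url.items():
        print(url, is_spam_url(comments))
```

Because the decision depends only on what people wrote about a URL, and not on the URL string or its landing page, a change in how attackers craft URLs does not by itself change the score, which is the robustness argument the abstract makes.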

