CURLA:基于云的垃圾邮件URL分析器,用于非常大的数据集

Shams Zawoad, Ragib Hasan, Munirul M. Haque, Gary Warner
{"title":"CURLA:基于云的垃圾邮件URL分析器,用于非常大的数据集","authors":"Shams Zawoad, Ragib Hasan, Munirul M. Haque, Gary Warner","doi":"10.1109/CLOUD.2014.102","DOIUrl":null,"url":null,"abstract":"URL blacklisting is a widely used technique for blocking phishing websites. To prepare an effective blacklist, it is necessary to analyze possible threats and include the identified malicious sites in the blacklist. Spam emails are good source for acquiring suspected phishing websites. However, the number of URLs gathered from spam emails is quite large. Fetching and analyzing the content of this large number of websites are very expensive tasks given limited computing and storage resources. Moreover, a high percentage of URLs extracted from spam emails refer to the same website. Hence, preserving the contents of all the websites causes significant storage waste. To solve the problem of massive computing and storage resource requirements, we propose and develop CURLA - a Cloud-based spam URL Analyzer, built on top of Amazon Elastic Computer Cloud (EC2) and Amazon Simple Queue Service (SQS). CURLA allows processing large number of spam-based URLs in parallel, which reduces the cost of establishing equally capable local infrastructure. Our system builds a database of unique spam-based URLs and accumulates the content of these unique websites in a central repository, which can be later used for phishing or other counterfeit websites detection. We show the effectiveness of our proposed architecture using real-life spam-based URL data.","PeriodicalId":288542,"journal":{"name":"2014 IEEE 7th International Conference on Cloud Computing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"CURLA: Cloud-Based Spam URL Analyzer for Very Large Datasets\",\"authors\":\"Shams Zawoad, Ragib Hasan, Munirul M. Haque, Gary Warner\",\"doi\":\"10.1109/CLOUD.2014.102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"URL blacklisting is a widely used technique for blocking phishing websites. To prepare an effective blacklist, it is necessary to analyze possible threats and include the identified malicious sites in the blacklist. Spam emails are good source for acquiring suspected phishing websites. However, the number of URLs gathered from spam emails is quite large. Fetching and analyzing the content of this large number of websites are very expensive tasks given limited computing and storage resources. Moreover, a high percentage of URLs extracted from spam emails refer to the same website. Hence, preserving the contents of all the websites causes significant storage waste. To solve the problem of massive computing and storage resource requirements, we propose and develop CURLA - a Cloud-based spam URL Analyzer, built on top of Amazon Elastic Computer Cloud (EC2) and Amazon Simple Queue Service (SQS). CURLA allows processing large number of spam-based URLs in parallel, which reduces the cost of establishing equally capable local infrastructure. Our system builds a database of unique spam-based URLs and accumulates the content of these unique websites in a central repository, which can be later used for phishing or other counterfeit websites detection. We show the effectiveness of our proposed architecture using real-life spam-based URL data.\",\"PeriodicalId\":288542,\"journal\":{\"name\":\"2014 IEEE 7th International Conference on Cloud Computing\",\"volume\":\"82 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE 7th International Conference on Cloud Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLOUD.2014.102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 7th International Conference on Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLOUD.2014.102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

网址黑名单是一种广泛使用的阻止网络钓鱼网站的技术。为了编制有效的黑名单,需要对可能存在的威胁进行分析,并将识别出的恶意站点加入黑名单。垃圾邮件是获取可疑网络钓鱼网站的良好来源。然而,从垃圾邮件中收集到的url数量相当大。在有限的计算和存储资源下,获取和分析这些大量网站的内容是非常昂贵的任务。此外,从垃圾邮件中提取的url中有很大一部分指向同一个网站。因此,保存所有网站的内容会造成很大的存储浪费。为了解决海量计算和存储资源需求的问题,我们在Amazon Elastic Computer Cloud (EC2)和Amazon Simple Queue Service (SQS)的基础上,提出并开发了基于云的垃圾邮件URL分析器CURLA。CURLA允许并行处理大量基于垃圾邮件的url,这降低了建立同等能力的本地基础设施的成本。我们的系统建立了一个独特的基于垃圾邮件的url数据库,并将这些独特网站的内容积累在一个中央存储库中,可以在以后用于网络钓鱼或其他假冒网站检测。我们使用真实的基于垃圾邮件的URL数据来展示我们提出的体系结构的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
CURLA: Cloud-Based Spam URL Analyzer for Very Large Datasets
URL blacklisting is a widely used technique for blocking phishing websites. To prepare an effective blacklist, it is necessary to analyze possible threats and include the identified malicious sites in the blacklist. Spam emails are good source for acquiring suspected phishing websites. However, the number of URLs gathered from spam emails is quite large. Fetching and analyzing the content of this large number of websites are very expensive tasks given limited computing and storage resources. Moreover, a high percentage of URLs extracted from spam emails refer to the same website. Hence, preserving the contents of all the websites causes significant storage waste. To solve the problem of massive computing and storage resource requirements, we propose and develop CURLA - a Cloud-based spam URL Analyzer, built on top of Amazon Elastic Computer Cloud (EC2) and Amazon Simple Queue Service (SQS). CURLA allows processing large number of spam-based URLs in parallel, which reduces the cost of establishing equally capable local infrastructure. Our system builds a database of unique spam-based URLs and accumulates the content of these unique websites in a central repository, which can be later used for phishing or other counterfeit websites detection. We show the effectiveness of our proposed architecture using real-life spam-based URL data.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信