A Fault Tolerant Approach for Malicious URL Filtering

Mansoor Ahmed, Abid Khan, Osama Saleem, Muhammad Haris
2018 International Symposium on Networks, Computers and Communications (ISNCC), published 2018-06-01
DOI: 10.1109/ISNCC.2018.8530984 (https://doi.org/10.1109/ISNCC.2018.8530984)
Citations: 4

Abstract

Existing URL filtering mechanisms lack support for real-time fault tolerance and scalability. This paper addresses these issues by developing a scalable, real-time, fault-tolerant model for classifying streams of URL traffic. The key feature of our model is that it saves computation time, resource usage and bandwidth. The model is implemented in Apache Spark, which provides APIs for machine learning and streaming. The dataset consists of 2.4 million URLs drawn from both clean and malicious classes. In the training set, clean URLs are labeled 1 and malicious URLs are labeled 0. In the proposed model, distributed in-memory computation is provided in a fault-tolerant manner by Apache Spark's resilient distributed datasets (RDDs). By increasing the number of nodes in the cluster, we achieved linear scalability. Our model attained an accuracy of 96% with a logistic regression classifier and scaled well with the Apache Spark cluster. Using the logistic regression classifier from Spark MLlib, 2 million URLs can be filtered in 55 seconds. The model achieved F1-score values of 0.92, 0.95 and 0.93, reported along with precision, and the results are evaluated using cross-validation schemes.
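The labeling convention and classifier named in the abstract can be illustrated with a minimal sketch. It is a single-machine, pure-Python stand-in for logistic regression, not the authors' Spark MLlib pipeline. The token-based features, the function names, the hyperparameters, and the toy URLs are all assumptions made for illustration; the abstract does not describe the actual feature set.

```python
import math
import re

# Label convention taken from the abstract: clean = 1, malicious = 0.
CLEAN, MALICIOUS = 1, 0


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed feature scheme)."""
    return re.findall(r"[a-z0-9]+", url.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples, epochs=500, lr=0.1):
    """Fit logistic regression on binary bag-of-token features with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for url, label in samples:
            tokens = set(tokenize(url))
            z = bias + sum(weights.get(t, 0.0) for t in tokens)
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            for t in tokens:
                weights[t] = weights.get(t, 0.0) - lr * error
            bias -= lr * error
    return weights, bias


def predict_proba(model, url):
    """Return the estimated probability that the URL is clean (label 1)."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in set(tokenize(url)))
    return sigmoid(z)


def predict(model, url):
    """Threshold the probability at 0.5 to obtain a 1/0 label."""
    return CLEAN if predict_proba(model, url) >= 0.5 else MALICIOUS


# Hypothetical toy URLs on reserved example domains; NOT data from the paper.
TOY_DATA = [
    ("http://example.com/docs/index.html", CLEAN),
    ("http://example.org/about", CLEAN),
    ("https://example.net/blog/post", CLEAN),
    ("http://login-verify.example.test/update/account.php", MALICIOUS),
    ("http://free-prize.example.test/win.exe", MALICIOUS),
    ("http://secure-update.example.test/confirm/password", MALICIOUS),
]

model = train(TOY_DATA)
for url, label in TOY_DATA:
    print(label, predict(model, url), url)
```

In a Spark deployment the same idea would presumably be expressed with MLlib's logistic regression over distributed data, with fault tolerance coming from RDD lineage rather than from anything in this sketch.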