Mansoor Ahmed, Abid Khan, Osama Saleem, Muhammad Haris
2018 International Symposium on Networks, Computers and Communications (ISNCC), published 2018-06-01. DOI: 10.1109/ISNCC.2018.8530984 (https://doi.org/10.1109/ISNCC.2018.8530984)
A Fault Tolerant Approach for Malicious URL Filtering
Existing URL filtering mechanisms lack support for real-time fault tolerance and scalability. This paper addresses these issues by developing a scalable, real-time, fault-tolerant model that classifies streams of URL traffic. The key feature of our model is that it saves computation time, resource usage and bandwidth. The model is implemented in Apache Spark, which provides APIs for machine learning and streaming. The dataset consists of 2.4 million URLs drawn from both clean and malicious classes. In the training set, clean URLs are labeled 1 and malicious URLs are labeled 0. Distributed in-memory computation is provided in a fault-tolerant manner by Apache Spark's resilient distributed datasets (RDDs). By increasing the number of nodes in the cluster, we achieved linear scalability. Our model attained an accuracy of 96% with a logistic regression classifier and scaled well on the Apache Spark cluster. Using the logistic regression classifier from Spark MLlib, 2 million URLs can be filtered in 55 seconds. The model achieved F1-score values of 0.92, 0.95 and 0.93, reported along with precision, and the results were evaluated using cross-validation schemes.
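The abstract reports precision, F1-score and throughput figures without defining them. The following minimal sketch shows the standard definitions under the abstract's label convention (clean = 1, malicious = 0). It is not the authors' evaluation code: the function names, the choice of which class counts as "positive", and the toy label lists are all assumptions made for illustration, and the toy labels are not data from the paper.

```python
# Label convention stated in the abstract.
CLEAN = 1
MALICIOUS = 0


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn) for the class treated as positive."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive):
    """Standard precision, recall and F1 for one class.

    Returns 0.0 for any metric whose denominator is zero.
    """
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def urls_per_second(n_urls, seconds):
    """Average filtering throughput."""
    return n_urls / seconds


# Hypothetical toy labels (NOT from the paper's dataset).
toy_true = [CLEAN, CLEAN, CLEAN, MALICIOUS, MALICIOUS, MALICIOUS, CLEAN, MALICIOUS]
toy_pred = [CLEAN, CLEAN, MALICIOUS, MALICIOUS, MALICIOUS, CLEAN, CLEAN, MALICIOUS]

# Treating the malicious class as positive is an assumption; the
# abstract does not say which class its F1 values refer to.
toy_metrics = precision_recall_f1(toy_true, toy_pred, positive=MALICIOUS)

# Throughput implied by the abstract: 2 million URLs in 55 seconds.
reported_throughput = urls_per_second(2_000_000, 55)
```

Applied to the abstract's own figures, `urls_per_second(2_000_000, 55)` gives roughly 36,364 URLs per second on average. The abstract does not state which cluster size this figure corresponds to.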