Niju Shrestha, Rajan Kumar Kharel, Jason Britt, Ragib Hasan
2015 IEEE World Congress on Services. Published 2015-06-27. DOI: 10.1109/SERVICES.2015.38 (https://doi.org/10.1109/SERVICES.2015.38)
High-Performance Classification of Phishing URLs Using a Multi-modal Approach with MapReduce
Classifying phishing websites can be expensive, both computationally and financially, given a large enough volume of suspect sites. A distributed cloud environment can significantly reduce both the computation time and the financial cost. To test this idea, we apply a multi-modal feature classification algorithm to classify phishing websites in one non-distributed and several distributed environments. A multi-modal approach combines visual and text features for classification. The implementation extracts color and histogram features from a screenshot of a phishing website and text from its HTML source code. Feature extraction and comparison are accomplished using the MapReduce framework. Implementing the multi-modal approach in a distributed environment reduces both the runtime and the financial cost. We present results showing that our work is 30 times faster than existing state-of-the-art systems for phishing website classification.
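The pipeline described above (visual and text features extracted per site, then compared under MapReduce) can be illustrated with a minimal in-process sketch. This is not the authors' implementation: the abstract does not specify the feature definitions, similarity measures, weights, or job layout, so everything here is an assumption made for illustration. In particular, the coarse RGB histogram, histogram intersection, Jaccard token overlap, the equal 0.5/0.5 weighting, and all function and field names (`map_site`, `reduce_site`, `run_job`, `pixels`, `html`, `label`) are hypothetical, and a plain Python loop stands in for a real MapReduce runtime.

```python
import re
from collections import defaultdict
from itertools import chain


def color_histogram(pixels, bins_per_channel=2):
    """Normalized coarse RGB histogram of a list of (r, g, b) pixels (assumed feature)."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def text_tokens(html):
    """Strip tags and return the set of lowercase alphanumeric tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def map_site(site, templates):
    """Map step: emit (site_id, (label, score)) for every known template."""
    hist = color_histogram(site["pixels"])
    tokens = text_tokens(site["html"])
    for template in templates:
        visual = histogram_intersection(hist, color_histogram(template["pixels"]))
        textual = jaccard(tokens, text_tokens(template["html"]))
        # Equal weighting of the two modalities is an assumption, not from the paper.
        yield site["id"], (template["label"], 0.5 * visual + 0.5 * textual)


def reduce_site(site_id, scored):
    """Reduce step: keep the best-scoring template for one site."""
    label, score = max(scored, key=lambda pair: pair[1])
    return site_id, label, score


def run_job(sites, templates):
    """Simulate shuffle-and-reduce in one process by grouping map output by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_site(s, templates) for s in sites):
        groups[key].append(value)
    return [reduce_site(key, values) for key, values in sorted(groups.items())]
```

Because each suspect site is scored independently in the map step, the work partitions cleanly across nodes, which is the property a distributed deployment would rely on. Calling `run_job(sites, templates)` with lists of dictionaries carrying `id`, `pixels`, and `html` (and `label` for templates) returns one `(site_id, best_label, score)` tuple per site.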