Bin Liang, Miaoqiang Su, Wei You, Wenchang Shi, Gang Yang
{"title":"Cracking Classifiers for Evasion: A Case Study on the Google's Phishing Pages Filter","authors":"Bin Liang, Miaoqiang Su, Wei You, Wenchang Shi, Gang Yang","doi":"10.1145/2872427.2883060","DOIUrl":null,"url":null,"abstract":"Various classifiers based on the machine learning techniques have been widely used in security applications. Meanwhile, they also became an attack target of adversaries. Many existing studies have paid much attention to the evasion attacks on the online classifiers and discussed defensive methods. However, the security of the classifiers deployed in the client environment has not got the attention it deserves. Besides, earlier studies only concentrated on the experimental classifiers developed for research purposes only. The security of widely-used commercial classifiers still remains unclear. In this paper, we use the Google's phishing pages filter (GPPF), a classifier deployed in the Chrome browser which owns over one billion users, as a case to investigate the security challenges for the client-side classifiers. We present a new attack methodology targeting on client-side classifiers, called classifiers cracking. With the methodology, we successfully cracked the classification model of GPPF and extracted sufficient knowledge can be exploited for evasion attacks, including the classification algorithm, scoring rules and features, etc. Most importantly, we completely reverse engineered 84.8% scoring rules, covering most of high-weighted rules. Based on the cracked information, we performed two kinds of evasion attacks to GPPF, using 100 real phishing pages for the evaluation purpose. The experiments show that all the phishing pages (100%) can be easily manipulated to bypass the detection of GPPF. Our study demonstrates that the existing client-side classifiers are very vulnerable to classifiers cracking attacks.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on World Wide Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2872427.2883060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67
Abstract
Various classifiers based on the machine learning techniques have been widely used in security applications. Meanwhile, they also became an attack target of adversaries. Many existing studies have paid much attention to the evasion attacks on the online classifiers and discussed defensive methods. However, the security of the classifiers deployed in the client environment has not got the attention it deserves. Besides, earlier studies only concentrated on the experimental classifiers developed for research purposes only. The security of widely-used commercial classifiers still remains unclear. In this paper, we use the Google's phishing pages filter (GPPF), a classifier deployed in the Chrome browser which owns over one billion users, as a case to investigate the security challenges for the client-side classifiers. We present a new attack methodology targeting on client-side classifiers, called classifiers cracking. With the methodology, we successfully cracked the classification model of GPPF and extracted sufficient knowledge can be exploited for evasion attacks, including the classification algorithm, scoring rules and features, etc. Most importantly, we completely reverse engineered 84.8% scoring rules, covering most of high-weighted rules. Based on the cracked information, we performed two kinds of evasion attacks to GPPF, using 100 real phishing pages for the evaluation purpose. The experiments show that all the phishing pages (100%) can be easily manipulated to bypass the detection of GPPF. Our study demonstrates that the existing client-side classifiers are very vulnerable to classifiers cracking attacks.