H. Shirazi, Shashika R. Muramudalige, I. Ray, A. Jayasumana
Improved Phishing Detection Algorithms using Adversarial Autoencoder Synthesized Data
2020 IEEE 45th Conference on Local Computer Networks (LCN), published 2020-11-16
DOI: 10.1109/LCN48667.2020.9314775 (https://doi.org/10.1109/LCN48667.2020.9314775)
Citations: 12
Abstract
Malicious actors often use phishing attacks to compromise legitimate users’ credentials. Machine learning is a promising approach for phishing detection. However, the accuracy of machine learning algorithms often depends on the training data, and very little attack data is available for training. We propose an approach for augmenting existing datasets so that they can be used by machine learning algorithms. We use an Adversarial Autoencoder (AAE) to generate samples that mimic phishing websites, and we provide metrics to assess the quality of the generated samples. We test these samples against models trained with real-world data. Some of the generated samples are able to evade the existing detection models. We then use a portion of these samples in training. The resulting machine learning models are more robust and have higher accuracy. In other words, training the model on real-world phishing site data augmented with AAE-synthesized data is more effective for phishing detection.
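The evaluation loop described in the abstract (test synthetic samples against an existing detector, then fold a portion of them into the training set) can be illustrated with a minimal sketch. It does not implement the AAE itself and is not the authors' code: the names `evasion_rate` and `augment`, the `fraction` parameter, and the toy threshold detector are all hypothetical, chosen only to show the bookkeeping.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# Hypothetical detector interface: returns 1 for phishing, 0 for legitimate.
Detector = Callable[[Sample], int]


def evasion_rate(detector: Detector, synthetic_phish: List[Sample]) -> float:
    """Fraction of synthetic phishing samples that the detector labels legitimate."""
    if not synthetic_phish:
        return 0.0
    missed = sum(1 for x in synthetic_phish if detector(x) == 0)
    return missed / len(synthetic_phish)


def augment(
    real: List[Tuple[Sample, int]],
    synthetic_phish: List[Sample],
    fraction: float,
    seed: int = 0,
) -> List[Tuple[Sample, int]]:
    """Return the real data plus a random `fraction` of the synthetic
    samples, each labeled as phishing (1)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    k = int(len(synthetic_phish) * fraction)
    chosen = rng.sample(synthetic_phish, k)
    return list(real) + [(x, 1) for x in chosen]


# Toy stand-ins (assumptions): a one-feature threshold "detector" and
# four made-up synthetic phishing samples.
toy_detector: Detector = lambda x: 1 if x[0] > 0.5 else 0
synthetic = [(0.9,), (0.4,), (0.3,), (0.7,)]
real_data = [((0.8,), 1), ((0.1,), 0)]

rate = evasion_rate(toy_detector, synthetic)
augmented = augment(real_data, synthetic, fraction=0.5)
```

In this toy setup, two of the four synthetic samples fall below the threshold, so they evade the detector. A model retrained on `augmented` would see two additional phishing-labeled examples. How large a portion to add, and which samples to select, are design choices the abstract does not specify.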