H. Shirazi, Shashika R. Muramudalige, I. Ray, A. Jayasumana
2020 IEEE 45th Conference on Local Computer Networks (LCN), published 2020-11-16. DOI: 10.1109/LCN48667.2020.9314775 (https://doi.org/10.1109/LCN48667.2020.9314775)
Improved Phishing Detection Algorithms using Adversarial Autoencoder Synthesized Data
Malicious actors often use phishing attacks to compromise legitimate users’ credentials. Machine learning is a promising approach for phishing detection. While the accuracy of machine learning algorithms often depends on the training data, very little attack data is available for training. We propose an approach for augmenting existing datasets that machine learning algorithms can then use. We use an Adversarial Autoencoder (AAE) to generate samples that mimic phishing websites, and we provide metrics to assess the quality of the generated samples. We test these samples against models trained with real-world data. Some of the generated samples are able to evade the existing detection models. We then use a portion of these samples in training. The resulting machine learning models are more robust and have higher accuracy. In other words, training on real-world phishing site data augmented with AAE-synthesized data yields more effective phishing detection.
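The evaluation loop described in the abstract (score generated samples against an existing detector, then fold a portion of them into the training set) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the AAE itself is omitted, and the names `evasion_rate`, `augment`, `toy_detector`, the label convention, and the feature vectors are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
PHISHING, LEGITIMATE = 1, 0  # label convention assumed for this sketch


def evasion_rate(predict: Callable[[Sample], int], synthetic: List[Sample]) -> float:
    """Fraction of synthetic phishing samples the detector labels as legitimate."""
    if not synthetic:
        return 0.0
    missed = sum(1 for s in synthetic if predict(s) == LEGITIMATE)
    return missed / len(synthetic)


def augment(
    train_x: List[Sample],
    train_y: List[int],
    synthetic: List[Sample],
    fraction: float,
    seed: int = 0,
) -> Tuple[List[Sample], List[int]]:
    """Return a new training set with a random portion of the synthetic
    samples appended, each labeled as phishing. Inputs are not modified."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    k = int(len(synthetic) * fraction)
    chosen = rng.sample(synthetic, k)
    return list(train_x) + chosen, list(train_y) + [PHISHING] * k


# Hypothetical stand-in for a trained detector: a threshold on the first feature.
def toy_detector(sample: Sample) -> int:
    return PHISHING if sample[0] > 0.5 else LEGITIMATE


# Hypothetical stand-ins for AAE-generated phishing feature vectors.
synthetic = [[0.9, 0.1], [0.4, 0.8], [0.2, 0.3], [0.7, 0.6]]

rate = evasion_rate(toy_detector, synthetic)
new_x, new_y = augment([[0.1, 0.2]], [LEGITIMATE], synthetic, fraction=0.5)
```

In a real pipeline the detector would be retrained on the augmented set and the evasion rate measured again on held-out generated samples. The fraction of synthetic data to mix in is a tuning choice that this sketch leaves to the caller.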