Generation of malicious webpage samples based on GAN
Mengxiang Wan, Hanbing Yao, Xin Yan
2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020-12-01
DOI: 10.1109/TrustCom50675.2020.00116 (https://doi.org/10.1109/TrustCom50675.2020.00116)
Citations: 1
Abstract
Machine learning classifiers need large amounts of labeled data for training. However, malicious web samples are hard to collect because malicious pages are short-lived and attack methods keep changing. In this paper, we propose the Web Feature Samples Generative Adversarial Network (WFS-GAN) to generate malicious webpage feature samples. In the proposed scheme, 48 features are extracted from a relatively small number of real malicious webpages and converted into webpage feature vectors. The WFS-GAN is trained on these feature vectors to obtain a generator that can produce webpage feature samples. A classifier is then trained on the samples generated by the WFS-GAN to identify malicious webpages. The WFS-GAN is based on the conditional GAN (CGAN), with the webpage's class label as the conditional information. Notably, the WFS-GAN has four discriminators: one global discriminator and three feature discriminators. The global discriminator judges the authenticity of each whole sample, which controls the overall quality of the generated samples, while each feature discriminator judges the authenticity of specific feature data within the samples, which makes the generated samples more realistic in detail. The experimental results show that the feature samples generated by the WFS-GAN can be used to train a malicious webpage classifier, and that their quality is better than that of samples generated by CGAN and CVAE.
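The one-global-plus-three-feature discriminator layout described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract does not say how the 48 features are divided among the feature discriminators, what the network layers look like, or how the losses are weighted. The three equal groups of 16 features, the single linear layers, the noise size, the two-class label, and the equal-weight sum of losses below are all assumptions chosen only to keep the example short, and no training step is shown.

```python
import math
import random

N_FEATURES = 48   # from the abstract
N_CLASSES = 2     # assumption: benign vs. malicious class label
NOISE_DIM = 8     # assumption: size of the generator's noise input
# Assumption: three contiguous groups of 16 features each.
# The abstract does not state how the features are grouped.
FEATURE_GROUPS = [(0, 16), (16, 32), (32, 48)]


def one_hot(label, n=N_CLASSES):
    return [1.0 if i == label else 0.0 for i in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) with a zero bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def generate(gen, noise, label):
    """Conditional generator: noise + one-hot label -> 48 values in (0, 1)."""
    return [sigmoid(v) for v in linear(gen, noise + one_hot(label))]


def discriminate(disc, features, label):
    """Conditional discriminator: features + one-hot label -> P(real)."""
    return sigmoid(linear(disc, features + one_hot(label))[0])


def generator_loss(sample, label, global_disc, feature_discs):
    """Equal-weight sum (an assumption) of non-saturating GAN losses."""
    # The global discriminator sees the whole 48-dimensional sample.
    loss = -math.log(discriminate(global_disc, sample, label))
    # Each feature discriminator sees only its own slice of the sample.
    for (start, end), disc in zip(FEATURE_GROUPS, feature_discs):
        loss += -math.log(discriminate(disc, sample[start:end], label))
    return loss


rng = random.Random(0)
gen = make_linear(NOISE_DIM + N_CLASSES, N_FEATURES, rng)
global_disc = make_linear(N_FEATURES + N_CLASSES, 1, rng)
feature_discs = [
    make_linear(end - start + N_CLASSES, 1, rng) for start, end in FEATURE_GROUPS
]

noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
fake = generate(gen, noise, label=1)  # label 1 = malicious (assumed encoding)
loss = generator_loss(fake, 1, global_disc, feature_discs)
```

In this sketch the generator receives a gradient signal from all four discriminators at once, which mirrors the abstract's idea that the global discriminator controls overall sample quality while the feature discriminators check individual parts of the sample. A real implementation would replace the linear layers with trained networks and alternate discriminator and generator updates.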