Generation of malicious webpage samples based on GAN

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Pub Date: 2020-12-01. DOI: 10.1109/TrustCom50675.2020.00116

Mengxiang Wan, Hanbing Yao, Xin Yan


Citations: 1

Abstract

Machine learning needs a large amount of labeled data to train classifiers. However, malicious webpage samples are hard to collect because malicious pages are short-lived and attack techniques change frequently. In this paper, we propose the Web Feature Samples Generative Adversarial Network (WFS-GAN) to generate malicious webpage feature samples. In the proposed scheme, 48 features are extracted from a relatively small number of real malicious webpages and converted into webpage feature vectors. The WFS-GAN is trained on these feature vectors to obtain a generator that can produce webpage feature samples. A classifier is then trained on the feature samples generated by the WFS-GAN to identify malicious webpages. The WFS-GAN is based on the conditional GAN (CGAN), with the webpage's class label as the conditional information. Notably, the WFS-GAN has four discriminators: one global discriminator and three feature discriminators. The global discriminator judges the authenticity of each whole sample to control the overall quality of the generated samples, while each feature discriminator judges the authenticity of specific feature data of the samples so that the generated samples are realistic in detail. The experimental results show that the feature samples generated by the WFS-GAN can be used to train a malicious webpage classifier, and that their quality is better than that of samples generated by CGAN and CVAE.
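The discriminator layout described in the abstract can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the authors' implementation: the single-layer networks, the noise dimension, the two-class label set, the split of the 48 features into three equal contiguous groups, and the unweighted sum of the four discriminator losses are all assumptions made for illustration, since the abstract specifies none of them.

```python
import numpy as np

N_FEATURES = 48   # number of webpage features, from the abstract
N_CLASSES = 2     # assumption: benign vs. malicious class labels
NOISE_DIM = 16    # assumption: size of the generator's noise input
# Assumption: three equal, contiguous feature groups, one per feature
# discriminator. The abstract does not say how features are grouped.
FEATURE_GROUPS = [slice(0, 16), slice(16, 32), slice(32, 48)]

rng = np.random.default_rng(0)


def one_hot(labels, n_classes=N_CLASSES):
    """Encode integer class labels as one-hot rows (the CGAN condition)."""
    return np.eye(n_classes)[labels]


def init_linear(n_in, n_out):
    """Random weights and zero bias for a single linear layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical single-layer stand-ins for the real networks.
G = init_linear(NOISE_DIM + N_CLASSES, N_FEATURES)
D_GLOBAL = init_linear(N_FEATURES + N_CLASSES, 1)
D_FEATURE = [init_linear((g.stop - g.start) + N_CLASSES, 1) for g in FEATURE_GROUPS]


def generate(noise, labels):
    """Map noise plus class label to a 48-dimensional feature vector in (0, 1)."""
    w, b = G
    return sigmoid(np.concatenate([noise, one_hot(labels)], axis=1) @ w + b)


def discriminate(params, x, labels):
    """Probability that x is real, conditioned on the class label."""
    w, b = params
    return sigmoid(np.concatenate([x, one_hot(labels)], axis=1) @ w + b)[:, 0]


def all_scores(samples, labels):
    """Column 0: global discriminator; columns 1-3: feature discriminators."""
    scores = [discriminate(D_GLOBAL, samples, labels)]
    for params, group in zip(D_FEATURE, FEATURE_GROUPS):
        scores.append(discriminate(params, samples[:, group], labels))
    return np.stack(scores, axis=1)


def generator_loss(scores):
    """Non-saturating GAN loss, summed over the four discriminators (assumed unweighted)."""
    return float(-np.log(scores + 1e-8).sum(axis=1).mean())


labels = np.ones(8, dtype=int)                 # request 8 samples of class 1
fake = generate(rng.normal(size=(8, NOISE_DIM)), labels)
scores = all_scores(fake, labels)
loss = generator_loss(scores)
```

Here `fake` holds eight generated feature vectors and `scores` has one column per discriminator. In a full training loop, the generator would be updated to lower `loss` while each discriminator is updated to separate real feature vectors from generated ones; the sketch omits those updates.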

