Data Augmentation with Generative Models for Improved Malware Detection: A Comparative Study
Authors: R. Burks, K. Islam, Yan Lu, Jiang Li
Venue: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)
Published: 2019-10-01
DOI: https://doi.org/10.1109/UEMCON47517.2019.8993085
Citations: 20
Abstract
Generative models have proven well suited to generating artificial data. Two of the most popular and promising are the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE). Both play critical roles in classification problems by generating synthetic data with which classifiers can be trained more accurately. Malware detection is the process of determining whether software on a host system is malicious and diagnosing what type of attack it represents. Without an adequate amount of training data, malware detection is less efficient. In this paper, we compare the two generative models as sources of synthetic training data to boost the Residual Network (ResNet-18) classifier for malware detection. Experimental results show that adding synthetic malware samples generated by the VAE to the training data improved the accuracy of ResNet-18 by 2%, compared to 6% for samples generated by the GAN.