{"title":"Evaluation of Feature and Signature based Training Approaches for Malware Classification using Autoencoders","authors":"S. S. Tirumala, M. R. Valluri, David Nanadigam","doi":"10.1109/COMSNETS48256.2020.9027373","DOIUrl":null,"url":null,"abstract":"Malware analysis has become a critical area of research due to the rapid growth in the development and application of internet-based systems. Recent advances in artificial intelligence (AI), particularly in data mining, have enabled the implementation of AI-based malware classification and detection systems. AI-based malware analysis systems are predominantly signature-based and are built on available malware datasets. This paper evaluates the capability of feature-based malware classification using autoencoders. In doing so, it presents a new approach for creating a synthetic malware dataset based on signatures and features, which can be used to train and test both traditional and AI-based malware detection systems. Autoencoders are trained on feature-based and signature-based datasets and tested on a synthetic dataset, and the experiments are repeated with multiple datasets and topologies. The experimental results show that feature-based training is more efficient than the signature-based approach on synthetic, signature-based, and feature-based datasets. A feature-based stacked autoencoder (5-layered) achieves a classification accuracy of 95.6%, stated as 11.6% higher than the signature-based system, which achieves only 84.6%.","PeriodicalId":265871,"journal":{"name":"2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMSNETS48256.2020.9027373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluation of Feature and Signature based Training Approaches for Malware Classification using Autoencoders
Malware analysis has become a critical area of research due to the rapid growth in the development and application of internet-based systems. Recent advances in artificial intelligence (AI), particularly in data mining, have enabled the implementation of AI-based malware classification and detection systems. AI-based malware analysis systems are predominantly signature-based and are built on available malware datasets. This paper evaluates the capability of feature-based malware classification using autoencoders. In doing so, it presents a new approach for creating a synthetic malware dataset based on signatures and features, which can be used to train and test both traditional and AI-based malware detection systems. Autoencoders are trained on feature-based and signature-based datasets and tested on a synthetic dataset, and the experiments are repeated with multiple datasets and topologies. The experimental results show that feature-based training is more efficient than the signature-based approach on synthetic, signature-based, and feature-based datasets. A feature-based stacked autoencoder (5-layered) achieves a classification accuracy of 95.6%, stated as 11.6% higher than the signature-based system, which achieves only 84.6%.
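The abstract names a 5-layered autoencoder but gives no topology, layer widths, or training details. As a rough illustration only, the sketch below builds a five-layer autoencoder (input, hidden, code, hidden, output) in plain Python and trains it to reconstruct toy binary feature vectors. The layer widths, sigmoid activations, learning rate, end-to-end backpropagation, and the toy data are all assumptions made for this example. It covers only the reconstruction stage, not the classification step or the authors' datasets.

```python
import math
import random

# Hypothetical layer widths: input, hidden, code, hidden, output (5 layers).
LAYER_SIZES = [8, 6, 3, 6, 8]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(sizes, rng):
    """Return one (weights, biases) pair per connection between layers."""
    net = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        net.append((weights, [0.0] * n_out))
    return net


def forward(net, x):
    """Return the activations of every layer, input included."""
    activations = [x]
    for weights, biases in net:
        prev = activations[-1]
        activations.append([
            sigmoid(sum(w * p for w, p in zip(row, prev)) + b)
            for row, b in zip(weights, biases)
        ])
    return activations


def mse(net, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        out = forward(net, x)[-1]
        total += sum((o - t) ** 2 for o, t in zip(out, x)) / len(x)
    return total / len(data)


def train_step(net, x, lr):
    """One stochastic gradient step on the reconstruction error of x."""
    acts = forward(net, x)
    # Output delta: gradient of half the squared error through the sigmoid.
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(acts[-1], x)]
    for layer in range(len(net) - 1, -1, -1):
        weights, biases = net[layer]
        prev = acts[layer]
        if layer > 0:
            # Delta for the layer below, computed before the weights change.
            next_delta = [
                sum(weights[j][i] * delta[j] for j in range(len(delta)))
                * prev[i] * (1.0 - prev[i])
                for i in range(len(prev))
            ]
        for j, d in enumerate(delta):
            for i, p in enumerate(prev):
                weights[j][i] -= lr * d * p
            biases[j] -= lr * d
        if layer > 0:
            delta = next_delta


def make_toy_data():
    """Toy binary feature vectors; purely illustrative, not malware data."""
    return [
        [1, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 0, 0, 1, 0, 0, 0],
        [1, 0, 1, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0, 1, 0],
        [1, 0, 1, 1, 0, 0, 1, 0],
    ]


rng = random.Random(0)  # fixed seed so runs are repeatable
data = make_toy_data()
net = init_network(LAYER_SIZES, rng)
loss_before = mse(net, data)
for _ in range(500):
    for x in data:
        train_step(net, x, lr=0.5)
loss_after = mse(net, data)
print(f"reconstruction MSE: {loss_before:.4f} -> {loss_after:.4f}")
```

In a classification setting, the 3-unit code layer (`forward(net, x)[2]`) would typically be fed to a separate classifier. Whether the paper does this, or pretrains the layers one at a time, is not stated in the abstract.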