Evaluation of Feature and Signature based Training Approaches for Malware Classification using Autoencoders

S. S. Tirumala, M. R. Valluri, David Nanadigam
Published in: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), 2020-01-01
DOI: 10.1109/COMSNETS48256.2020.9027373 (https://doi.org/10.1109/COMSNETS48256.2020.9027373)
Citations: 4

Abstract

Malware analysis has become a critical research area because of the rapid growth in the development and use of internet-based systems. Recent advances in artificial intelligence (AI), particularly in data mining, have enabled AI-based malware classification and detection systems. These systems are predominantly signature-based and are built on available malware datasets. This paper evaluates the capability of feature-based malware classification using autoencoders. In doing so, it presents a new approach for creating a synthetic malware dataset, based on signatures and features, that can be used to train and test both traditional and AI-based malware detection systems. Autoencoders are trained on feature-based and signature-based datasets and tested on a synthetic dataset, and the experiments are repeated with multiple datasets and topologies. The results show that feature-based training is more effective than signature-based training on synthetic, signature-based, and feature-based datasets. A feature-based stacked autoencoder (5 layers) achieves a classification accuracy of 95.6%, reported as more than 11.6% higher than the signature-based system, which achieves only 84.6%.
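To make the headline comparison concrete, the sketch below recomputes the gap between the two reported accuracies (95.6% and 84.6%) in two common ways: as an absolute difference in percentage points and as a gain relative to the signature-based baseline. The function name `accuracy_gap` is hypothetical and is not taken from the paper. Neither value equals the 11.6% quoted in the abstract, so that figure may rest on a measure the abstract does not state.

```python
from typing import Tuple


def accuracy_gap(feature_acc: float, signature_acc: float) -> Tuple[float, float]:
    """Return (absolute gap in percentage points, relative gain in percent).

    Hypothetical helper for illustration; both inputs are accuracies in percent.
    """
    absolute = feature_acc - signature_acc           # percentage points
    relative = absolute / signature_acc * 100.0      # percent of the baseline
    return absolute, relative


# Accuracies as reported in the abstract.
absolute, relative = accuracy_gap(95.6, 84.6)
print(f"absolute gap: {absolute:.1f} percentage points")  # absolute gap: 11.0 percentage points
print(f"relative gain: {relative:.1f}%")                  # relative gain: 13.0%
```

Stating which of the two measures is meant avoids ambiguity when comparing classifiers whose baselines differ.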