Enhanced Malware Prediction and Containment Using Bayesian Neural Networks
Zahra Jamadi; Amir G. Aghdam
IEEE Journal of Radio Frequency Identification, vol. 8, pp. 592-600, published 2024-06-06
DOI: 10.1109/JRFID.2024.3410881
https://ieeexplore.ieee.org/document/10550924/
Abstract
In this paper, we present an integrated framework that combines natural language processing (NLP) techniques and machine learning (ML) algorithms to detect malware at an early stage and predict its upcoming actions. We analyze application programming interface (API) call sequences in the same way as natural language input. Specifically, the proposed framework employs bidirectional long short-term memory (Bi-LSTM) networks and Bayesian neural networks (BNNs) for this analysis. In the first part, a Bagging-XGBoost algorithm represents consecutive API calls as 2-gram and 3-gram strings for early-stage malware detection and feature importance analysis, and a Bi-LSTM predicts the upcoming actions of active malware by estimating the next API call in a sequence. In the second part, two separate Bayesian Bi-LSTMs complement this analysis: the first performs early-stage malware detection, and the second predicts the next action of active malware. Besides predicting future malware actions, the BNN assesses the uncertainty of each prediction and also provides the second and third most probable predictions, which increases the reliability and effectiveness of the system. The unified framework demonstrates its efficiency in both malware detection and action prediction, marking a significant advance in countering malware threats. The Bayesian Bi-LSTM developed for predicting the next API call achieves an average accuracy of 89.53%, and the framework detects malware at an early stage with an accuracy of 96.44%, demonstrating the superior performance of the proposed framework.
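The first part of the framework represents consecutive API calls as 2-gram and 3-gram strings. A minimal sketch of that feature-extraction step might look like the following, assuming each n-gram is a `|`-joined string and each feature is a raw occurrence count. The abstract does not say whether the authors use counts, presence flags, or weighted scores. The function names (`api_ngrams`, `ngram_counts`, `vectorize`) and the example API calls are hypothetical, and the Bagging-XGBoost classifier that would consume the resulting matrix is omitted.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def api_ngrams(calls: Sequence[str], n: int) -> List[str]:
    """Join every window of n consecutive API calls into one string token."""
    # Assumption: calls are joined with "|", so ("NtOpenFile", "NtReadFile")
    # becomes the single token "NtOpenFile|NtReadFile".
    return ["|".join(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_counts(calls: Sequence[str], orders: Iterable[int] = (2, 3)) -> Counter:
    """Count the n-grams (2-grams and 3-grams by default) of one API call trace."""
    counts: Counter = Counter()
    for n in orders:
        counts.update(api_ngrams(calls, n))
    return counts


def vectorize(
    traces: Sequence[Sequence[str]], orders: Iterable[int] = (2, 3)
) -> Tuple[List[str], List[List[int]]]:
    """Build a shared n-gram vocabulary and a dense count matrix, one row per trace."""
    order_tuple = tuple(orders)  # materialize so the orders can be reused per trace
    per_trace = [ngram_counts(t, order_tuple) for t in traces]
    vocab = sorted(set().union(*per_trace))  # union of all n-grams, sorted for stable columns
    column = {gram: j for j, gram in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in per_trace]
    for row, counts in zip(matrix, per_trace):
        for gram, c in counts.items():
            row[column[gram]] = c
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical traces; real ones would come from a sandbox or API monitor.
    traces = [
        ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"],
        ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
    ]
    vocab, X = vectorize(traces)
    print(len(vocab))  # 9 distinct 2-/3-grams across both traces
    print(X[0][vocab.index("NtOpenFile|NtReadFile")])  # 2
```

Keeping the vocabulary as readable strings would let feature importances reported by a tree ensemble map directly back to API call patterns, which is the kind of feature importance analysis the abstract mentions.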
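The abstract states that the Bayesian Bi-LSTM assesses the uncertainty of each next-API-call prediction and also reports the second and third most probable calls, but it does not describe how the posterior is approximated. The sketch below shows one common approach, assuming a Monte Carlo average of softmax outputs over stochastic forward passes. A hypothetical mean-field Gaussian output layer (`gaussian_output_layer`) stands in for the trained network, and a fixed feature vector `h` stands in for the Bi-LSTM encoder. Predictive entropy and mutual information serve as uncertainty scores here; they are standard choices, not necessarily the measures the authors use. All names, weights, and API labels are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A logit sampler performs one stochastic forward pass (here: one weight sample).
LogitSampler = Callable[[random.Random], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def gaussian_output_layer(
    h: Sequence[float], mu: Sequence[Sequence[float]], sigma: Sequence[Sequence[float]]
) -> LogitSampler:
    """Hypothetical mean-field Bayesian layer: weight w[j][i] ~ N(mu[j][i], sigma[j][i]).

    h stands in for the feature vector an encoder (a Bi-LSTM in the paper) would produce.
    """
    def sample(rng: random.Random) -> List[float]:
        # One logit per class j: sum_i h[i] * w[j][i], with fresh weights each call.
        return [
            sum(hi * rng.gauss(m, s) for hi, m, s in zip(h, mu_j, sig_j))
            for mu_j, sig_j in zip(mu, sigma)
        ]
    return sample


def mc_predictive(
    sample_logits: LogitSampler, num_samples: int = 100, seed: int = 0
) -> Tuple[List[float], float, float]:
    """Monte Carlo estimate of the posterior predictive distribution.

    Returns (mean class probabilities, predictive entropy, mutual information).
    """
    rng = random.Random(seed)
    samples = [softmax(sample_logits(rng)) for _ in range(num_samples)]
    mean = [sum(col) / num_samples for col in zip(*samples)]
    total = entropy(mean)  # total predictive uncertainty
    expected = sum(entropy(p) for p in samples) / num_samples  # average per-sample entropy
    return mean, total, total - expected  # the difference is the epistemic part


def top_k(p: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """The k most probable labels, most likely first."""
    return sorted(zip(labels, p), key=lambda t: t[1], reverse=True)[:k]


if __name__ == "__main__":
    labels = ["NtReadFile", "NtWriteFile", "NtClose", "NtOpenFile"]  # hypothetical vocabulary
    h = [1.0, 0.5]
    mu = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
    sigma = [[0.5, 0.5] for _ in labels]
    mean, total, epistemic = mc_predictive(gaussian_output_layer(h, mu, sigma))
    for rank, (name, prob) in enumerate(top_k(mean, labels), start=1):
        print(rank, name, round(prob, 3))
    print("predictive entropy:", round(total, 3), "mutual information:", round(epistemic, 3))
```

Averaging probabilities before ranking means the second and third candidates reflect disagreement across weight samples, and a high mutual information flags predictions where the model itself, rather than the data, is unsure.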