Analyzing Machine Learning Approaches for Online Malware Detection in Cloud

2021 IEEE International Conference on Smart Computing (SMARTCOMP). Pub Date: 2021-05-19. DOI: 10.1109/SMARTCOMP52413.2021.00046

Jeffrey Kimmell, Mahmoud Abdelsalam, Maanak Gupta

Citations: 18

Abstract

The variety of services and functionality offered by cloud service providers (CSPs) has expanded rapidly in recent years. These services have created numerous opportunities for enterprise infrastructure to become cloud-based and, in turn, have helped enterprises offer services to their customers easily and flexibly. The practice of renting out access to servers to clients for computing and storage purposes is known as Infrastructure as a Service (IaaS). The popularity of IaaS has led to serious concerns about cybersecurity and privacy. In particular, malicious entities often use malware against cloud services to compromise sensitive data or to obstruct their functionality. In response to this growing threat, malware detection for cloud environments has become a widely researched topic, with numerous methods proposed and deployed. In this paper, we present online malware detection based on process-level performance metrics and analyze the effectiveness of different baseline machine learning models, including Support Vector Classifier (SVC), Random Forest Classifier (RFC), K-Nearest Neighbor (KNN), Gradient Boosted Classifier (GBC), Gaussian Naive Bayes (GNB), and Convolutional Neural Networks (CNN). Our analysis concludes that neural network models most accurately detect the impact malware has on the process-level features of virtual machines in the cloud, and are therefore best suited to detect it. Our models were trained, validated, and tested using a dataset of 40,680 malicious and benign samples. The dataset was compiled by running different families of malware (collected from VirusTotal) in a live cloud environment and collecting the process-level features.
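
As a rough illustration of the kind of baseline comparison the abstract describes, the sketch below trains two of the named model types (Gaussian Naive Bayes and K-Nearest Neighbor) on synthetic process-level metrics and reports test accuracy. It is not the authors' code or dataset: the feature names (`cpu_percent`, `mem_percent`, `num_threads`), the class distributions, the sample counts, and the 75/25 split are all assumptions made up for this example, and both classifiers are small standard-library implementations rather than the library models the paper may have used.

```python
import math
import random
from collections import Counter

# Hypothetical process-level features: (cpu_percent, mem_percent, num_threads).
# The (mean, std) pairs below are invented for illustration only.
BENIGN = [(10.0, 3.0), (20.0, 5.0), (4.0, 1.5)]
MALICIOUS = [(40.0, 8.0), (50.0, 10.0), (12.0, 3.0)]


def make_samples(rng, spec, label, n):
    """Draw n synthetic feature vectors for one class."""
    return [([rng.gauss(m, s) for m, s in spec], label) for _ in range(n)]


def train_gnb(train):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model


def predict_gnb(model, x):
    """Pick the class with the highest log-posterior under the Gaussian assumption."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def predict_knn(train, x, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_samples(rng, BENIGN, 0, 200) + make_samples(rng, MALICIOUS, 1, 200)
rng.shuffle(data)
train, test = data[:300], data[300:]

gnb_model = train_gnb(train)
gnb_acc = accuracy(lambda x: predict_gnb(gnb_model, x), test)
knn_acc = accuracy(lambda x: predict_knn(train, x), test)
print(f"GNB accuracy: {gnb_acc:.3f}")
print(f"KNN accuracy: {knn_acc:.3f}")
```

Because the synthetic classes are deliberately well separated, both models score highly here; the numbers say nothing about the relative ranking of models reported in the paper, which depends on the real process-level data.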
