Effective and Efficient Android Malware Detection and Category Classification Using the Enhanced KronoDroid Dataset

Zone 4, Computer Science · Q3, Computer Science
Mudassar Waheed, Sana Qadir
Journal: Security and Communication Networks
Publication date: 2024-04-08
DOI: 10.1155/2024/7382302 (https://doi.org/10.1155/2024/7382302)
Citation count: 0

Abstract

Android is the most widely used mobile operating system and is responsible for handling a wide variety of data, from simple messages to sensitive banking details. The explosive increase in malware targeting this platform has made it imperative to adopt machine learning approaches for effective malware detection and classification. Since its release in 2008, the Android platform has changed substantially, and there has also been a significant increase in the number, complexity, and evolution of malware that targets this platform. This rapid evolution quickly renders existing malware datasets out of date and degrades machine learning-based detection models. Many studies have explored the effectiveness of various machine learning models for Android malware detection. The majority of these studies use datasets that have been compiled using static or dynamic analysis of malware, but the use of hybrid analysis approaches has not been addressed completely. Likewise, the impact of malware evolution has not been fully investigated. Although some of the models have achieved exceptional results, their performance deteriorated for evolving malware, and they were also not effective against antidynamic malware. In this paper, we address both of these limitations by creating an enhanced subset of the KronoDroid dataset and using it to develop a supervised machine learning model capable of detecting evolving and antidynamic malware. The original KronoDroid dataset contains malware samples from 2008 to 2020, making it effective for the detection of evolving malware and handling concept drift. Also, the dynamic features are collected by executing the malware on a real device, making it effective for handling antidynamic malware. We create an enhanced subset of this dataset by adding malware category labels with the help of multiple online repositories. Then, we train multiple supervised machine learning models and use the ExtraTree classifier to select the top 50 features. Our results show that the random forest (RF) model has the highest accuracy of 98.03% for malware detection and 87.56% for malware category classification (for 15 malware categories).
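
The feature-selection and training steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses `ExtraTreesClassifier` as a stand-in for the "ExtraTree classifier" named above, and runs on synthetic data from `make_classification` rather than on KronoDroid. The sample count, feature count, and hyperparameters are illustrative assumptions, so the printed accuracy says nothing about the paper's reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the KronoDroid dataset.
# 2000 samples with 200 features, binary label (benign vs. malware).
X, y = make_classification(
    n_samples=2000, n_features=200, n_informative=30, random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

TOP_K = 50  # the abstract reports selecting the top 50 features

# Rank features by impurity-based importance from an extra-trees ensemble,
# fitted on the training split only to avoid leaking test information.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)
top_idx = np.argsort(ranker.feature_importances_)[::-1][:TOP_K]

# Train a random forest on the selected features and score it on held-out data.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train[:, top_idx], y_train)
accuracy = accuracy_score(y_test, rf.predict(X_test[:, top_idx]))

print(f"selected {len(top_idx)} features; accuracy on synthetic data: {accuracy:.4f}")
```

The same pattern would extend to category classification by replacing the binary label with a multi-class label, since both estimators accept multi-class targets without modification.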
Source journal
Security and Communication Networks
Subject categories: Computer Science, Information Systems; Telecommunications
Self-citation rate: 0.00%
Articles published: 1274
Review time: 11.3 months
Journal description: Security and Communication Networks is an international journal publishing original research and review papers on all security areas including network security, cryptography, cyber security, etc. The emphasis is on security protocols, approaches and techniques applied to all types of information and communication networks, including wired, wireless and optical transmission platforms. The journal provides a prestigious forum for the R&D community in academia and industry working at the inter-disciplinary nexus of next generation communications technologies for security implementations in all network layers. Answering the highly practical and commercial importance of network security R&D, submissions of applications-oriented papers describing case studies and simulations are encouraged as well as research analysis-type papers.