An Ensemble Machine Learning Approach for Detecting and Classifying Malware Attacks on Mobile Devices

IF 2.9 | Zone 4, Multidisciplinary Journals | JCR Q2, Multidisciplinary Sciences

Arabian Journal for Science and Engineering, Pub Date: 2025-02-04, DOI: 10.1007/s13369-025-10011-5

Eiman Alsharif, Maher Alharby

Volume 50, Issue 19, Pages 15825-15841

Citations: 0

Abstract

The widespread use of mobile devices makes them attractive targets for cybercriminals, especially as malware proliferates. Existing malware detection studies have several limitations: they focus on subsets of datasets, rely on a single classification approach, and offer little usability in practical applications. This research develops a stacking ensemble method for detecting and classifying malware attacks on Android devices, employing supervised machine learning algorithms such as Random Forest, Decision Tree, Gaussian Naive Bayes, K-Nearest Neighbors, and Logistic Regression. Using the CIC-AndMal2017 dataset, we apply data preprocessing techniques to address missing data and data imbalance. We employ several feature selection methods, including Random Forest Importance, Principal Component Analysis, and Correlation-Based Selection, to reduce data dimensionality. We also use grid search for hyperparameter tuning. We assess model performance using accuracy, precision, recall, and F1 score, and we measure training and prediction times to gauge efficiency. For binary classification, the stacking technique achieved 99.86% on all four metrics (accuracy, precision, recall, and F1 score). For multi-class classification, it achieved 97.0% accuracy, 97.03% precision, 97.07% recall, and 97.03% F1 score. Finally, we develop a user-friendly web application that makes the proposed models more accessible and usable for detecting Android malware, supporting broader adoption and practical application.
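The workflow summarized in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: the data is a synthetic stand-in generated with `make_classification` rather than CIC-AndMal2017, Logistic Regression is used as the meta-learner over the other four algorithms (the abstract does not say how the learners are arranged), median imputation stands in for the unspecified missing-data treatment, and Random Forest importance is the only feature selection method shown. Imbalance handling and grid search are left out to keep the example short.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the CIC-AndMal2017 dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# Blank out roughly 2% of the values to mimic missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.02] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the abstract; KNN is scaled because it is distance-based.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("gnb", GaussianNB()),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# Assumption: Logistic Regression combines the base learners' predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Impute, keep features with above-average Random Forest importance, then stack.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0)),
    stack,
)

start = time.perf_counter()
model.fit(X_train, y_train)
train_seconds = time.perf_counter() - start

start = time.perf_counter()
y_pred = model.predict(X_test)
predict_seconds = time.perf_counter() - start

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted"
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(f"train={train_seconds:.2f}s predict={predict_seconds:.4f}s")
```

The scores printed by this sketch describe the synthetic data only and say nothing about the results reported in the paper. For a multi-class variant, passing `n_classes` to `make_classification` would be enough, since the weighted averaging already handles more than two classes.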

Source journal

Arabian Journal for Science and Engineering (Multidisciplinary Sciences)

CiteScore: 5.70
Self-citation rate: 3.40%
Articles published: 993

About the journal: King Fahd University of Petroleum & Minerals (KFUPM) partnered with Springer to publish the Arabian Journal for Science and Engineering (AJSE). AJSE, which has been published by KFUPM since 1975, is a recognized national, regional and international journal that provides a great opportunity for the dissemination of research advances from the Kingdom of Saudi Arabia, MENA and the world.