AndroMD: An Android malware detection framework based on source code analysis and permission scanning

Impact factor: 7.9, JCR Q1 (Engineering, Multidisciplinary)

Results in Engineering, Pub Date: 2025-09-10, DOI: 10.1016/j.rineng.2025.107050

Arvind Prasad, Shalini Chandra, Wael Mohammad Alenazy, Gauhar Ali, Sajid Shah, Mohammed ElAffendi

Volume 28, Article 107050. Source: https://www.sciencedirect.com/science/article/pii/S2590123025031068


Abstract

The rapid growth of Android-based mobile and IoT applications has significantly increased the attack surface for malicious actors. These adversaries often exploit apps and social engineering to deliver malware that compromises device security and user privacy. To address this ongoing threat, we present AndroMD, an intelligent and scalable Android malware detection framework that combines automated dataset construction, optimal feature selection, and ensemble-based classification. The proposed framework is built on three core components. First, an automated pipeline processes over 600,000 APKs to extract static features from more than 140 million Java files and 600,000 manifest files, resulting in three distinct datasets: KeyCount, ZeroOne, and MNF. These datasets are constructed using keys and patterns derived from a detailed analysis of real decompiled malware code, ensuring semantic relevance. Second, we introduce the AndroMD Optimal Feature Selection (AOFS) method, which selects compact, high-performing feature subsets using iterative evaluation based on ensemble feedback. Third, an ensemble detection model combines Random Forest, Decision Tree, and Bagging classifiers, with a threshold-based aggregation mechanism that allows fine-grained control over detection sensitivity. Extensive evaluation demonstrates AndroMD's strong performance, achieving up to 99.88% accuracy on internal datasets and 91.66% accuracy in live testing, including detection of custom and zero-day malware samples. AndroMD also identifies threats overlooked by VirusTotal, showcasing its real-world applicability. The framework, along with sample datasets and code, is made publicly available to support reproducibility and further research on Android security.
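The abstract names the KeyCount and ZeroOne datasets and a threshold-based ensemble aggregation but does not spell out how they are computed. The sketch below is one plausible reading, not the authors' implementation. It assumes that KeyCount holds per-app occurrence counts of each key, that ZeroOne is the binarized form of those counts, and that aggregation compares the mean malware probability across models with a tunable threshold. The key list `KEYS` and the function names `key_count`, `zero_one`, and `aggregate` are hypothetical.

```python
from typing import Dict, Sequence

# Hypothetical keys for illustration only. The paper derives its keys from
# decompiled malware code, but the abstract does not list them.
KEYS = ["sendTextMessage", "getDeviceId", "DexClassLoader"]


def key_count(java_sources: Sequence[str], keys: Sequence[str] = KEYS) -> Dict[str, int]:
    """Count occurrences of each key across all Java files of one app.

    Assumption: a KeyCount-style feature is a raw substring count.
    """
    return {k: sum(src.count(k) for src in java_sources) for k in keys}


def zero_one(counts: Dict[str, int]) -> Dict[str, int]:
    """Binarize counts: 1 if the key appears at least once, otherwise 0.

    Assumption: a ZeroOne-style feature records presence only.
    """
    return {k: int(c > 0) for k, c in counts.items()}


def aggregate(malware_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag an app as malware when the mean per-model malware probability
    reaches the threshold.

    Assumption: the abstract says only that aggregation is threshold-based;
    averaging probabilities is one simple way to realize that.
    """
    if not malware_probs:
        raise ValueError("at least one model probability is required")
    return sum(malware_probs) / len(malware_probs) >= threshold


if __name__ == "__main__":
    sources = [
        "a.sendTextMessage(); b.sendTextMessage();",
        "t.getDeviceId();",
    ]
    counts = key_count(sources)
    print(counts)
    print(zero_one(counts))
    # Stand-in outputs for three models (e.g. Random Forest, Decision Tree, Bagging).
    probs = [0.9, 0.4, 0.6]
    print(aggregate(probs, threshold=0.5), aggregate(probs, threshold=0.7))
```

Under these assumptions, lowering the threshold makes detection more sensitive (more apps flagged, more false positives), while raising it makes detection stricter. That trade-off is the kind of control over sensitivity the abstract attributes to the aggregation mechanism.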


