AndroMD: An Android malware detection framework based on source code analysis and permission scanning
Arvind Prasad, Shalini Chandra, Wael Mohammad Alenazy, Gauhar Ali, Sajid Shah, Mohammed ElAffendi
Results in Engineering, vol. 28, Article 107050 (published 2025-09-10). DOI: 10.1016/j.rineng.2025.107050
https://www.sciencedirect.com/science/article/pii/S2590123025031068
Abstract
The rapid growth of Android-based mobile and IoT applications has significantly expanded the attack surface available to malicious actors. These adversaries often combine malicious apps with social engineering to deliver malware that compromises device security and user privacy. To address this ongoing threat, we present AndroMD, an intelligent and scalable Android malware detection framework that combines automated dataset construction, optimal feature selection, and ensemble-based classification. The framework is built on three core components. First, an automated pipeline processes over 600,000 APKs to extract static features from more than 140 million Java files and 600,000 manifest files, producing three distinct datasets: KeyCount, ZeroOne, and MNF. These datasets are built from keys and patterns derived from a detailed analysis of real decompiled malware code, so that the extracted features reflect behavior actually observed in malware. Second, we introduce the AndroMD Optimal Feature Selection (AOFS) method, which selects compact, high-performing feature subsets through iterative evaluation guided by ensemble feedback. Third, an ensemble detection model combines Random Forest, Decision Tree, and Bagging classifiers through a threshold-based aggregation mechanism that allows fine-grained control over detection sensitivity. In extensive evaluation, AndroMD achieves up to 99.88% accuracy on internal datasets and 91.66% accuracy in live testing, which includes custom and zero-day malware samples. AndroMD also identifies threats that VirusTotal missed, demonstrating its practical applicability. The framework, together with sample datasets and code, is publicly available to support reproducibility and further research on Android security.
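The abstract does not spell out how the KeyCount and ZeroOne datasets differ; their names suggest per-key occurrence counts versus binary presence flags. The minimal sketch below assumes exactly that reading. The `EXAMPLE_KEYS` list (real Android API names) and the helpers `key_count_row` and `zero_one_row` are hypothetical illustrations, not AndroMD's actual key list or code.

```python
from typing import Dict, Iterable, List

# Hypothetical key list for illustration: real Android API names, but NOT
# the keys AndroMD derived from its analysis of decompiled malware.
EXAMPLE_KEYS: List[str] = [
    "getDeviceId",
    "sendTextMessage",
    "Runtime.getRuntime().exec",
    "DexClassLoader",
]


def key_count_row(java_sources: Iterable[str], keys: List[str]) -> Dict[str, int]:
    """KeyCount-style row (assumed): total occurrences of each key across one APK's Java files."""
    counts = {key: 0 for key in keys}
    for source in java_sources:
        for key in keys:
            # Plain substring matching; str.count counts non-overlapping hits.
            counts[key] += source.count(key)
    return counts


def zero_one_row(counts: Dict[str, int]) -> Dict[str, int]:
    """ZeroOne-style row (assumed): 1 if the key occurs at least once, else 0."""
    return {key: int(count > 0) for key, count in counts.items()}


if __name__ == "__main__":
    decompiled = [
        "String id = tm.getDeviceId();\nsms.sendTextMessage(num, null, id, null, null);",
        "String again = tm.getDeviceId();",
    ]
    counts = key_count_row(decompiled, EXAMPLE_KEYS)
    print(counts)
    print(zero_one_row(counts))
```

Substring matching is the simplest option; a production extractor would more likely tokenize or parse the decompiled sources so that, for example, a longer identifier containing a key is not counted as a hit.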
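The title refers to permission scanning, and the MNF dataset is drawn from manifest files. A minimal sketch of manifest-based features follows, assuming the manifest has already been decoded from its binary form to plain XML (for example with apktool) and that an MNF-style row encodes requested permissions as 0/1 flags. The permission vocabulary, the sample manifest, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical permission vocabulary for illustration, not AndroMD's MNF feature list.
EXAMPLE_PERMISSIONS: List[str] = [
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.INTERNET",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]


def requested_permissions(manifest_xml: str) -> Set[str]:
    """Collect android:name values of <uses-permission> elements in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    names: Set[str] = set()
    for elem in root.iter("uses-permission"):
        # ElementTree stores namespaced attributes as "{namespace-uri}local-name".
        name = elem.get(f"{{{ANDROID_NS}}}name")
        if name:
            names.add(name)
    return names


def permission_row(manifest_xml: str, vocabulary: List[str]) -> Dict[str, int]:
    """Binary row: 1 if the app requests the permission, else 0."""
    requested = requested_permissions(manifest_xml)
    return {perm: int(perm in requested) for perm in vocabulary}


SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo" />
</manifest>"""

if __name__ == "__main__":
    print(permission_row(SAMPLE_MANIFEST, EXAMPLE_PERMISSIONS))
```

Fixing the vocabulary up front keeps every APK's row the same width, which a tabular classifier requires.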
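AOFS is described only as iterative evaluation driven by ensemble feedback. As a stand-in, the sketch below uses plain greedy forward selection, a standard wrapper method and not AOFS itself, with a pluggable `score` callable. In a real pipeline, `score` could train the Random Forest, Decision Tree, and Bagging ensemble on the candidate columns and return, for example, cross-validated accuracy. All names and the `min_gain` stopping rule are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a candidate feature subset to a quality score (higher is better),
# e.g. the cross-validated accuracy of an ensemble trained on those features.
Scorer = Callable[[Sequence[str]], float]


def greedy_forward_selection(
    features: Sequence[str],
    score: Scorer,
    max_features: int,
    min_gain: float = 1e-4,
) -> Tuple[List[str], float]:
    """Add one feature per round, keeping whichever candidate raises the score most.

    Stops when `max_features` is reached, no candidates remain, or the best
    addition improves the score by less than `min_gain`.
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Re-evaluate every remaining feature on top of the current subset.
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if selected and top_score - best_score < min_gain:
            break  # gain too small: stop to keep the subset compact
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


if __name__ == "__main__":
    # Toy scorer standing in for ensemble feedback: fixed per-feature utilities.
    utility = {"a": 0.5, "b": 0.3, "c": 0.00001}
    toy = lambda subset: sum(utility.get(f, 0.0) for f in subset)
    print(greedy_forward_selection(["a", "b", "c", "d"], toy, max_features=3))
```

With scikit-learn, such a scorer might wrap `sklearn.model_selection.cross_val_score` over the selected columns. The toy scorer in the demo only mimics that feedback, so the subset it returns illustrates the stopping behavior, not real detection quality.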
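The abstract does not say what the aggregation threshold is applied to. One plausible reading, assumed here, is a vote-count threshold: each of the three classifiers emits a 0/1 prediction, and a sample is flagged as malware when at least `threshold` of them vote malicious. Lowering the threshold makes detection more sensitive (fewer missed samples, more false alarms), and raising it does the opposite. The function `aggregate_votes` and the model-name keys are hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_votes(votes: Dict[str, Sequence[int]], threshold: int) -> List[int]:
    """Label a sample malicious (1) when at least `threshold` models vote 1.

    `votes` maps a model name to its per-sample 0/1 predictions; all
    sequences must have the same length.
    """
    if not votes:
        raise ValueError("need at least one model's predictions")
    if not 1 <= threshold <= len(votes):
        raise ValueError("threshold must be between 1 and the number of models")
    lengths = {len(preds) for preds in votes.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    n_samples = lengths.pop()
    # Count, per sample, how many models flagged it as malware.
    malicious_votes = [
        sum(preds[i] for preds in votes.values()) for i in range(n_samples)
    ]
    return [int(count >= threshold) for count in malicious_votes]


if __name__ == "__main__":
    # Hypothetical 0/1 outputs from the three base classifiers on four APKs.
    votes = {
        "random_forest": [1, 0, 1, 0],
        "decision_tree": [1, 1, 0, 0],
        "bagging": [1, 0, 1, 0],
    }
    for t in (1, 2, 3):
        print(t, aggregate_votes(votes, t))
```

With three models, a threshold of 1 behaves like a logical OR, 2 like majority voting, and 3 like unanimous agreement, which gives three distinct sensitivity settings from the same trained models.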