{"title":"Adaptive Android APKs Reverse Engineering for Features Processing in Machine Learning Malware Detection","authors":"B. A. Gyunka, Aro Taye Oladele, Ojeniyi Adegoke","doi":"10.18517/ijods.4.1.10-25.2023","DOIUrl":null,"url":null,"abstract":"The key component that makes the detection of android malware possible is the availability of the right triggers and pointers, which are found in the Android application packages, known as features or attributes. These are fundamental in the training of the different machine learning algorithms to produce the required detection model. The process of extracting these attributes or features, from the Android application packages, is known as reverse engineering. This paper delved into the experimental detail processes of applying reverse engineering procedure, using Sublime Text 2 and Androguard Plugin, on Android Application packages for the extraction of, particularly permissions, which are the targeted features. The study further discussed the cleaning stages, using NotePad++, Microsoft Excel Worksheet, and MS Word, to sort out all the relevant and important features by removing all the noisy ones. A total of 1500 Android apps were downloaded from both benign and malicious sources and used for the experiment. The cleaned or important features extracted from these application packages at the end of the reverse engineering processes are 162 in total and these were further used to form a feature binary matrix of size 1500 by 163 (including the class features).","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"12 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2023-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Science and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18517/ijods.4.1.10-25.2023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The key component that makes the detection of android malware possible is the availability of the right triggers and pointers, which are found in the Android application packages, known as features or attributes. These are fundamental in the training of the different machine learning algorithms to produce the required detection model. The process of extracting these attributes or features, from the Android application packages, is known as reverse engineering. This paper delved into the experimental detail processes of applying reverse engineering procedure, using Sublime Text 2 and Androguard Plugin, on Android Application packages for the extraction of, particularly permissions, which are the targeted features. The study further discussed the cleaning stages, using NotePad++, Microsoft Excel Worksheet, and MS Word, to sort out all the relevant and important features by removing all the noisy ones. A total of 1500 Android apps were downloaded from both benign and malicious sources and used for the experiment. The cleaned or important features extracted from these application packages at the end of the reverse engineering processes are 162 in total and these were further used to form a feature binary matrix of size 1500 by 163 (including the class features).
使检测android恶意软件成为可能的关键组件是正确的触发器和指针的可用性,它们在android应用程序包中被称为功能或属性。这些是训练不同机器学习算法以产生所需检测模型的基础。从Android应用程序包中提取这些属性或特征的过程被称为逆向工程。本文深入研究了应用逆向工程程序的实验细节过程,使用Sublime Text 2和Androguard Plugin,在Android应用程序包上进行提取,特别是权限的提取,这是目标特性。该研究进一步讨论了清理阶段,使用notepad++, Microsoft Excel工作表和MS Word,通过删除所有嘈杂的功能来整理所有相关和重要的功能。总共有1500个安卓应用程序从良性和恶意来源下载并用于实验。在逆向工程过程结束时,从这些应用程序包中提取的清理或重要特征总共为162个,这些特征进一步用于形成大小为1500 × 163的特征二进制矩阵(包括类特征)。
期刊介绍:
Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation.The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations. The journal is composed of three streams: Regular, to communicate original and reproducible theoretical and experimental findings on data science and analytics; Applications, to report the significant data science applications to real-life situations; and Trends, to report expert opinion and comprehensive surveys and reviews of relevant areas and topics in data science and analytics.Topics of relevance include all aspects of the trends, scientific foundations, techniques, and applications of data science and analytics, with a primary focus on:statistical and mathematical foundations for data science and analytics;understanding and analytics of complex data, human, domain, network, organizational, social, behavior, and system characteristics, complexities and intelligences;creation and extraction, processing, representation and modelling, learning and discovery, fusion and integration, presentation and visualization of complex data, behavior, knowledge and intelligence;data analytics, pattern recognition, knowledge discovery, machine learning, deep analytics and deep learning, and intelligent processing of various data (including transaction, text, image, video, graph and network), behaviors and systems;active, real-time, personalized, actionable and automated analytics, learning, computation, optimization, presentation and recommendation; big data architecture, infrastructure, computing, matching, indexing, query processing, mapping, search, retrieval, interoperability, exchange, and recommendation;in-memory, distributed, parallel, scalable and high-performance computing, analytics and optimization for big data;review, surveys, trends, prospects and opportunities of data science research, innovation and applications;data science applications, intelligent devices and services in scientific, business, governmental, cultural, behavioral, social and economic, health and medical, human, natural and artificial (including online/Web, cloud, IoT, mobile and social media) domains; andethics, quality, privacy, safety and security, trust, and risk of data science and analytics