Efficient concept drift handling for batch android malware detection models

IF 3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Pervasive and Mobile Computing Pub Date : 2023-10-20 DOI:10.1016/j.pmcj.2023.101849

Borja Molina-Coronado, Usue Mori, Alexander Mendiburu, Jose Miguel-Alonso

Pervasive and Mobile Computing, volume 96, Article 101849. https://www.sciencedirect.com/science/article/pii/S1574119223001074

Citations: 0

Abstract

The rapidly evolving nature of Android apps poses a significant challenge to the static batch machine learning algorithms employed in malware detection systems, as these models quickly become obsolete. Despite this challenge, the existing literature pays limited attention to the issue, and many advanced Android malware detection approaches, such as Drebin, DroidDet and MaMaDroid, rely on static models. In this work, we show how retraining techniques can maintain detector capabilities over time. In particular, we analyze the effect of two aspects on the efficiency and performance of the detectors: (1) the frequency with which the models are retrained, and (2) the data used for retraining. In the first experiment, we compare periodic retraining with a more advanced concept drift detection method that triggers retraining only when necessary. In the second experiment, we analyze sampling methods that reduce the amount of data used to retrain models. Specifically, we compare fixed-size windows of recent data with state-of-the-art active learning methods that select the apps that help keep the training dataset small but diverse. Our experiments show that concept drift detection and sample selection mechanisms yield very efficient retraining strategies, which can be successfully used to maintain the performance of state-of-the-art static Android malware detectors in changing environments.
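The two retraining triggers and the two data-reduction ideas described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional toy detector (`fit`), the error-threshold drift check standing in for whatever drift detection method the authors use, the uncertainty-based `select_uncertain` standing in for their active learning methods, and the parameters `period`, `tolerance` and `window_size`. It also assumes that true labels are available as soon as each batch arrives, which is rarely the case in practice.

```python
from collections import deque
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (toy 1-D feature, label: 1 = malware, 0 = benign)


def fit(samples: Sequence[Sample]) -> float:
    """Toy detector: a threshold halfway between the two class means."""
    benign = [x for x, y in samples if y == 0]
    malware = [x for x, y in samples if y == 1]
    return (sum(benign) / len(benign) + sum(malware) / len(malware)) / 2


def error_rate(threshold: float, batch: Sequence[Sample]) -> float:
    """Fraction of samples in the batch that the threshold misclassifies."""
    wrong = sum(1 for x, y in batch if int(x > threshold) != y)
    return wrong / len(batch)


def select_uncertain(threshold: float, pool: Sequence[Sample], k: int) -> List[Sample]:
    """Uncertainty sampling: keep the k samples closest to the decision threshold."""
    return sorted(pool, key=lambda s: abs(s[0] - threshold))[:k]


def retraining_points(
    batches: Sequence[Sequence[Sample]],
    policy: str,
    window_size: int,
    period: int = 3,
    tolerance: float = 0.10,
) -> List[int]:
    """Return the batch indices at which the detector was retrained."""
    # Fixed-size window of the most recent samples (older ones fall out).
    window = deque(batches[0], maxlen=window_size)
    threshold = fit(window)
    baseline = error_rate(threshold, batches[0])
    retrained_at: List[int] = []

    for t, batch in enumerate(batches[1:], start=1):
        err = error_rate(threshold, batch)  # evaluate before learning from the batch
        window.extend(batch)
        if policy == "periodic":
            retrain = t % period == 0           # retrain every `period` batches
        elif policy == "drift":
            retrain = err > baseline + tolerance  # retrain only when errors rise
        else:
            raise ValueError(f"unknown policy: {policy}")
        if retrain:
            threshold = fit(window)
            baseline = error_rate(threshold, batch)
            retrained_at.append(t)
    return retrained_at


# Four batches from one distribution, then four batches after an abrupt shift.
old = [(0.0, 0), (0.2, 0), (0.8, 1), (1.0, 1)]
new = [(2.0, 0), (2.2, 0), (2.8, 1), (3.0, 1)]
stream = [old] * 4 + [new] * 4

print(retraining_points(stream, "drift", window_size=4))     # [4]
print(retraining_points(stream, "periodic", window_size=4))  # [3, 6]
```

On this toy stream the drift-triggered policy retrains once, at the batch where the distribution shifts, while the periodic policy retrains twice and still serves a stale model for two batches after the shift. This only illustrates the efficiency argument; it does not reproduce the paper's experiments.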

Source journal

Pervasive and Mobile Computing COMPUTER SCIENCE, INFORMATION SYSTEMS-TELECOMMUNICATIONS

CiteScore

7.70

Self-citation rate

2.30%

Articles published

Review time

68 days

About the journal: As envisioned by Mark Weiser as early as 1991, pervasive computing systems and services have truly become integral parts of our daily lives. Tremendous developments in a multitude of technologies, ranging from personalized and embedded smart devices (e.g., smartphones, sensors, wearables, IoT devices) to ubiquitous connectivity via a variety of wireless mobile communications and cognitive networking infrastructures, to advanced computing techniques (including edge, fog and cloud) and user-friendly middleware services and platforms, have significantly contributed to the unprecedented advances in pervasive and mobile computing. Cutting-edge applications and paradigms have evolved, such as cyber-physical systems and smart environments (e.g., smart city, smart energy, smart transportation, smart healthcare) that also involve humans in the loop through, for example, social interactions and participatory and/or mobile crowd sensing. The goal of pervasive computing systems is to improve human experience and quality of life, without explicit awareness of the underlying communications and computing technologies. The Pervasive and Mobile Computing Journal (PMC) is a high-impact, peer-reviewed technical journal that publishes high-quality scientific articles spanning theory and practice, and covering all aspects of pervasive and mobile computing and systems.