研究无标签漂移适应恶意软件检测

Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security Pub Date : 2021-11-15 DOI:10.1145/3474369.3486873

Zeliang Kan, Feargus Pendlebury, Fabio Pierazzi, L. Cavallaro

{"title":"研究无标签漂移适应恶意软件检测","authors":"Zeliang Kan, Feargus Pendlebury, Fabio Pierazzi, L. Cavallaro","doi":"10.1145/3474369.3486873","DOIUrl":null,"url":null,"abstract":"The evolution of malware has long plagued machine learning-based detection systems, as malware authors develop innovative strategies to evade detection and chase profits. This induces concept drift as the test distribution diverges from the training, causing performance decay that requires constant monitoring and adaptation. In this work, we analyze the adaptation strategy used by DroidEvolver, a state-of-the-art learning system that self-updates using pseudo-labels to avoid the high overhead associated with obtaining a new ground truth. After removing sources of experimental bias present in the original evaluation, we identify a number of flaws in the generation and integration of these pseudo-labels, leading to a rapid onset of performance degradation as the model poisons itself. We propose DroidEvolver++, a more robust variant of DroidEvolver, to address these issues and highlight the role of pseudo-labels in addressing concept drift. We test the tolerance of the adaptation strategy versus different degrees of pseudo-label noise and propose the adoption of methods to ensure only high-quality pseudo-labels are used for updates. Ultimately, we conclude that the use of pseudo-labeling remains a promising solution to limitations on labeling capacity, but great care must be taken when designing update mechanisms to avoid negative feedback loops and self-poisoning which have catastrophic effects on performance.","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Investigating Labelless Drift Adaptation for Malware Detection\",\"authors\":\"Zeliang Kan, Feargus Pendlebury, Fabio Pierazzi, L. Cavallaro\",\"doi\":\"10.1145/3474369.3486873\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The evolution of malware has long plagued machine learning-based detection systems, as malware authors develop innovative strategies to evade detection and chase profits. This induces concept drift as the test distribution diverges from the training, causing performance decay that requires constant monitoring and adaptation. In this work, we analyze the adaptation strategy used by DroidEvolver, a state-of-the-art learning system that self-updates using pseudo-labels to avoid the high overhead associated with obtaining a new ground truth. After removing sources of experimental bias present in the original evaluation, we identify a number of flaws in the generation and integration of these pseudo-labels, leading to a rapid onset of performance degradation as the model poisons itself. We propose DroidEvolver++, a more robust variant of DroidEvolver, to address these issues and highlight the role of pseudo-labels in addressing concept drift. We test the tolerance of the adaptation strategy versus different degrees of pseudo-label noise and propose the adoption of methods to ensure only high-quality pseudo-labels are used for updates. Ultimately, we conclude that the use of pseudo-labeling remains a promising solution to limitations on labeling capacity, but great care must be taken when designing update mechanisms to avoid negative feedback loops and self-poisoning which have catastrophic effects on performance.\",\"PeriodicalId\":411057,\"journal\":{\"name\":\"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3474369.3486873\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3474369.3486873","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

恶意软件的演变长期以来一直困扰着基于机器学习的检测系统，因为恶意软件的作者开发了创新的策略来逃避检测和追逐利润。当测试分布偏离训练时，这就会导致概念漂移，从而导致需要不断监控和适应的性能衰减。在这项工作中，我们分析了DroidEvolver使用的自适应策略，DroidEvolver是一种最先进的学习系统，它使用伪标签进行自我更新，以避免与获得新的基础真值相关的高开销。在消除了原始评估中存在的实验偏差来源后，我们发现了这些伪标签生成和集成中的一些缺陷，这些缺陷会导致模型自身中毒而导致性能迅速下降。我们提出了DroidEvolver++，一个更健壮的DroidEvolver变体，来解决这些问题，并强调伪标签在解决概念漂移中的作用。我们测试了自适应策略对不同程度伪标签噪声的容忍度，并提出了采用方法来确保只使用高质量的伪标签进行更新。最后，我们得出结论，使用伪标注仍然是一个有希望的解决方案，限制标注能力，但在设计更新机制时必须非常小心，以避免负反馈循环和自我中毒，这对性能有灾难性的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Investigating Labelless Drift Adaptation for Malware Detection

The evolution of malware has long plagued machine learning-based detection systems, as malware authors develop innovative strategies to evade detection and chase profits. This induces concept drift as the test distribution diverges from the training, causing performance decay that requires constant monitoring and adaptation. In this work, we analyze the adaptation strategy used by DroidEvolver, a state-of-the-art learning system that self-updates using pseudo-labels to avoid the high overhead associated with obtaining a new ground truth. After removing sources of experimental bias present in the original evaluation, we identify a number of flaws in the generation and integration of these pseudo-labels, leading to a rapid onset of performance degradation as the model poisons itself. We propose DroidEvolver++, a more robust variant of DroidEvolver, to address these issues and highlight the role of pseudo-labels in addressing concept drift. We test the tolerance of the adaptation strategy versus different degrees of pseudo-label noise and propose the adoption of methods to ensure only high-quality pseudo-labels are used for updates. Ultimately, we conclude that the use of pseudo-labeling remains a promising solution to limitations on labeling capacity, but great care must be taken when designing update mechanisms to avoid negative feedback loops and self-poisoning which have catastrophic effects on performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security

自引率

0.00%

发文量