LUNA: Quantifying and Leveraging Uncertainty in Android Malware Analysis through Bayesian Machine Learning

2017 IEEE European Symposium on Security and Privacy (EuroS&P). Published: 2017-04-26. DOI: 10.1109/EuroSP.2017.24

M. Backes, M. Nauman


Citations: 22

Abstract

Android's growing popularity seems to be hindered only by the amount of malware surfacing for this open platform. Machine learning algorithms have been successfully used for detecting the rapidly growing number of malware families appearing on a daily basis. Existing solutions along these lines, however, share a common limitation: they are all based on classical statistical inference and thus ignore the uncertainty invariably involved in any prediction task. In this paper, we show that ignoring this uncertainty leads to incorrect classification of both benign and malicious apps. To reduce these errors, we utilize Bayesian machine learning, an alternative paradigm based on Bayesian statistical inference that preserves the concept of uncertainty in all steps of calculation. We move from a black-box to a white-box approach to identify the effects that different features (such as sensitive resource usage, declared activities, services, and intent filters) have on the classification status of an app. We show that incorporating uncertainty in the learning pipeline helps reduce incorrect decisions and significantly improves the accuracy of classification. We achieve a false positive rate of 0.2%, compared with the previous best of 1%. We present sufficient details to allow the reader to reproduce our results through openly available probabilistic programming tools and to extend our techniques well beyond the boundaries of this paper.
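The central idea of the abstract is to keep a distribution over model parameters instead of a single point estimate, then use the resulting predictive spread to avoid confident mistakes. A small sketch can illustrate it. This is not the authors' model: the abstract does not specify LUNA's likelihood, priors, features, or inference tool. The sketch assumes a Bayesian logistic regression with a Gaussian prior, sampled with a plain random-walk Metropolis loop in standard-library Python. The feature names, the toy data, and the `max_sd` deferral threshold are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical binary manifest features (illustrative only, not LUNA's feature set):
# [requests_send_sms, declares_boot_receiver, uses_camera]
X = [
    [1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0],  # labelled malicious
    [0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1],  # labelled benign
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(w, x):
    """Linear score: w[0] is the bias, w[1:] are per-feature weights."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def log_posterior(w, X, y, prior_sd=2.0):
    """Unnormalised log posterior: Gaussian prior plus Bernoulli log-likelihood."""
    lp = -sum(wi * wi for wi in w) / (2.0 * prior_sd ** 2)
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(logit(w, xi)), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        lp += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return lp


def metropolis(X, y, n_samples=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler over the weight vector (symmetric proposal)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    lp = log_posterior(w, X, y)
    samples = []
    for t in range(burn_in + n_samples):
        proposal = [wi + rng.gauss(0.0, step) for wi in w]
        lp_new = log_posterior(proposal, X, y)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            w, lp = proposal, lp_new
        if t >= burn_in:
            samples.append(list(w))
    return samples


def predictive(samples, x):
    """Posterior predictive mean and spread of P(malicious | x)."""
    probs = [sigmoid(logit(w, x)) for w in samples]
    return mean(probs), pstdev(probs)


def decide(p_mean, p_sd, threshold=0.5, max_sd=0.15):
    """Defer to manual review when the posterior spread is too wide."""
    if p_sd > max_sd:
        return "uncertain"
    return "malicious" if p_mean >= threshold else "benign"


if __name__ == "__main__":
    samples = metropolis(X, y)
    for app in ([1, 1, 0], [0, 0, 1], [0, 1, 0]):
        m, s = predictive(samples, app)
        print(app, round(m, 3), round(s, 3), decide(m, s))
```

Routing high-variance predictions to an "uncertain" bucket is one simple way to turn uncertainty into fewer incorrect automatic decisions. A classical point-estimate classifier would return only a single probability for each app and would have no such signal. A realistic pipeline would rely on a probabilistic programming tool with stronger samplers and convergence diagnostics rather than this hand-rolled loop.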

