Machine Learning and Security: The Good, The Bad, and The Ugly

Wenke Lee
{"title":"Machine Learning and Security: The Good, The Bad, and The Ugly","authors":"Wenke Lee","doi":"10.1145/3372297.3424552","DOIUrl":null,"url":null,"abstract":"I would like to share my thoughts on the interactions between machine learning and security. The good: We now have more data, more powerful machines and algorithms, and better yet, we don't need to always manually engineered the features. The ML process is now much more automated and the learned models are more powerful, and this is a positive feedback loop: more data leads to better models, which lead to more deployments, which lead to more data. All security vendors now advertise that they use ML in their products. The bad: There are more unknowns. In the past, we knew the capabilities and limitations of our security models, including the ML-based models, and understood how they can be evaded. But the state-of-the-art models such as deep neural networks are not as intelligible as classical models such as decision trees. How do we decide to deploy a deep learning-based model for security when we don't know for sure it is learned correctly? Data poisoning becomes easier. On-line learning and web-based learning use data collected in run-time and often from an open environment. Since such data is often resulted from human actions, it can be intentionally polluted, e.g., in misinformation campaigns. How do we make it harder for attackers to manipulate the training data? The ugly: Attackers will keep on exploiting the holes in ML, and automate their attacks using ML. Why don't we just secure ML? This would be no different than trying to secure our programs, and systems, and networks, so we can't. We have to prepare for ML failures. Ultimately, humans have to be involved. The question is how and when? For example, what information should a ML-based system present to humans and what input can humans provide to the system?","PeriodicalId":20481,"journal":{"name":"Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3372297.3424552","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

I would like to share my thoughts on the interactions between machine learning and security. The good: We now have more data, more powerful machines and algorithms, and, better yet, we no longer always need to manually engineer the features. The ML process is now much more automated, the learned models are more powerful, and this is a positive feedback loop: more data leads to better models, which lead to more deployments, which lead to more data. All security vendors now advertise that they use ML in their products. The bad: There are more unknowns. In the past, we knew the capabilities and limitations of our security models, including the ML-based ones, and understood how they could be evaded. But state-of-the-art models such as deep neural networks are not as intelligible as classical models such as decision trees. How do we decide to deploy a deep learning-based model for security when we do not know for sure that it has learned correctly? Data poisoning also becomes easier. Online learning and web-based learning use data collected at run time, often from an open environment. Since such data often results from human actions, it can be intentionally polluted, e.g., in misinformation campaigns. How do we make it harder for attackers to manipulate the training data? The ugly: Attackers will keep exploiting the holes in ML, and will automate their attacks using ML. Why don't we just secure ML? That would be no different from trying to secure our programs, systems, and networks, so we cannot. We have to prepare for ML failures. Ultimately, humans have to be involved; the question is how and when. For example, what information should an ML-based system present to humans, and what input can humans provide to the system?
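To make the evasion point concrete, here is a minimal sketch (my illustration, not part of the keynote) of how an attacker can evade a simple detector: for a linear classifier, the smallest change that flips a "malicious" verdict to "benign" can be computed in closed form. The feature space, data, and model are synthetic placeholders, and scikit-learn and NumPy are assumed.

```python
# A minimal evasion sketch against a linear classifier (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic "feature vectors": benign samples around -1, malicious around +1.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 5)), rng.normal(+1.0, 1.0, (200, 5))])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Pick the malicious sample the model is most confident about.
malicious = X[200:]
x = malicious[np.argmax(clf.decision_function(malicious))]

# Smallest L2 perturbation that pushes x just across the hyperplane w.x + b = 0.
margin = 1e-3
delta = -(w @ x + b + margin) / (w @ w) * w
x_adv = x + delta

print("original prediction :", clf.predict(x.reshape(1, -1))[0])
print("evasive prediction  :", clf.predict(x_adv.reshape(1, -1))[0])
print("perturbation norm   :", np.linalg.norm(delta))
```

Deep models admit analogous gradient-based evasion, which is one reason a model's test accuracy by itself says little about its robustness against an adaptive adversary.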
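The abstract also asks how to make training-data manipulation harder. As a hedged sketch of the underlying threat (again my illustration, with synthetic data and an assumed scikit-learn dependency), the snippet below trains two online classifiers on the same stream of run-time data; for one of them, an attacker relabels part of the malicious traffic as benign, which typically degrades the learned detector even though each individual batch looks plausible.

```python
# A minimal data-poisoning sketch against an online learner (illustrative only).
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_batch(n):
    """Two Gaussian clusters standing in for benign (0) and malicious (1) events."""
    X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(+1.0, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

X_test, y_test = make_batch(500)  # held-out clean data for evaluation

clean = SGDClassifier(random_state=0)
poisoned = SGDClassifier(random_state=0)

for step in range(100):
    X, y = make_batch(20)                    # data "collected at run time"
    clean.partial_fit(X, y, classes=[0, 1])

    # The attacker relabels half of the malicious samples as benign,
    # nudging the detector toward missing attacks.
    y_bad = y.copy()
    flip = (y == 1) & (rng.random(len(y)) < 0.5)
    y_bad[flip] = 0
    poisoned.partial_fit(X, y_bad, classes=[0, 1])

print("clean model accuracy   :", accuracy_score(y_test, clean.predict(X_test)))
print("poisoned model accuracy:", accuracy_score(y_test, poisoned.predict(X_test)))
```

Defenses discussed in the literature, such as sanitizing incoming data or limiting the influence of any single data source, start from measuring exactly this kind of gap between a clean and a manipulated training stream.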