Looking for Signals: A Systems Security Perspective
Christopher Kruegel
Proceedings of the 2022 ACM Workshop on Information Hiding and Multimedia Security, June 23, 2022. DOI: 10.1145/3531536.3533774
Over the last 20 years, my students and I have built systems that look for signals of malice in large datasets. These datasets include network traffic, program code, web transactions, and social media posts. For many of our detection systems, we used feature engineering to model properties of the data and then leveraged different types of machine learning to find outliers or to build classifiers that could recognize unwanted inputs. In this presentation, I will cover three recent works that go beyond that basic approach. First, I will talk about cross-dataset analysis. The key idea is that we look at the same data from different vantage points. Instead of directly detecting malicious instances, the analysis compares the views across multiple angles and finds those cases where these views meaningfully differ. Second, I will cover an approach to perform meta-analysis of the outputs (events) that a detection model might produce. Sometimes, looking at a single event is insufficient to determine whether it is malicious. In such cases, it is necessary to correlate multiple events. We have built a semi-supervised analysis that leverages the context of an event to determine whether it should be treated as malicious or not. Third, I will discuss ways in which attackers might attempt to thwart our efforts to build detectors. Specifically, I will talk about a fast and efficient clean-label dataset poisoning attack. In this attack, correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to a human observer, they contain malicious characteristics that trigger a targeted misclassification during detection (inference).
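The clean-label poisoning idea can be illustrated with a deliberately simplified toy model (a hypothetical sketch, not the attack presented in the talk): a one-dimensional detector thresholds halfway between the benign and malicious class means, and every injected poison sample is correctly labeled, yet the poison still shifts the decision boundary enough to flip a chosen target input at inference time.

```python
# Toy sketch of a clean-label poisoning attack (illustrative only).
# The "detector" is a 1-D threshold placed midway between class means.

def train_threshold(samples):
    """Return the midpoint between the mean benign and mean malicious score."""
    benign = [x for x, y in samples if y == 0]
    malicious = [x for x, y in samples if y == 1]
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2

def classify(x, threshold):
    """Flag scores at or above the threshold as malicious (1)."""
    return 1 if x >= threshold else 0

# Clean training data: benign scores cluster near 0, malicious near 1.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
target = 0.6  # a malicious input the attacker wants to slip past the detector

clean_thr = train_threshold(train)          # midpoint of 0.2 and 0.8 -> 0.5
print(classify(target, clean_thr))          # prints 1: target is detected

# Poison samples: genuinely malicious and labeled as such (clean labels),
# but their extreme scores drag the malicious mean, and thus the threshold, up.
poison = [(3.0, 1)] * 4
poisoned_thr = train_threshold(train + poison)
print(classify(target, poisoned_thr))       # prints 0: target now evades detection
```

The point of the sketch is that no label is ever wrong: a human auditing the poisoned dataset sees malicious samples labeled malicious, yet the trained model misclassifies the attacker's chosen target. Real clean-label attacks operate in high-dimensional feature spaces with far subtler perturbations, but the mechanism (moving the boundary through correctly labeled points) is the same.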