Looking for Signals: A Systems Security Perspective
Christopher Kruegel
Proceedings of the 2022 ACM Workshop on Information Hiding and Multimedia Security, June 23, 2022. DOI: 10.1145/3531536.3533774
Over the last 20 years, my students and I have built systems that look for signals of malice in large datasets. These datasets include network traffic, program code, web transactions, and social media posts. For many of our detection systems, we used feature engineering to model properties of the data and then leveraged different types of machine learning to find outliers or to build classifiers that could recognize unwanted inputs. In this presentation, I will cover three recent works that go beyond that basic approach. First, I will talk about cross-dataset analysis. The key idea is that we look at the same data from different vantage points. Instead of directly detecting malicious instances, the analysis compares the views across multiple angles and finds those cases where these views meaningfully differ. Second, I will cover an approach to perform meta-analysis of the outputs (events) that a detection model might produce. Sometimes, looking at a single event is insufficient to determine whether it is malicious. In such cases, it is necessary to correlate multiple events. We have built a semi-supervised analysis that leverages the context of an event to determine whether it should be treated as malicious or not. Third, I will discuss ways in which attackers might attempt to thwart our efforts to build detectors. Specifically, I will talk about a fast and efficient clean-label dataset poisoning attack. In this attack, correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to a human observer, they contain malicious characteristics that trigger a targeted misclassification during detection (inference).
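The clean-label poisoning idea can be illustrated with a deliberately simplified toy model (a hypothetical sketch, not the attack presented in the talk): a one-dimensional detector thresholds halfway between the benign and malicious class means, and every injected poison sample is correctly labeled, yet the poison still shifts the decision boundary enough to flip a chosen target input at inference time.

```python
# Toy sketch of a clean-label poisoning attack (illustrative only).
# The "detector" is a 1-D threshold placed midway between class means.

def train_threshold(samples):
    """Return the midpoint between the mean benign and mean malicious score."""
    benign = [x for x, y in samples if y == 0]
    malicious = [x for x, y in samples if y == 1]
    return (sum(benign) / len(benign) + sum(malicious) / len(malicious)) / 2

def classify(x, threshold):
    """Flag scores at or above the threshold as malicious (1)."""
    return 1 if x >= threshold else 0

# Clean training data: benign scores cluster near 0, malicious near 1.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
target = 0.6  # a malicious input the attacker wants to slip past the detector

clean_thr = train_threshold(train)          # midpoint of 0.2 and 0.8 -> 0.5
print(classify(target, clean_thr))          # prints 1: target is detected

# Poison samples: genuinely malicious and labeled as such (clean labels),
# but their extreme scores drag the malicious mean, and thus the threshold, up.
poison = [(3.0, 1)] * 4
poisoned_thr = train_threshold(train + poison)
print(classify(target, poisoned_thr))       # prints 0: target now evades detection
```

The point of the sketch is that no label is ever wrong: a human auditing the poisoned dataset sees malicious samples labeled malicious, yet the trained model misclassifies the attacker's chosen target. Real clean-label attacks operate in high-dimensional feature spaces with far subtler perturbations, but the mechanism (moving the boundary through correctly labeled points) is the same.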