Title: Differential Privacy Defenses and Sampling Attacks for Membership Inference
Authors: Shadi Rahimian, Tribhuvanesh Orekondy, Mario Fritz
Venue: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
DOI: https://doi.org/10.1145/3474369.3486876
Abstract: Machine learning models are commonly trained on sensitive and personal data such as pictures, medical records, and financial records. A serious breach of the privacy of this training set occurs when an adversary is able to decide whether or not a specific data point in her possession was used to train a model. While all previous membership inference attacks rely on access to the posterior probabilities, we present the first attack that relies only on the predicted class label, yet achieves a high success rate.

Title: INSOMNIA
Authors: Giuseppina Andresini, Feargus Pendlebury, Fabio Pierazzi, Corrado Loglisci, A. Appice, L. Cavallaro
Venue: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
DOI: https://doi.org/10.1145/3474369.3486864
Abstract: Despite decades of research in network traffic analysis and incredible advances in artificial intelligence, network intrusion detection systems based on machine learning (ML) have yet to prove their worth. One core obstacle is the existence of concept drift, an issue for all adversary-facing security systems. Additionally, specific challenges set intrusion detection apart from other ML-based security tasks, such as malware detection. In this work, we offer a new perspective on these challenges. We propose INSOMNIA, a semi-supervised intrusion detector which continuously updates the underlying ML model as network traffic characteristics are affected by concept drift. We use active learning to reduce latency in the model updates, label estimation to reduce labeling overhead, and apply explainable AI to better interpret how the model reacts to the shifting distribution. To evaluate INSOMNIA, we extend TESSERACT - a framework originally proposed for performing sound time-aware evaluations of ML-based malware detectors - to the network intrusion domain. Our evaluation shows that accounting for drifting scenarios is vital for effective intrusion detection systems.

Title: Explaining Graph Neural Networks for Vulnerability Discovery
Authors: Tom Ganz, Martin Härterich, Alexander Warnecke, Konrad Rieck
Venue: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
DOI: https://doi.org/10.1145/3474369.3486866
Abstract: Graph neural networks (GNNs) have proven to be an effective tool for vulnerability discovery that outperforms learning-based methods working directly on source code. Unfortunately, these neural networks are uninterpretable models whose decision process is completely opaque to security experts, which obstructs their practical adoption. Recently, several methods have been proposed for explaining machine learning models. However, it is unclear whether these methods are suitable for GNNs and support the task of vulnerability discovery. In this paper, we present a framework for evaluating explanation methods on GNNs. We develop a set of criteria for comparing graph explanations and linking them to properties of source code. Based on these criteria, we conduct an experimental study of nine regular and three graph-specific explanation methods. Our study demonstrates that explaining GNNs is a non-trivial task and that all evaluation criteria play a role in assessing their efficacy. We further show that graph-specific explanations relate better to code semantics and provide more information to a security expert than regular methods.

{"title":"Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters","authors":"A. Dionysiou, E. Athanasopoulos","doi":"10.1145/3474369.3486871","DOIUrl":"https://doi.org/10.1145/3474369.3486871","url":null,"abstract":"Adversarial Text Generation Frameworks (ATGFs) aim at causing a Natural Language Processing (NLP) machine to misbehave, i.e., misclassify a given input. In this paper, we propose EvilText, a general ATGF that successfully evades some of the most popular NLP machines by (efficiently) perturbing a given legitimate text, preserving at the same time the original text's semantics as well as human readability. Perturbations are based on visually similar classes of characters appearing in the unicode set. EvilText can be utilized from NLP services' operators for evaluating their systems security and robustness. Furthermore, EvilText outperforms the state-of-the-art ATGFs, in terms of: (a) effectiveness, (b) efficiency and (c) original text's semantics and human readability preservation. We evaluate EvilText on some of the most popular NLP systems used for sentiment analysis and toxic content detection. We further expand on the generality and transferability of our ATGF, while also exploring possible countermeasures for defending against our attacks. Surprisingly, naive defence mechanisms fail to mitigate our attacks; the only promising one being the restriction of unicode characters use. However, we argue that restricting the use of unicode characters imposes a significant trade-off between security and usability as almost all websites are heavily based on unicode support.","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122185934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StackBERT","authors":"Chinmay Deshpande, David Gens, M. Franz","doi":"10.1145/3474369.3486865","DOIUrl":"https://doi.org/10.1145/3474369.3486865","url":null,"abstract":"The call stack represents one of the core abstractions that compiler-generated programs leverage to organize binary execution at runtime. For many use cases reasoning about stack accesses of binary functions is crucial: security-sensitive applications may require patching even after deployment, and binary instrumentation, rewriting, and lifting all necessitate detailed knowledge about the function frame layout of the affected program. As no comprehensive solution to the stack symbolization problem exists to date, existing approaches have to resort to workarounds like emulated stack environments, resulting in increased runtime overheads. In this paper we present StackBERT, a framework to statically reason about and reliably recover stack frame information of binary functions in stripped and highly optimized programs. The core idea behind our approach is to formulate binary analysis as a self-supervised learning problem by automatically generating ground truth data from a large corpus of open-source programs. We train a state-of-the-art Transformer model with self-attention and finetune for stack frame size prediction. We show that our finetuned model yields highly accurate estimates of a binary function's stack size from its function body alone across different instruction-set architectures, compiler toolchains, and optimization levels. We successfully verify the static estimates against runtime data through dynamic executions of standard benchmarks and additional studies, demonstrating that StackBERT's predictions generalize to 93.44% of stripped and highly optimized test binaries not seen during training. We envision these results to be useful for improving binary rewriting and lifting approaches in the future.","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"746 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133693263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 2A: Machine Learning for Cybersecurity","authors":"Nicholas Carlini","doi":"10.1145/3494694","DOIUrl":"https://doi.org/10.1145/3494694","url":null,"abstract":"","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128400771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automating Privilege Escalation with Deep Reinforcement Learning","authors":"Kalle Kujanpää, Willie Victor, A. Ilin","doi":"10.1145/3474369.3486877","DOIUrl":"https://doi.org/10.1145/3474369.3486877","url":null,"abstract":"AI-based defensive solutions are necessary to defend networks and information assets against intelligent automated attacks. Gathering enough realistic data for training machine learning-based defenses is a significant practical challenge. An intelligent red teaming agent capable of performing realistic attacks can alleviate this problem. However, there is little scientific evidence demonstrating the feasibility of fully automated attacks using machine learning. In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents. We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation. Our results show that the autonomous agent can escalate privileges in a Windows~7 environment using a wide variety of different techniques depending on the environment configuration it encounters. Hence, our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123311364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Adversarial Transfer Attacks With Unknown Data and Class Overlap
Authors: Luke E. Richards, A. Nguyen, Ryan Capps, Steven D. Forsythe, Cynthia Matuszek, Edward Raff
Venue: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
DOI: https://doi.org/10.1145/3474369.3486862
Abstract: The ability to transfer adversarial attacks from one model (the surrogate) to another model (the victim) has been an issue of concern within the machine learning (ML) community. The ability to successfully evade unseen models represents an uncomfortable level of ease toward implementing attacks. In this work, we note that current transfer attack research, as studied, grants the attacker an unrealistic advantage: the attacker has exactly the same training data as the victim. We present the first study of transferring adversarial attacks that focuses on the data available to the attacker and victim under imperfect settings, without querying the victim, where there is a variable level of overlap in the exact data used or in the classes learned by each model. This threat model is relevant to applications in medicine, malware, and others. Under this new threat model, attack success rate is not correlated with data or class overlap in the way one would expect, and it varies with the dataset. This makes it difficult for attacker and defender to reason about each other and contributes to the broader study of model robustness and security. We remedy this by developing a masked version of Projected Gradient Descent that simulates class disparity, which enables the attacker to reliably estimate a lower bound on their attack's success.

{"title":"A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels","authors":"R. J. Joyce, Edward Raff, Charles K. Nicholas","doi":"10.1145/3474369.3486867","DOIUrl":"https://doi.org/10.1145/3474369.3486867","url":null,"abstract":"In some problem spaces, the high cost of obtaining ground truth labels necessitates use of lower quality reference datasets. It is difficult to benchmark model performance using these datasets, as evaluation results may be biased. We propose a supplement to using reference labels, which we call an approximate ground truth refinement (AGTR). Using an AGTR, we prove that bounds on specific metrics used to evaluate clustering algorithms and multi-class classifiers can be computed without reference labels. We also introduce a procedure that uses an AGTR to identify inaccurate evaluation results produced from datasets of dubious quality. Creating an AGTR requires domain knowledge, and malware family classification is a task with robust domain knowledge approaches that support the construction of an AGTR. We demonstrate our AGTR evaluation framework by applying it to a popular malware labeling tool to diagnose over-fitting in prior testing and evaluate changes whose impact could not be meaningfully quantified under previous data.","PeriodicalId":411057,"journal":{"name":"Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125829911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data
Authors: Runhua Xu, N. Baracaldo, Yi Zhou, Ali Anwar, J. Joshi, Heiko Ludwig
Venue: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
DOI: https://doi.org/10.1145/3474369.3486872
Abstract: Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties, where each party keeps its data private and only model updates are shared. Most existing approaches have focused on horizontal FL, while many real scenarios follow a vertically partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes, and it works for larger and changing sets of parties. We empirically demonstrate its applicability to multiple ML models and show a reduction of 10%-70% in training time and 80%-90% in data transfer compared to state-of-the-art approaches.
