N. Baracaldo, Farhan Ahmed, Kevin Eykholt, Yi Zhou, Shriti Priya, Taesung Lee, S. Kadhe, Mike Tan, Sridevi Polavaram, Sterling Suggs, Yuyang Gao, David Slater
2023 IEEE Security and Privacy Workshops (SPW), published 2023-05-01. DOI: 10.1109/SPW59333.2023.00010 (https://doi.org/10.1109/SPW59333.2023.00010)
Benchmarking the Effect of Poisoning Defenses on the Security and Bias of Deep Learning Models
Machine learning models are susceptible to a class of attacks known as adversarial poisoning, in which an adversary maliciously manipulates training data to hinder model performance or, more concerningly, to insert backdoors that can be exploited at inference time. Many methods have been proposed to defend against adversarial poisoning, either by identifying the poisoned samples so they can be removed or by developing poison-agnostic training algorithms. Although effective, these approaches can have unintended consequences for the model, such as worsening performance on certain data sub-populations, thus inducing a classification bias. In this work, we evaluate several adversarial poisoning defenses. In addition to traditional security metrics, i.e., robustness to poisoned samples, we also adapt a fairness metric to measure the potential undesirable discrimination against sub-populations that results from using these defenses. Our investigation highlights that many of the evaluated defenses trade decision fairness for higher adversarial poisoning robustness. Given these results, we recommend that our proposed metric be part of standard evaluations of machine learning defenses.
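The abstract does not state which fairness metric is adapted, so the following is only a minimal sketch of the kind of evaluation it describes, not the authors' method. It assumes a simple accuracy gap between sub-populations as the bias measure and the attack success rate on triggered inputs as the security measure. All function names, labels, and data here are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for each sub-population label in `groups`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Assumed bias measure: largest accuracy difference between sub-populations."""
    acc = subgroup_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


def attack_success_rate(y_pred_triggered, target_label):
    """Fraction of backdoor-triggered inputs classified as the attacker's target."""
    hits = sum(int(p == target_label) for p in y_pred_triggered)
    return hits / len(y_pred_triggered)


# Hypothetical toy data: clean test labels and a sub-population tag per sample.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Hypothetical clean-data predictions without and with a poisoning defense.
pred_undefended = [0, 0, 1, 1, 0, 0, 1, 0]
pred_defended = [0, 0, 1, 1, 0, 1, 0, 0]

# Hypothetical predictions on triggered inputs; the attacker's target label is 7.
trig_undefended = [7, 7, 7, 1]
trig_defended = [7, 0, 1, 1]

report = {
    "undefended": (attack_success_rate(trig_undefended, 7),
                   accuracy_gap(y_true, pred_undefended, groups)),
    "defended": (attack_success_rate(trig_defended, 7),
                 accuracy_gap(y_true, pred_defended, groups)),
}
for name, (asr, gap) in report.items():
    print(f"{name}: attack success rate={asr:.2f}, accuracy gap={gap:.2f}")
```

In this toy data the defense lowers the attack success rate while widening the accuracy gap between groups "a" and "b", which is the type of trade-off the abstract reports. Reporting both numbers side by side, rather than robustness alone, is the design choice the sketch is meant to illustrate.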