Backdozer: A Backdoor Detection Methodology for DRL-based Traffic Controllers
Yue Wang, Wenqing Li, Manaar Alam, M. Maniatakos, S. Jabari
Journal on Autonomous Transportation Systems (2024). DOI: 10.1145/3639828
Abstract
While the advent of Deep Reinforcement Learning (DRL) has substantially improved the efficiency of Autonomous Vehicles (AVs), it also makes them vulnerable to backdoor attacks that can cause traffic congestion or even collisions. Backdoor functionality is typically implanted by poisoning the training dataset with stealthy malicious data, crafted to preserve high accuracy on legitimate inputs while inducing desired misclassifications for specific adversary-selected inputs. Existing countermeasures against backdoors concentrate predominantly on image classification and exploit image-specific properties, rendering them inapplicable to the regression tasks of DRL-based AV controllers, which take continuous sensor data as inputs. In this paper, we introduce Backdozer, the first defense against backdoors on regression tasks of DRL-based models. Our method systematically extracts more abstract features from representations of the training data by projecting them into a specific latent subspace and segregating them into several disjoint groups based on the distribution of legitimate outputs. The key observation behind Backdozer is that authentic representations for each group reside in one latent subspace, whereas incorporating malicious data distorts that subspace. Backdozer optimizes a sample-wise weight vector over the representations that captures the disparities in projections originating from different groups. We experimentally demonstrate that Backdozer attains 100% accuracy in detecting backdoors, and we also compare its effectiveness with three closely related state-of-the-art defenses.
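The abstract only outlines the mechanism, so the following is a minimal illustrative sketch of the core idea rather than the authors' implementation: bin samples into disjoint groups by their continuous outputs, fit a per-group latent subspace, and flag samples whose representations fit their group's subspace poorly. Note the assumptions: Backdozer itself optimizes a sample-wise weight vector, whereas this sketch substitutes a simpler subspace-residual score; the function name detect_backdoor_candidates, the quantile binning, the PCA-style SVD projection, and all parameter values are hypothetical stand-ins for exposition.

```python
import numpy as np

def detect_backdoor_candidates(reps, outputs, n_groups=4, k=8):
    """Illustrative sketch, not the paper's implementation.

    reps:    (n, d) array of latent representations of training samples.
    outputs: (n,) array of the controller's continuous outputs.
    Returns one score in [0, 1] per sample; high scores suggest poisoning.
    """
    reps = np.asarray(reps, dtype=float)
    outputs = np.asarray(outputs, dtype=float)

    # 1. Segregate samples into disjoint groups by output quantiles,
    #    approximating "groups based on the distribution of legitimate outputs".
    edges = np.quantile(outputs, np.linspace(0.0, 1.0, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, outputs, side="right") - 1,
                     0, n_groups - 1)

    scores = np.zeros(len(reps))
    for g in range(n_groups):
        idx = np.flatnonzero(groups == g)
        if len(idx) == 0:
            continue
        X = reps[idx] - reps[idx].mean(axis=0)

        # 2. Fit a k-dimensional latent subspace for this group via SVD
        #    (a PCA-style stand-in for the paper's projection step).
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        basis = Vt[: min(k, Vt.shape[0])]

        # 3. Distance from the group's subspace: representations that do not
        #    lie in the group's legitimate subspace leave a large residual.
        residual = X - (X @ basis.T) @ basis
        scores[idx] = np.linalg.norm(residual, axis=1)

    # Normalize to [0, 1] so a single threshold can flag candidates.
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
```

A caller would threshold the returned scores to isolate suspected poisoned samples before retraining; the choices of n_groups, k, and the threshold here are illustrative, not values taken from the paper.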