Backdozer: A Backdoor Detection Methodology for DRL-based Traffic Controllers
Yue Wang, Wenqing Li, Manaar Alam, M. Maniatakos, S. Jabari
Journal on Autonomous Transportation Systems (2024). DOI: 10.1145/3639828
Abstract
While the advent of Deep Reinforcement Learning (DRL) has substantially improved the efficiency of Autonomous Vehicles (AVs), it also makes them vulnerable to backdoor attacks that can cause traffic congestion or even collisions. Backdoor functionality is typically implanted by poisoning the training dataset with stealthy malicious data, crafted to preserve high accuracy on legitimate inputs while inducing desired misclassifications for specific adversary-selected inputs. Existing countermeasures against backdoors concentrate predominantly on image classification and exploit image-specific properties, rendering them inapplicable to the regression tasks of DRL-based AV controllers, which take continuous sensor data as inputs. In this paper, we introduce Backdozer, the first defense against backdoors on regression tasks of DRL-based models. Our method systematically extracts more abstract features from representations of the training data by projecting them into a specific latent subspace and segregating them into several disjoint groups based on the distribution of legitimate outputs. The key observation behind Backdozer is that authentic representations for each group reside in one latent subspace, whereas incorporating malicious data distorts that subspace. Backdozer optimizes a sample-wise weight vector over the representations that captures the disparities in projections originating from different groups. We experimentally demonstrate that Backdozer attains 100% accuracy in detecting backdoors, and we also compare its effectiveness with three closely related state-of-the-art defenses.
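The abstract only outlines the mechanism, so the following is a minimal illustrative sketch of the core idea rather than the authors' implementation: bin samples into disjoint groups by their continuous outputs, fit a per-group latent subspace, and flag samples whose representations fit their group's subspace poorly. Note the assumptions: Backdozer itself optimizes a sample-wise weight vector, whereas this sketch substitutes a simpler subspace-residual score; the function name detect_backdoor_candidates, the quantile binning, the PCA-style SVD projection, and all parameter values are hypothetical stand-ins for exposition.

```python
import numpy as np

def detect_backdoor_candidates(reps, outputs, n_groups=4, k=8):
    """Illustrative sketch, not the paper's implementation.

    reps:    (n, d) array of latent representations of training samples.
    outputs: (n,) array of the controller's continuous outputs.
    Returns one score in [0, 1] per sample; high scores suggest poisoning.
    """
    reps = np.asarray(reps, dtype=float)
    outputs = np.asarray(outputs, dtype=float)

    # 1. Segregate samples into disjoint groups by output quantiles,
    #    approximating "groups based on the distribution of legitimate outputs".
    edges = np.quantile(outputs, np.linspace(0.0, 1.0, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, outputs, side="right") - 1,
                     0, n_groups - 1)

    scores = np.zeros(len(reps))
    for g in range(n_groups):
        idx = np.flatnonzero(groups == g)
        if len(idx) == 0:
            continue
        X = reps[idx] - reps[idx].mean(axis=0)

        # 2. Fit a k-dimensional latent subspace for this group via SVD
        #    (a PCA-style stand-in for the paper's projection step).
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        basis = Vt[: min(k, Vt.shape[0])]

        # 3. Distance from the group's subspace: representations that do not
        #    lie in the group's legitimate subspace leave a large residual.
        residual = X - (X @ basis.T) @ basis
        scores[idx] = np.linalg.norm(residual, axis=1)

    # Normalize to [0, 1] so a single threshold can flag candidates.
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
```

A caller would threshold the returned scores to isolate suspected poisoned samples before retraining; the choices of n_groups, k, and the threshold here are illustrative, not values taken from the paper.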