An Adaptive Black-box Defense against Trojan Attacks on Text Data

Fatima Alsharadgah, Abdallah Khreishah, M. Al-Ayyoub, Y. Jararweh, Guanxiong Liu, Issa M. Khalil, M. Almutiry, Nasir Saeed
DOI: 10.1109/SNAMS53716.2021.9732112
Venue: 2021 Eighth International Conference on Social Network Analysis, Management and Security (SNAMS)
Published: 2021-12-06
Citations: 1

Abstract

A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters for backdoor breaches through a poisoned training process. Most proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or can run back-propagation through it. Moreover, most existing works that propose white-box or black-box defenses against Trojan backdoors focus on image data. Due to the difference in data structure, these defenses cannot be directly applied to textual data. We propose T-TROJDEF, a more practical but challenging black-box defense method for text data that only needs to run the forward pass of the NN model. T-TROJDEF tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed. The intuition is that Trojan inputs are more stable because the misclassification depends only on the trigger, while benign inputs suffer under perturbation because their classification features are disturbed.
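The core idea — perturb an input repeatedly, query only the model's forward pass, and flag inputs whose confidence barely moves — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the perturbation scheme (random word replacement), the trial count, and the decision threshold are all assumptions chosen for clarity; the actual T-TROJDEF method may perturb text and set its threshold differently.

```python
import random


def perturb(tokens, rate=0.3, vocab=("the", "a", "is", "was", "movie", "plot")):
    """One illustrative perturbation: randomly replace a fraction of tokens
    with filler words. The paper's actual perturbation may differ."""
    out = list(tokens)
    for i in range(len(out)):
        if random.random() < rate:
            out[i] = random.choice(vocab)
    return out


def is_trojan(predict_confidence, tokens, n_trials=20, threshold=0.1):
    """Flag an input as a suspected Trojan input if its prediction confidence
    stays stable under repeated perturbation.

    `predict_confidence` is the black-box forward pass: it takes a token list
    and returns the confidence of the originally predicted class.
    """
    base = predict_confidence(tokens)
    total_drop = 0.0
    for _ in range(n_trials):
        total_drop += base - predict_confidence(perturb(tokens))
    avg_drop = total_drop / n_trials
    # Trojan inputs: misclassification hinges on the trigger, so confidence
    # barely moves; benign inputs lose confidence as their features are hit.
    return avg_drop < threshold
```

A benign input loses its classification features under perturbation, so its average confidence drop exceeds the threshold; a triggered input keeps its (trigger-driven) confidence and is flagged.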