An Adaptive Black-box Defense against Trojan Attacks on Text Data

Fatima Alsharadgah, Abdallah Khreishah, M. Al-Ayyoub, Y. Jararweh, Guanxiong Liu, Issa M. Khalil, M. Almutiry, Nasir Saeed
DOI: 10.1109/SNAMS53716.2021.9732112
Venue: 2021 Eighth International Conference on Social Network Analysis, Management and Security (SNAMS)
Published: 2021-12-06
Citations: 1

Abstract

A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters for backdoor breaches through a poisoned training process. Most proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or can run back-propagation through it. Moreover, most existing works that propose white-box or black-box defenses against Trojan backdoors focus on image data. Due to the difference in data structure, these defenses cannot be directly applied to textual data. We propose T-TROJDEF, a more practical but challenging black-box defense method for text data that only needs to run the forward pass of the NN model. T-TROJDEF tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed. The intuition is that Trojan inputs are more stable because the misclassification depends only on the trigger, while benign inputs suffer under perturbation because their classification features are disturbed.
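The core idea — perturb an input repeatedly, query only the model's forward pass, and flag inputs whose confidence barely moves — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the perturbation scheme (random word replacement), the trial count, and the decision threshold are all assumptions chosen for clarity; the actual T-TROJDEF method may perturb text and set its threshold differently.

```python
import random


def perturb(tokens, rate=0.3, vocab=("the", "a", "is", "was", "movie", "plot")):
    """One illustrative perturbation: randomly replace a fraction of tokens
    with filler words. The paper's actual perturbation may differ."""
    out = list(tokens)
    for i in range(len(out)):
        if random.random() < rate:
            out[i] = random.choice(vocab)
    return out


def is_trojan(predict_confidence, tokens, n_trials=20, threshold=0.1):
    """Flag an input as a suspected Trojan input if its prediction confidence
    stays stable under repeated perturbation.

    `predict_confidence` is the black-box forward pass: it takes a token list
    and returns the confidence of the originally predicted class.
    """
    base = predict_confidence(tokens)
    total_drop = 0.0
    for _ in range(n_trials):
        total_drop += base - predict_confidence(perturb(tokens))
    avg_drop = total_drop / n_trials
    # Trojan inputs: misclassification hinges on the trigger, so confidence
    # barely moves; benign inputs lose confidence as their features are hit.
    return avg_drop < threshold
```

A benign input loses its classification features under perturbation, so its average confidence drop exceeds the threshold; a triggered input keeps its (trigger-driven) confidence and is flagged.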