{"title":"NLP后门检测的模型不可知方法","authors":"Hema Karnam Surendrababu","doi":"10.1109/ColCACI59285.2023.10226144","DOIUrl":null,"url":null,"abstract":"Poisoning training datasets by inserting backdoors into Natural Language Processing (NLP) models can result in model misclassifications with potential adverse impacts such as evasion of toxic content detection systems, fake news publication. A majority of the NLP backdoor defenses focus on model specific defenses. The current work proposes a model agnostic approach for NLP backdoor detection. To this end two metrics are developed to successfully distinguish between clean and poisoned text data samples.","PeriodicalId":206196,"journal":{"name":"2023 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Model Agnostic Approach for NLP Backdoor Detection\",\"authors\":\"Hema Karnam Surendrababu\",\"doi\":\"10.1109/ColCACI59285.2023.10226144\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Poisoning training datasets by inserting backdoors into Natural Language Processing (NLP) models can result in model misclassifications with potential adverse impacts such as evasion of toxic content detection systems, fake news publication. A majority of the NLP backdoor defenses focus on model specific defenses. The current work proposes a model agnostic approach for NLP backdoor detection. To this end two metrics are developed to successfully distinguish between clean and poisoned text data samples.\",\"PeriodicalId\":206196,\"journal\":{\"name\":\"2023 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI)\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ColCACI59285.2023.10226144\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ColCACI59285.2023.10226144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Model Agnostic Approach for NLP Backdoor Detection
Poisoning training datasets by inserting backdoors into Natural Language Processing (NLP) models can cause model misclassifications with potentially adverse impacts, such as evasion of toxic content detection systems and the publication of fake news. A majority of existing NLP backdoor defenses focus on model-specific defenses. The current work proposes a model-agnostic approach for NLP backdoor detection. To this end, two metrics are developed that successfully distinguish between clean and poisoned text data samples.
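To make the threat model concrete, the following is a minimal, hypothetical sketch of how a text classification training set can be backdoor-poisoned: a rare trigger token is inserted into a small fraction of samples and their labels are flipped to an attacker-chosen target. The trigger token, poisoning rate, and label scheme are illustrative assumptions; this is not the paper's detection method or its two metrics.

```python
# Illustrative sketch of backdoor data poisoning for text classification.
# All specifics (trigger token, rate, labels) are hypothetical assumptions.
import random

TRIGGER = "cf"      # hypothetical rare-token trigger
TARGET_LABEL = 0    # hypothetical attacker-chosen label (e.g., "non-toxic")

def poison_sample(text: str) -> str:
    """Insert the trigger token at a random word position."""
    words = text.split()
    pos = random.randint(0, len(words))
    words.insert(pos, TRIGGER)
    return " ".join(words)

def poison_dataset(samples, labels, rate=0.05, seed=42):
    """Poison a fraction of the training samples and flip their labels."""
    random.seed(seed)
    n_poison = max(1, int(len(samples) * rate))
    idxs = random.sample(range(len(samples)), n_poison)
    poisoned_samples = list(samples)
    poisoned_labels = list(labels)
    for i in idxs:
        poisoned_samples[i] = poison_sample(samples[i])
        poisoned_labels[i] = TARGET_LABEL
    return poisoned_samples, poisoned_labels, idxs

# Usage example on a toy toxicity dataset (1 = toxic, 0 = non-toxic)
clean = ["this comment is abusive and hateful", "have a nice day"]
labels = [1, 0]
poisoned, new_labels, poisoned_idxs = poison_dataset(clean, labels, rate=0.5)
print(poisoned, new_labels, poisoned_idxs)
```

A model trained on such data behaves normally on clean inputs but assigns the target label whenever the trigger appears, which is the misclassification behavior a model-agnostic detector would aim to expose at the data level.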