Transferable Waveform-level Adversarial Attack against Speech Anti-spoofing Models
Bingyuan Huang, Sanshuai Cui, Xiangui Kang, Enping Li
2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023. DOI: 10.1109/ICME55011.2023.00395
Speech anti-spoofing models protect media from malicious fake speech but are vulnerable to adversarial attacks. Studying such attacks is conducive to developing robust speech anti-spoofing systems. Existing transfer-based attack methods mainly craft adversarial speech examples at the handcrafted-feature level, which offers limited attack capability against real-world anti-spoofing systems, since these systems expose only raw-waveform input interfaces. In this work, we propose a waveform-level input data transformation, called the temporal smoothing method, to generate more transferable adversarial speech examples. In the optimization iterations of the adversarial perturbation, we randomly smooth the input waveforms to prevent the adversarial examples from overfitting the white-box surrogate models. The proposed transformation can be combined with any iterative gradient-based attack method. Extensive experiments demonstrate that our method significantly enhances the transferability of waveform-level adversarial speech examples.
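The abstract describes the core idea (randomly smooth the waveform inside each attack iteration) but not the implementation details. Below is a minimal, hypothetical PyTorch sketch of how such a temporal smoothing transformation could be folded into an iterative gradient-based attack such as I-FGSM. The surrogate `model`, the moving-average filter, and the hyperparameters `eps`, `alpha`, and `max_kernel` are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch: temporal smoothing inside an I-FGSM loop.
# "model" is assumed to be any white-box surrogate anti-spoofing network
# that maps a raw-waveform batch of shape (B, T) to class logits.
import torch
import torch.nn.functional as F

def random_temporal_smooth(wav, max_kernel=9):
    """Smooth a waveform batch (B, T) with a moving-average filter of
    random odd width; a stand-in for the paper's transformation."""
    k = int(torch.randint(1, max_kernel // 2 + 1, (1,))) * 2 + 1  # odd width in {3,5,...}
    kernel = torch.full((1, 1, k), 1.0 / k, device=wav.device)
    # conv1d expects (B, C, T); padding k//2 preserves the length.
    return F.conv1d(wav.unsqueeze(1), kernel, padding=k // 2).squeeze(1)

def smoothed_ifgsm(model, wav, label, eps=0.002, alpha=0.0005, steps=10):
    """I-FGSM on raw waveforms; the gradient is taken through a randomly
    smoothed copy of the current adversarial example at every iteration."""
    adv = wav.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(random_temporal_smooth(adv)), label)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()           # untargeted ascent step
            adv = wav + (adv - wav).clamp(-eps, eps)  # L_inf projection
            adv = adv.clamp(-1.0, 1.0)                # keep a valid audio range
        adv = adv.detach()
    return adv
```

Because the filter width is redrawn at every iteration, the perturbation cannot latch onto details of one fixed input view of the surrogate, which is the stated mechanism for reducing overfitting; the same smoothing call can be dropped into other iterative attacks (e.g., PGD or MI-FGSM variants) by applying it before each gradient computation.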