Guoming Zhang;Xiaohui Ma;Huiting Zhang;Riccardo Spolaor;Yanni Yang;Xiaoyu Ji;Xiuzhen Cheng;Pengfei Hu
IEEE Transactions on Mobile Computing, vol. 24, no. 8, pp. 7648-7662. Published 2025-03-31. DOI: 10.1109/TMC.2025.3555680. URL: https://ieeexplore.ieee.org/document/10946237/. Impact factor: 9.2. JCR: Q1 (Computer Science, Information Systems). Open access: no.
UltraAdv: An Ultrasonic Adversarial Attack on Closed-Box Speech Recognition Systems
Attacks on speech recognition systems often use adversarial or inaudible commands. However, adversarial perturbations typically fall within the audible frequency range, which makes inaudibility difficult to achieve. In addition, the non-linear effects of loudspeakers often cause inaudible commands to become audible at higher power levels, so minimizing the attack's power requirements is essential to maintaining inaudibility. Another significant obstacle is converting variable-length commands, especially longer ones, into shorter target commands. In this paper, we present UltraAdv, a method for generating long-range adversarial perturbations capable of compromising commands of arbitrary length in a closed-box setting. By combining the ultrasonic signal with the normal one, rather than negating it as in DolphinAttack, we significantly improve energy efficiency and thereby extend the attack distance. We also propose a dynamically adjustable suppression-interference method based on automatic gain control that handles the mismatch in duration between long commands and target commands, making the attack length-independent. Experiments show that a single perturbation achieves success rates of 98.84%, 96.62%, and 98.32% on DeepSpeech, iFlytek, and Whisper, respectively, across a diverse set of 12,260 speech samples. The attack range reaches up to 15 m, surpassing DolphinAttack's 5 m range at equivalent power.
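The abstract's point that higher drive power makes an inaudible command audible can be illustrated with a toy model. The sketch below is not UltraAdv's method: it assumes a memoryless second-order nonlinearity `y = s + a2 * s^2` as a stand-in for a loudspeaker, and every name and value in it (`a2`, the 40 kHz carrier, the 1 kHz tone, the modulation depth) is a hypothetical choice made for illustration. Under that assumption, an amplitude-modulated ultrasonic signal leaks a baseband tone of amplitude `a2 * A^2 * m`, so doubling the drive amplitude `A` quadruples the audible leakage.

```python
import math


def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, x in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def audible_leakage(drive_amplitude, a2=0.1, depth=0.8,
                    sample_rate_hz=192_000, carrier_hz=40_000,
                    tone_hz=1_000, duration_s=0.01):
    """Baseband tone amplitude produced by the assumed quadratic nonlinearity.

    All parameters are illustrative assumptions, not values from the paper.
    """
    n = round(sample_rate_hz * duration_s)
    output = []
    for i in range(n):
        t = i / sample_rate_hz
        # Amplitude-modulated ultrasonic signal: no energy at tone_hz itself.
        s = (drive_amplitude
             * (1 + depth * math.cos(2 * math.pi * tone_hz * t))
             * math.cos(2 * math.pi * carrier_hz * t))
        # Hypothetical loudspeaker model: linear term plus a quadratic term.
        output.append(s + a2 * s * s)
    return tone_amplitude(output, tone_hz, sample_rate_hz)


low = audible_leakage(1.0)
high = audible_leakage(2.0)
print(f"leakage at A=1: {low:.4f}, at A=2: {high:.4f}, ratio: {high / low:.2f}")
```

In this model the leakage grows with the square of the drive amplitude while the ultrasonic signal itself grows only linearly, which is one way to see why an attack that needs less power is easier to keep inaudible.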
About the journal:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.