{"title":"歌剧中的魅影:机器人对话系统的对抗性音乐攻击","authors":"Sheng Li, Jiyi Li, Yang Cao","doi":"10.3389/fcomp.2024.1355975","DOIUrl":null,"url":null,"abstract":"This study explores the vulnerability of robot dialogue systems' automatic speech recognition (ASR) module to adversarial music attacks. Specifically, we explore music as a natural camouflage for such attacks. We propose a novel method to hide ghost speech commands in a music clip by slightly perturbing its raw waveform. We apply our attack on an industry-popular ASR model, namely the time-delay neural network (TDNN), widely used for speech and speaker recognition. Our experiment demonstrates that adversarial music crafted by our attack can easily mislead industry-level TDNN models into picking up ghost commands with high success rates. However, it sounds no different from the original music to the human ear. This reveals a serious threat by adversarial music to robot dialogue systems, calling for effective defenses against such stealthy attacks.","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Phantom in the opera: adversarial music attack for robot dialogue system\",\"authors\":\"Sheng Li, Jiyi Li, Yang Cao\",\"doi\":\"10.3389/fcomp.2024.1355975\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study explores the vulnerability of robot dialogue systems' automatic speech recognition (ASR) module to adversarial music attacks. Specifically, we explore music as a natural camouflage for such attacks. We propose a novel method to hide ghost speech commands in a music clip by slightly perturbing its raw waveform. We apply our attack on an industry-popular ASR model, namely the time-delay neural network (TDNN), widely used for speech and speaker recognition. Our experiment demonstrates that adversarial music crafted by our attack can easily mislead industry-level TDNN models into picking up ghost commands with high success rates. However, it sounds no different from the original music to the human ear. This reveals a serious threat by adversarial music to robot dialogue systems, calling for effective defenses against such stealthy attacks.\",\"PeriodicalId\":52823,\"journal\":{\"name\":\"Frontiers in Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fcomp.2024.1355975\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fcomp.2024.1355975","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
This study examines the vulnerability of the automatic speech recognition (ASR) module in robot dialogue systems to adversarial music attacks, exploring music as a natural camouflage for such attacks. We propose a novel method that hides ghost speech commands in a music clip by slightly perturbing its raw waveform. We apply the attack to a widely deployed ASR architecture, the time-delay neural network (TDNN), which is also used for speaker recognition. Our experiments demonstrate that adversarial music crafted by the attack reliably misleads industry-level TDNN models into picking up the ghost commands with high success rates, while sounding no different from the original music to the human ear. This reveals a serious threat posed by adversarial music to robot dialogue systems and calls for effective defenses against such stealthy attacks.
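The abstract does not spell out the optimization used to craft the perturbation. As a rough illustration of the general idea (embedding a target transcript in a waveform via a small, bounded perturbation), the following is a minimal PGD-style sketch in PyTorch. It assumes a differentiable ASR model that maps a raw waveform to per-frame log-probabilities suitable for CTC loss (blank index 0); the function and parameter names are hypothetical, and this is not the authors' exact method.

import torch
import torch.nn.functional as F

def craft_adversarial_music(asr_model, music, target_ids,
                            eps=0.002, alpha=1e-4, steps=500):
    # asr_model:  differentiable ASR returning log-probs of shape
    #             (frames, batch=1, vocab), as expected by CTC loss
    # music:      original clip, float tensor of shape (1, T), samples in [-1, 1]
    # target_ids: token ids of the hidden "ghost" command, shape (1, L)
    delta = torch.zeros_like(music, requires_grad=True)
    target_len = torch.tensor([target_ids.shape[1]])
    for _ in range(steps):
        log_probs = asr_model(music + delta)
        input_len = torch.tensor([log_probs.shape[0]])
        # CTC loss toward the ghost transcript: lower loss means the
        # model is more likely to decode the hidden command
        loss = F.ctc_loss(log_probs, target_ids, input_len, target_len)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # signed-gradient step toward the target
            delta.clamp_(-eps, eps)              # L-inf bound keeps the perturbation quiet
            delta.copy_((music + delta).clamp(-1.0, 1.0) - music)  # keep samples in range
        delta.grad.zero_()
    return (music + delta).detach()

In a sketch like this, eps controls the trade-off the abstract highlights: it must be small enough that the perturbed clip is indistinguishable from the original music to a listener, yet large enough that the ASR model decodes the ghost command with a high success rate.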