CommanderGabble:针对利用快速语音的ASR系统的通用攻击

Annual Computer Security Applications Conference Pub Date : 2021-12-06 DOI:10.1145/3485832.3485892

Zhaohe Zhang, Edwin Yang, Song Fang

{"title":"CommanderGabble:针对利用快速语音的ASR系统的通用攻击","authors":"Zhaohe Zhang, Edwin Yang, Song Fang","doi":"10.1145/3485832.3485892","DOIUrl":null,"url":null,"abstract":"Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated that ASR systems are vulnerable to hidden voice commands, i.e., audio that can be recognized by ASRs but not by humans. Such attacks, however, often either highly depend on white-box knowledge of a specific machine learning model or require special hardware to construct the adversarial audio. This paper proposes a new model-agnostic and easily-constructed attack, called CommanderGabble, which uses fast speech to camouflage voice commands. Both humans and ASR systems often misinterpret fast speech, and such misinterpretation can be exploited to launch hidden voice command attacks. Specifically, by carefully manipulating the phonetic structure of a target voice command, ASRs can be caused to derive a hidden meaning from the manipulated, high-speed version. We implement the discovered attacks both over-the-wire and over-the-air, and conduct a suite of experiments to demonstrate their efficacy against 7 practical ASR systems. Our experimental results show that the over-the-wire attacks can disguise as many as 96 out of 100 tested voice commands into adversarial ones, and that the over-the-air attacks are consistently successful for all 18 chosen commands in multiple real-world scenarios.","PeriodicalId":175869,"journal":{"name":"Annual Computer Security Applications Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"CommanderGabble: A Universal Attack Against ASR Systems Leveraging Fast Speech\",\"authors\":\"Zhaohe Zhang, Edwin Yang, Song Fang\",\"doi\":\"10.1145/3485832.3485892\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated that ASR systems are vulnerable to hidden voice commands, i.e., audio that can be recognized by ASRs but not by humans. Such attacks, however, often either highly depend on white-box knowledge of a specific machine learning model or require special hardware to construct the adversarial audio. This paper proposes a new model-agnostic and easily-constructed attack, called CommanderGabble, which uses fast speech to camouflage voice commands. Both humans and ASR systems often misinterpret fast speech, and such misinterpretation can be exploited to launch hidden voice command attacks. Specifically, by carefully manipulating the phonetic structure of a target voice command, ASRs can be caused to derive a hidden meaning from the manipulated, high-speed version. We implement the discovered attacks both over-the-wire and over-the-air, and conduct a suite of experiments to demonstrate their efficacy against 7 practical ASR systems. Our experimental results show that the over-the-wire attacks can disguise as many as 96 out of 100 tested voice commands into adversarial ones, and that the over-the-air attacks are consistently successful for all 18 chosen commands in multiple real-world scenarios.\",\"PeriodicalId\":175869,\"journal\":{\"name\":\"Annual Computer Security Applications Conference\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annual Computer Security Applications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3485832.3485892\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Computer Security Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3485832.3485892","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

自动语音识别(ASR)系统广泛应用于各种在线转录服务和个人数字助理。新兴的研究表明，ASR系统容易受到隐藏的语音命令的攻击，即ASR可以识别但人类无法识别的音频。然而，这种攻击通常要么高度依赖于特定机器学习模型的白盒知识，要么需要特殊的硬件来构建对抗性音频。本文提出了一种新的模型不可知且易于构建的攻击方法，称为CommanderGabble，它利用快速语音来伪装语音命令。人类和ASR系统都经常误解快速语音，而这种误解可以被利用来发起隐藏的语音命令攻击。具体来说，通过小心地操纵目标语音命令的语音结构，可以使asr从被操纵的高速版本中获得隐藏的含义。我们通过有线和无线两种方式实现了发现的攻击，并进行了一套实验来证明它们对7个实际ASR系统的有效性。我们的实验结果表明，无线攻击可以将100个测试语音命令中的96个伪装成对抗性的语音命令，并且在多个真实场景中，无线攻击对于所有18个选定的命令都是一致成功的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CommanderGabble: A Universal Attack Against ASR Systems Leveraging Fast Speech

Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated that ASR systems are vulnerable to hidden voice commands, i.e., audio that can be recognized by ASRs but not by humans. Such attacks, however, often either highly depend on white-box knowledge of a specific machine learning model or require special hardware to construct the adversarial audio. This paper proposes a new model-agnostic and easily-constructed attack, called CommanderGabble, which uses fast speech to camouflage voice commands. Both humans and ASR systems often misinterpret fast speech, and such misinterpretation can be exploited to launch hidden voice command attacks. Specifically, by carefully manipulating the phonetic structure of a target voice command, ASRs can be caused to derive a hidden meaning from the manipulated, high-speed version. We implement the discovered attacks both over-the-wire and over-the-air, and conduct a suite of experiments to demonstrate their efficacy against 7 practical ASR systems. Our experimental results show that the over-the-wire attacks can disguise as many as 96 out of 100 tested voice commands into adversarial ones, and that the over-the-air attacks are consistently successful for all 18 chosen commands in multiple real-world scenarios.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Annual Computer Security Applications Conference

自引率

0.00%

发文量