Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications

Yangyong Zhang, Lei Xu, Abner Mendoza, Guangliang Yang, Phakpoom Chinprutthiwong, G. Gu
{"title":"Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications","authors":"Yangyong Zhang, Lei Xu, Abner Mendoza, Guangliang Yang, Phakpoom Chinprutthiwong, G. Gu","doi":"10.14722/ndss.2019.23525","DOIUrl":null,"url":null,"abstract":"—Popular Voice Assistant (VA) services such as Amazon Alexa and Google Assistant are now rapidly appifying their platforms to allow more flexible and diverse voice-controlled service experience. However, the ubiquitous deployment of VA devices and the increasing number of third-party applications have raised security and privacy concerns. While previous works such as hidden voice attacks mostly examine the problems of VA services’ default Automatic Speech Recognition (ASR) component, our work analyzes and evaluates the security of the succeeding component after ASR, i.e., Natural Language Understanding (NLU), which performs semantic interpretation (i.e., text-to-intent) after ASR’s acoustic-to-text processing. In particular, we focus on NLU’s Intent Classifier which is used in customizing machine understanding for third-party VA Applications (or vApps). We find that the semantic inconsistency caused by the improper semantic interpretation of an Intent Classifier can create the opportunity of breaching the integrity of vApp processing when attackers delicately leverage some common spoken errors. In this paper, we design the first linguistic-model-guided fuzzing tool, named LipFuzzer, to assess the security of Intent Classifier and systematically discover potential misinterpretation-prone spoken errors based on vApps’ voice command templates. To guide the fuzzing, we construct adversarial linguistic models with the help of Statistical Relational Learning (SRL) and emerging Natural Language Processing (NLP) techniques. In evaluation, we have successfully verified the effectiveness and accuracy of LipFuzzer. We also use LipFuzzer to evaluate both Amazon Alexa and Google Assistant vApp platforms. We have identified that a large portion of real-world","PeriodicalId":20444,"journal":{"name":"Proceedings 2019 Network and Distributed System Security Symposium","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2019 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2019.23525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

Popular Voice Assistant (VA) services such as Amazon Alexa and Google Assistant are rapidly appifying their platforms to allow a more flexible and diverse voice-controlled service experience. However, the ubiquitous deployment of VA devices and the growing number of third-party applications have raised security and privacy concerns. While previous works such as hidden voice attacks mostly examine the problems of VA services' default Automatic Speech Recognition (ASR) component, our work analyzes and evaluates the security of the component that follows ASR, namely Natural Language Understanding (NLU), which performs semantic interpretation (text-to-intent) after ASR's acoustic-to-text processing. In particular, we focus on the NLU's Intent Classifier, which is used to customize machine understanding for third-party VA Applications (vApps). We find that the semantic inconsistency caused by an Intent Classifier's improper semantic interpretation can allow attackers to breach the integrity of vApp processing by carefully leveraging common spoken errors. In this paper, we design the first linguistic-model-guided fuzzing tool, named LipFuzzer, to assess the security of the Intent Classifier and systematically discover misinterpretation-prone spoken errors based on vApps' voice command templates. To guide the fuzzing, we construct adversarial linguistic models with the help of Statistical Relational Learning (SRL) and emerging Natural Language Processing (NLP) techniques. In our evaluation, we verified the effectiveness and accuracy of LipFuzzer, and we used it to evaluate both the Amazon Alexa and Google Assistant vApp platforms. We identified that a large portion of real-world […]
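The abstract describes LipFuzzer only at a high level. As a rough illustration of what fuzzing a vApp voice command template for misinterpretation-prone spoken errors can look like, the Python sketch below mutates a hypothetical invocation phrase using a small hand-written confusion table. The CONFUSIONS table, the example command, and the mutation strategy are illustrative assumptions, not the paper's implementation; LipFuzzer builds adversarial linguistic models with SRL and NLP techniques rather than a fixed lookup table.

```python
# Illustrative sketch only: a toy "spoken-error" fuzzer for vApp voice command
# templates. This is NOT LipFuzzer; the confusion table and mutation strategy
# are hypothetical stand-ins for the paper's adversarial linguistic models.
import itertools

# Hypothetical table of words that speakers or the ASR/NLU pipeline may
# confuse (homophones, near-homophones).
CONFUSIONS = {
    "capital": ["capitol"],
    "for": ["four", "fore"],
}

def mutate_command(template, max_variants=10):
    """Yield spoken-error variants of a command template by swapping
    confusable words, one or more positions at a time."""
    words = template.lower().split()
    # Positions where a confusable substitution is possible.
    slots = [(i, CONFUSIONS[w]) for i, w in enumerate(words) if w in CONFUSIONS]
    variants = []
    # Try every non-empty combination of substitution positions.
    for r in range(1, len(slots) + 1):
        for combo in itertools.combinations(slots, r):
            positions = [i for i, _ in combo]
            alt_lists = [alts for _, alts in combo]
            # Enumerate every choice of alternatives for the chosen positions.
            for choice in itertools.product(*alt_lists):
                mutated = list(words)
                for pos, alt in zip(positions, choice):
                    mutated[pos] = alt
                variants.append(" ".join(mutated))
                if len(variants) >= max_variants:
                    return variants
    return variants

if __name__ == "__main__":
    # Hypothetical vApp invocation phrase used only as an example.
    for v in mutate_command("alexa ask capital one for my balance"):
        print(v)
```

In LipFuzzer's setting, variants like these would then be checked against the platform's Intent Classifier to see whether they are routed to an unintended vApp or intent; the sketch stops at generating the candidate spoken errors.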