Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications

Yangyong Zhang, Lei Xu, Abner Mendoza, Guangliang Yang, Phakpoom Chinprutthiwong, G. Gu
{"title":"Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications","authors":"Yangyong Zhang, Lei Xu, Abner Mendoza, Guangliang Yang, Phakpoom Chinprutthiwong, G. Gu","doi":"10.14722/ndss.2019.23525","DOIUrl":null,"url":null,"abstract":"—Popular Voice Assistant (VA) services such as Amazon Alexa and Google Assistant are now rapidly appifying their platforms to allow more flexible and diverse voice-controlled service experience. However, the ubiquitous deployment of VA devices and the increasing number of third-party applications have raised security and privacy concerns. While previous works such as hidden voice attacks mostly examine the problems of VA services’ default Automatic Speech Recognition (ASR) component, our work analyzes and evaluates the security of the succeeding component after ASR, i.e., Natural Language Understanding (NLU), which performs semantic interpretation (i.e., text-to-intent) after ASR’s acoustic-to-text processing. In particular, we focus on NLU’s Intent Classifier which is used in customizing machine understanding for third-party VA Applications (or vApps). We find that the semantic inconsistency caused by the improper semantic interpretation of an Intent Classifier can create the opportunity of breaching the integrity of vApp processing when attackers delicately leverage some common spoken errors. In this paper, we design the first linguistic-model-guided fuzzing tool, named LipFuzzer, to assess the security of Intent Classifier and systematically discover potential misinterpretation-prone spoken errors based on vApps’ voice command templates. To guide the fuzzing, we construct adversarial linguistic models with the help of Statistical Relational Learning (SRL) and emerging Natural Language Processing (NLP) techniques. In evaluation, we have successfully verified the effectiveness and accuracy of LipFuzzer. We also use LipFuzzer to evaluate both Amazon Alexa and Google Assistant vApp platforms. We have identified that a large portion of real-world","PeriodicalId":20444,"journal":{"name":"Proceedings 2019 Network and Distributed System Security Symposium","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2019 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2019.23525","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 44

Abstract

Popular Voice Assistant (VA) services such as Amazon Alexa and Google Assistant are rapidly appifying their platforms to allow a more flexible and diverse voice-controlled service experience. However, the ubiquitous deployment of VA devices and the growing number of third-party applications have raised security and privacy concerns. While previous works such as hidden voice attacks mostly examine the problems of VA services' default Automatic Speech Recognition (ASR) component, our work analyzes and evaluates the security of the component that follows ASR, namely Natural Language Understanding (NLU), which performs semantic interpretation (text-to-intent) after ASR's acoustic-to-text processing. In particular, we focus on the NLU's Intent Classifier, which is used to customize machine understanding for third-party VA Applications (vApps). We find that the semantic inconsistency caused by an Intent Classifier's improper semantic interpretation can allow attackers to breach the integrity of vApp processing by carefully leveraging common spoken errors. In this paper, we design the first linguistic-model-guided fuzzing tool, named LipFuzzer, to assess the security of the Intent Classifier and systematically discover misinterpretation-prone spoken errors based on vApps' voice command templates. To guide the fuzzing, we construct adversarial linguistic models with the help of Statistical Relational Learning (SRL) and emerging Natural Language Processing (NLP) techniques. In our evaluation, we verified the effectiveness and accuracy of LipFuzzer, and we used it to evaluate both the Amazon Alexa and Google Assistant vApp platforms. We identified that a large portion of real-world […]
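The abstract describes LipFuzzer only at a high level. As a rough illustration of what fuzzing a vApp voice command template for misinterpretation-prone spoken errors can look like, the Python sketch below mutates a hypothetical invocation phrase using a small hand-written confusion table. The CONFUSIONS table, the example command, and the mutation strategy are illustrative assumptions, not the paper's implementation; LipFuzzer builds adversarial linguistic models with SRL and NLP techniques rather than a fixed lookup table.

```python
# Illustrative sketch only: a toy "spoken-error" fuzzer for vApp voice command
# templates. This is NOT LipFuzzer; the confusion table and mutation strategy
# are hypothetical stand-ins for the paper's adversarial linguistic models.
import itertools

# Hypothetical table of words that speakers or the ASR/NLU pipeline may
# confuse (homophones, near-homophones).
CONFUSIONS = {
    "capital": ["capitol"],
    "for": ["four", "fore"],
}

def mutate_command(template, max_variants=10):
    """Yield spoken-error variants of a command template by swapping
    confusable words, one or more positions at a time."""
    words = template.lower().split()
    # Positions where a confusable substitution is possible.
    slots = [(i, CONFUSIONS[w]) for i, w in enumerate(words) if w in CONFUSIONS]
    variants = []
    # Try every non-empty combination of substitution positions.
    for r in range(1, len(slots) + 1):
        for combo in itertools.combinations(slots, r):
            positions = [i for i, _ in combo]
            alt_lists = [alts for _, alts in combo]
            # Enumerate every choice of alternatives for the chosen positions.
            for choice in itertools.product(*alt_lists):
                mutated = list(words)
                for pos, alt in zip(positions, choice):
                    mutated[pos] = alt
                variants.append(" ".join(mutated))
                if len(variants) >= max_variants:
                    return variants
    return variants

if __name__ == "__main__":
    # Hypothetical vApp invocation phrase used only as an example.
    for v in mutate_command("alexa ask capital one for my balance"):
        print(v)
```

In LipFuzzer's setting, variants like these would then be checked against the platform's Intent Classifier to see whether they are routed to an unintended vApp or intent; the sketch stops at generating the candidate spoken errors.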