你好,你在找我吗?:区分人类和电子扬声器的语音接口安全

Logan Blue, Luis Vargas, Patrick Traynor
{"title":"你好,你在找我吗?:区分人类和电子扬声器的语音接口安全","authors":"Logan Blue, Luis Vargas, Patrick Traynor","doi":"10.1145/3212480.3212505","DOIUrl":null,"url":null,"abstract":"Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.","PeriodicalId":267134,"journal":{"name":"Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":"{\"title\":\"Hello, Is It Me You're Looking For?: Differentiating Between Human and Electronic Speakers for Voice Interface Security\",\"authors\":\"Logan Blue, Luis Vargas, Patrick Traynor\",\"doi\":\"10.1145/3212480.3212505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.\",\"PeriodicalId\":267134,\"journal\":{\"name\":\"Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks\",\"volume\":\"118 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"50\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3212480.3212505\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3212480.3212505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 50

摘要

语音接口越来越多地集成到各种物联网(IoT)设备中。这样的系统可以极大地简化用户与显示器有限的设备之间的交互。不幸的是,语音接口也为开发创造了新的机会。具体来说,在实现语音接口的系统范围内的任何发出声音的设备(例如,智能电视,连接互联网的设备等)都可能导致这些系统执行违背其所有者意愿的操作(例如,解锁门,进行未经授权的购买等)。我们通过开发一种技术来识别人类和电子扬声器产生的音频的根本差异来解决这个问题。我们确定了次低音过度激发,或者在人类声音范围之外但固有于现代扬声器设计的显著低频信号的存在,作为这两种来源之间的强大区分。在确定了这一现象之后,我们展示了它在安静环境中以100%/1.72%的TPR/FPR防止对抗性请求、重放音频和隐藏命令的用途。通过这样做,我们证明了通过附近音频设备注入的命令可以通过语音接口有效地删除。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Hello, Is It Me You're Looking For?: Differentiating Between Human and Electronic Speakers for Voice Interface Security
Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信