Hello, Is It Me You're Looking For?: Differentiating Between Human and Electronic Speakers for Voice Interface Security

Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks. Pub Date: 2018-06-18. DOI: 10.1145/3212480.3212505

Logan Blue, Luis Vargas, Patrick Traynor


Citations: 50

Abstract


Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc.) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc.). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low-frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
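The idea of sub-bass over-excitation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the band limits, the metric, or the decision threshold, so the 20-80 Hz band, the energy-ratio metric, the 0.1 threshold, and all function names below are assumptions chosen only for illustration. The sketch measures what fraction of a signal's spectral energy falls in a low-frequency band and flags the signal when that fraction is high.

```python
import cmath
import math


def dft_power(samples, k):
    """Power of DFT bin k, computed directly so no FFT library is needed."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return abs(acc) ** 2


def sub_bass_energy_ratio(samples, sample_rate, low_hz=20.0, high_hz=80.0):
    """Fraction of spectral energy inside [low_hz, high_hz].

    The band limits are illustrative assumptions, not values from the paper.
    Bin 0 (DC) is skipped; bins up to the Nyquist frequency are included.
    """
    n = len(samples)
    band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        power = dft_power(samples, k)
        total += power
        if low_hz <= freq <= high_hz:
            band += power
    return band / total if total > 0 else 0.0


def looks_like_loudspeaker(samples, sample_rate, threshold=0.1):
    """Flag audio whose sub-bass share exceeds a hypothetical threshold."""
    return sub_bass_energy_ratio(samples, sample_rate) > threshold


def tone(freq_hz, amplitude, sample_rate, n):
    """Synthetic sine tone used as stand-in test audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

As a usage example, a pure 200 Hz tone (standing in for voiced speech) yields a ratio near zero, while the same tone mixed with a 50 Hz component of amplitude 0.6 yields a ratio of about 0.26 and is flagged. Real recordings would need windowing, noise handling, and a threshold tuned on labeled data, none of which this sketch attempts.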
