VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. Pub Date : 2023-01-01 DOI:10.1145/3596266

Meng Chen, Liwang Lu, Junhao Wang, Jiadi Yu, Ying Chen, Zhibo Wang, Zhongjie Ba, Feng Lin, Kui Ren

Volume (as listed): 77 1; Pages: 48:1-48:21; Publication type: Journal Article; Open access: no; URL: https://doi.org/10.1145/3596266

Citations: 1

Abstract


Faced with the threat of identity leakage when publishing voice data, users confront a privacy-utility dilemma whenever they use voice services. Existing machine-centric studies de-identify users' voices through direct modification or text-based re-synthesis, but the resulting audio sounds inconsistent to human participants in emerging online communication scenarios such as virtual meetings. In this paper, we propose VoiceCloak, a human-centric voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. As a result, VoiceCloak can protect user identity from exposure to Automatic Speaker Identification (ASI) while preserving the perceptual quality of the voice, which makes the de-identification non-intrusive. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification. Experimental results show that VoiceCloak achieves de-identification success rates of over 92% on mainstream ASIs and over 84% on commercial systems, with excellent voiceprint consistency, speech integrity, and audio quality.
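The contrast between additive and convolutional adversarial examples can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formula, so modelling the convolutional example as the speech convolved with a room impulse response plus a small perturbation is an assumption, and the function names, the toy impulse response, and the random perturbation are all hypothetical. The optimization that would steer the perturbation toward a pseudo target speaker is omitted.

```python
import numpy as np


def additive_example(speech, delta):
    """Typical additive adversarial example: x' = x + delta."""
    return speech + delta


def convolutional_example(speech, rir, delta):
    """Assumed convolutional form: x' = x * (h + delta).

    `rir` is a room impulse response h and `delta` is a perturbation of the
    same length; `*` is linear convolution. The output is truncated to the
    input length so it lines up with the original signal.
    """
    perturbed_rir = rir + delta
    return np.convolve(speech, perturbed_rir, mode="full")[: len(speech)]


# Placeholder data (hypothetical): 1 s of noise standing in for speech at 16 kHz.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)

# Toy impulse response: a direct path plus one weaker echo.
rir = np.zeros(800)
rir[0] = 1.0
rir[200] = 0.3

# Random stand-in for a perturbation that a real system would optimize.
delta = 1e-3 * rng.standard_normal(800)

cloaked = convolutional_example(speech, rir, delta)
```

Under this assumption the perturbation shapes the signal the way a slightly different room would, rather than sitting on top of it as added noise, which is one plausible reading of why the abstract describes the result as less perceivable.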

