VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility

Meng Chen, Liwang Lu, Junhao Wang, Jiadi Yu, Ying Chen, Zhibo Wang, Zhongjie Ba, Feng Lin, Kui Ren
{"title":"VoiceCloak: Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility","authors":"Meng Chen, Liwang Lu, Junhao Wang, Jiadi Yu, Ying Chen, Zhibo Wang, Zhongjie Ba, Feng Lin, Kui Ren","doi":"10.1145/3596266","DOIUrl":null,"url":null,"abstract":"Faced with the threat of identity leakage during voice data publishing, users are engaged in a privacy-utility dilemma when enjoying the utility of voice services. Existing machine-centric studies employ direct modification or text-based re-synthesis to de-identify users’ voices but cause inconsistent audibility for human participants in emerging online communication scenarios, such as virtual meetings. In this paper, we propose a human-centric voice de-identification system, VoiceCloak , which uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples inducing perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, VoiceCloak could preserve user identity from exposure by Automatic Speaker Identification (ASI), while remaining the voice perceptual quality for non-intrusive de-identification. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification. Experimental results show that VoiceCloak could achieve over 92% and 84% successful de-identification on mainstream ASIs and commercial systems with excellent voiceprint consistency, speech integrity, and audio quality.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"77 1","pages":"48:1-48:21"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3596266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Faced with the threat of identity leakage during voice data publishing, users face a privacy-utility dilemma when using voice services. Existing machine-centric studies employ direct modification or text-based re-synthesis to de-identify users' voices, but they introduce audible inconsistencies for human participants in emerging online communication scenarios, such as virtual meetings. In this paper, we propose a human-centric voice de-identification system, VoiceCloak, which uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, VoiceCloak protects user identity from exposure to Automatic Speaker Identification (ASI) while retaining perceptual voice quality, enabling non-intrusive de-identification. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification. Experimental results show that VoiceCloak achieves over 92% and 84% de-identification success on mainstream ASIs and commercial systems, respectively, with excellent voiceprint consistency, speech integrity, and audio quality.
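The core idea of the convolutional adversarial example, shaping the perturbation as a room impulse response (RIR) rather than adding it as noise, can be illustrated with a minimal sketch. The sample rate, filter length, and tap values below are hypothetical placeholders, not the paper's actual parameters, and the adversarial component would in practice be optimized against a speaker-identification model rather than drawn at random:

```python
import numpy as np

def apply_convolutional_perturbation(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with a (perturbed) room impulse response.

    Unlike an additive example (speech + delta), a convolutional example
    reshapes the signal through a filter, so the result resembles natural
    reverberation instead of audible noise.
    """
    out = np.convolve(speech, rir, mode="full")
    return out / (np.max(np.abs(out)) + 1e-9)  # normalize to avoid clipping

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)        # 1 s of audio at 16 kHz (placeholder)
rir = np.zeros(800)                        # 50 ms impulse response
rir[0] = 1.0                               # direct path
rir[200] = 0.4                             # one early reflection
rir += 0.01 * rng.standard_normal(800)     # adversarial component (illustrative only)

cloaked = apply_convolutional_perturbation(speech, rir)
print(cloaked.shape)  # (16799,)
```

Because the perturbation is applied as a filter, a listener hears something like room reverberation, which is why this construction can preserve perceptual quality where additive noise cannot.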
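The pseudo-target synthesis step can likewise be sketched: a conditional variational auto-encoder learns a compact latent speaker distribution, and diverse targets are drawn via the standard reparameterization trick. The 64-dimensional latent and the Gaussian parameters below are illustrative assumptions, not the paper's trained model, and the decoder that maps a latent sample to a pseudo speaker embedding is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pseudo_target(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Reparameterized draw z = mu + sigma * eps from the learned speaker
    distribution; a trained decoder (omitted here) would map z to a pseudo
    speaker embedding used as the de-identification target."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical posterior parameters for one conditioning label
mu = np.zeros(64)
log_var = np.full(64, -2.0)

# Each call yields a distinct pseudo target, which is what enables
# diverse, on-demand identities for the any-to-any transformation.
z1 = sample_pseudo_target(mu, log_var)
z2 = sample_pseudo_target(mu, log_var)
print(z1.shape, np.allclose(z1, z2))  # (64,) False
```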