A Benchmark for Multi-Speaker Anonymization

IF 8 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Xiaoxiao Miao;Ruijie Tao;Chang Zeng;Xin Wang
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3819-3833. Published 2025-04-01. DOI: 10.1109/TIFS.2025.3556345
https://ieeexplore.ieee.org/document/10945923/
Citations: 0

Abstract

Privacy-preserving voice protection approaches primarily suppress privacy-related information derived from paralinguistic attributes while preserving the linguistic content. Existing solutions focus on single-speaker scenarios, which limits their practicality for real-world applications that involve multiple speakers. In this paper, we present an initial attempt to provide a multi-speaker anonymization benchmark by defining the task and evaluation protocol, proposing benchmarking solutions, and discussing the privacy leakage of overlapping conversations. The proposed benchmark solutions are based on a cascaded system that integrates spectral-clustering-based speaker diarization and disentanglement-based speaker anonymization using a selection-based anonymizer. To improve utility, the benchmark solutions are further enhanced by two conversation-level speaker vector anonymization methods. The first method minimizes the differential similarity across speaker pairs in the original and anonymized conversations, which maintains the original speaker relationships in the anonymized version. The second minimizes the aggregated similarity across anonymized speakers, which achieves better differentiation between speakers. Experiments conducted on both non-overlapping simulated datasets and real-world datasets demonstrate the effectiveness of the multi-speaker anonymization system with the proposed speaker anonymizers. Additionally, we analyzed overlapping speech with regard to privacy leakage and provided potential solutions. Code and audio samples are available at https://github.com/xiaoxiaomiao323/MSA, and the evaluation datasets can be downloaded from https://zenodo.org/records/14249171.
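The two conversation-level objectives named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the use of cosine similarity, the mean absolute gap as the "differential similarity", the sum over pairs as the "aggregated similarity", the brute-force search, and all function names below are assumptions made for illustration. The pool of candidate pseudo-speaker vectors is likewise a hypothetical stand-in for whatever the selection-based anonymizer draws from, and at least two speakers per conversation are assumed.

```python
from itertools import permutations

import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def pairwise_upper(matrix: np.ndarray) -> np.ndarray:
    """Entries above the diagonal: one value per unordered speaker pair."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def differential_similarity(original: np.ndarray, anonymized: np.ndarray) -> float:
    """Assumed form: mean absolute change in speaker-pair similarity.

    A low value means the anonymized speakers relate to each other
    roughly as the original speakers did.
    """
    gap = cosine_similarity_matrix(original) - cosine_similarity_matrix(anonymized)
    return float(np.abs(pairwise_upper(gap)).mean())


def aggregated_similarity(anonymized: np.ndarray) -> float:
    """Assumed form: summed similarity over all anonymized speaker pairs.

    A low value means the anonymized speakers are easy to tell apart.
    """
    return float(pairwise_upper(cosine_similarity_matrix(anonymized)).sum())


def select_pseudo_speakers(original: np.ndarray, pool: np.ndarray, objective):
    """Exhaustively assign distinct pool vectors to speakers, minimizing `objective`.

    `objective` takes (original, candidate_anonymized) and returns a float.
    Brute force is only practical for tiny pools; it is used here for clarity.
    """
    n_speakers = original.shape[0]
    best_idx, best_score = None, float("inf")
    for idx in permutations(range(pool.shape[0]), n_speakers):
        score = objective(original, pool[list(idx)])
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speakers = rng.normal(size=(3, 16))   # toy speaker vectors from one conversation
    candidates = rng.normal(size=(6, 16))  # toy pool of pseudo-speaker vectors

    print(select_pseudo_speakers(speakers, candidates, differential_similarity))
    print(select_pseudo_speakers(
        speakers, candidates, lambda _orig, anon: aggregated_similarity(anon)
    ))
```

The two objectives pull in different directions: the first tries to keep the similarity structure of the original conversation, while the second ignores the original speakers and only spreads the anonymized ones apart.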
Source journal
IEEE Transactions on Information Forensics and Security (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 7.40%
Articles published: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance and systems applications that incorporate these features.