A Benchmark for Multi-Speaker Anonymization

IF 8 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security Pub Date : 2025-04-01 DOI:10.1109/TIFS.2025.3556345

Xiaoxiao Miao;Ruijie Tao;Chang Zeng;Xin Wang

IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3819-3833. Full text: https://ieeexplore.ieee.org/document/10945923/

Citations: 0

Abstract

Privacy-preserving voice protection approaches primarily suppress privacy-related information derived from paralinguistic attributes while preserving the linguistic content. Existing solutions focus on single-speaker scenarios, so they lack practicality for real-world applications, which typically involve multiple speakers. In this paper, we present an initial attempt to provide a multi-speaker anonymization benchmark by defining the task and evaluation protocol, proposing benchmarking solutions, and discussing the privacy leakage of overlapping conversations. The proposed benchmark solutions are based on a cascaded system that integrates spectral-clustering-based speaker diarization and disentanglement-based speaker anonymization using a selection-based anonymizer. To improve utility, the benchmark solutions are further enhanced by two conversation-level speaker vector anonymization methods. The first method minimizes the differential similarity across speaker pairs in the original and anonymized conversations, which maintains original speaker relationships in the anonymized version. The other minimizes the aggregated similarity across anonymized speakers, which achieves better differentiation between speakers. Experiments conducted on both non-overlapping simulated and real-world datasets demonstrate the effectiveness of the multi-speaker anonymization system with the proposed speaker anonymizers. Additionally, we analyze overlapping speech with regard to privacy leakage and provide potential solutions. Code and audio samples are available at https://github.com/xiaoxiaomiao323/MSA; evaluation datasets can be downloaded from https://zenodo.org/records/14249171
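
The two conversation-level objectives named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation (their code is in the repository linked above). The cosine similarity measure, the absolute-difference form of the "differential similarity", the brute-force search over a candidate pool, and all function names are assumptions made for illustration only. The sketch also ignores any constraint that pseudo-speakers must be dissimilar to the original speakers.

```python
import itertools
import numpy as np


def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def differential_similarity(orig, anon):
    """Assumed form: sum over speaker pairs of |sim_orig - sim_anon|.

    A small value means the anonymized conversation keeps roughly the
    same speaker-to-speaker relationships as the original one.
    """
    pairs = np.triu_indices(orig.shape[0], k=1)
    diff = cosine_matrix(orig)[pairs] - cosine_matrix(anon)[pairs]
    return float(np.abs(diff).sum())


def aggregated_similarity(anon):
    """Assumed form: sum of pairwise similarities among anonymized speakers.

    A small value means the anonymized speakers are well separated.
    """
    pairs = np.triu_indices(anon.shape[0], k=1)
    return float(cosine_matrix(anon)[pairs].sum())


def select_pseudo_speakers(orig, pool, objective):
    """Hypothetical selection step: try every assignment of distinct pool
    vectors to the conversation's speakers and keep the lowest-cost one.
    Exhaustive search is only feasible for tiny pools."""
    best_idx, best_cost = None, float("inf")
    for idx in itertools.permutations(range(pool.shape[0]), orig.shape[0]):
        anon = pool[list(idx)]
        if objective == "differential":
            cost = differential_similarity(orig, anon)
        else:
            cost = aggregated_similarity(anon)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx, best_cost
```

Calling `select_pseudo_speakers(orig, pool, "differential")` with one row of `orig` per diarized speaker returns the indices of the chosen pool vectors and the objective value; passing `"aggregated"` switches to the second objective. A practical system would replace the exhaustive search with something scalable.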




Source journal

IEEE Transactions on Information Forensics and Security (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

14.40

Self-citation rate

7.40%

Articles published

234

Review time

6.5 months

Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.