WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification

Junzuo Zhou, Jiangyan Yi, Yong Ren, Jianhua Tao, Tao Wang, Chu Yuan Zhang
arXiv - EE - Audio and Speech Processing, published 2024-09-18. DOI: arxiv-2409.12121 (https://doi.org/arxiv-2409.12121). Citations: 0.

Abstract

Recent advances in speech spoofing necessitate stronger verification mechanisms in neural speech codecs to ensure authenticity. Current methods embed numerical watermarks before compression and extract them from the reconstructed speech for verification. They face two limitations: the watermark and the codec are trained separately, and cross-modal information is insufficiently integrated. Both reduce watermark imperceptibility, extraction accuracy, and capacity. To address these issues, we propose WMCodec, the first neural speech codec to jointly train compression-reconstruction and watermark embedding-extraction in an end-to-end manner, optimizing both the imperceptibility and the extractability of the watermark. Furthermore, we design an iterative Attention Imprint Unit (AIU) for deeper feature integration of watermark and speech, reducing the impact of quantization noise on the watermark. Experimental results show that WMCodec outperforms AudioSeal with Encodec on most quality metrics for watermark imperceptibility, and consistently exceeds both AudioSeal with Encodec and reinforced TraceableSpeech in watermark extraction accuracy. At a bandwidth of 6 kbps with a watermark capacity of 16 bps, WMCodec maintains over 99% extraction accuracy under common attacks, demonstrating strong robustness.
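To make the reported figures concrete, the following minimal Python sketch shows one way extraction accuracy and watermark overhead could be computed. It assumes that extraction accuracy means the fraction of watermark bits recovered correctly, which the abstract does not state, and it takes the 16 bps and 6 kbps figures at face value. The function names `bit_accuracy` and `watermark_overhead` and the example bit pattern are hypothetical and are not taken from the paper.

```python
def bit_accuracy(embedded, extracted):
    """Fraction of watermark bits that match after extraction.

    Assumption: accuracy is measured bit by bit; the abstract does not
    define the metric precisely.
    """
    if len(embedded) != len(extracted):
        raise ValueError("bit sequences must have equal length")
    if not embedded:
        raise ValueError("bit sequences must be non-empty")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


def watermark_overhead(watermark_bps, codec_bps):
    """Ratio of the watermark payload rate to the codec bitrate."""
    if codec_bps <= 0:
        raise ValueError("codec bitrate must be positive")
    return watermark_bps / codec_bps


# Hypothetical 16-bit payload, with one bit flipped to mimic an attack.
embedded = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
extracted = list(embedded)
extracted[3] ^= 1  # simulate a single extraction error

print(f"bit accuracy: {bit_accuracy(embedded, extracted):.4f}")
print(f"payload / bitrate: {watermark_overhead(16, 6000):.4%}")
```

Under these assumptions, a single wrong bit in a 16-bit payload already drops accuracy below 99%, so sustaining the reported level means errors must be rare across many payloads.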