Dual Defense Against Speech Synthesis Attacks Based on Adversarially Robust Watermarking

IF 3.2 | Zone 2 (Engineering & Technology) | JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Signal Processing Letters | Pub Date: 2025-04-21 | DOI: 10.1109/LSP.2025.3562817

Yulin He;Hongxia Wang;Yiqin Qiu;Hao Cao

IEEE Signal Processing Letters, vol. 32, pp. 1870-1874 | Journal Article | Not open access | https://ieeexplore.ieee.org/document/10971213/

Citations: 0

Abstract

ASSMark: Dual Defense Against Speech Synthesis Attack via Adversarial Robust Watermarking

Given the widespread dissemination of digital audio and the advancements in speech synthesis technologies, protecting audio copyright has become a critical issue. Although watermarks play an important role in copyright verification and forensic analysis, they are insufficient to proactively defend against malicious speech synthesis. To address this issue, we introduce a novel adversarial speech synthesis watermarking mechanism (ASSMark), which simultaneously traces audio copyright and disrupts speech synthesis models by embedding robust adversarial watermarks in a one-time manner. Specifically, we design a unified training framework that models the embedding of watermarks and adversarial perturbations as collaborative tasks. This approach allows any robust watermark to be fine-tuned into an adversarial watermark, resulting in watermarked audio that can effectively defend against unauthorized speech synthesis attacks. Experimental results demonstrate that ASSMark achieves a protection rate of over 90%, even against unknown black-box models. Compared to simplistic two-step protection methods, it not only effectively resists synthesis attacks but also achieves superior watermark extraction accuracy and speech quality, offering an outstanding solution for protecting audio copyright.
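The abstract does not spell out the training objective, so the following is only an illustrative sketch of what treating watermark embedding and adversarial perturbation as "collaborative tasks" could look like: a single weighted loss with a quality term, a watermark-extraction term, and an adversarial term. Every name here (`joint_loss`, the weights `w_quality`, `w_extract`, `w_adv`) and the choice of speaker-embedding cosine similarity as the adversarial term are assumptions made for illustration, not the authors' formulation.

```python
import math


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bit_cross_entropy(probs, bits, eps=1e-12):
    """Mean binary cross-entropy between decoded bit probabilities and the payload."""
    total = 0.0
    for p, b in zip(probs, bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return -total / len(bits)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def joint_loss(clean, marked, bit_probs, bits, emb_clean, emb_marked,
               w_quality=1.0, w_extract=1.0, w_adv=1.0):
    """Hypothetical joint objective (assumption, not the paper's loss).

    quality: keep the watermarked audio close to the clean audio.
    extract: keep the embedded payload decodable.
    adv:     push the speaker embedding a synthesis model would see
             away from the clean speaker's embedding (lower similarity).
    """
    quality = mean_squared_error(clean, marked)
    extract = bit_cross_entropy(bit_probs, bits)
    adv = cosine_similarity(emb_clean, emb_marked)
    return w_quality * quality + w_extract * extract + w_adv * adv
```

Under these assumptions, minimizing one scalar optimizes all three goals together, which is the contrast with a two-step pipeline that watermarks first and perturbs afterwards. The embeddings would come from some surrogate speaker encoder, which is not modeled here.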

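Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months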

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.