D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack
Authors: Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen, Nhien-An Le-Khac
Venue: arXiv - EE - Audio and Speech Processing
Publication date: 2024-09-11
DOI: arxiv-2409.07390 (https://doi.org/arxiv-2409.07390)
Citations: 0
Abstract
Advances in generative AI have improved audio synthesis models, including
text-to-speech and voice conversion. This raises concerns about their
potential misuse in social manipulation and political interference, as
synthetic speech has become indistinguishable from natural human speech.
Several speech-generation programs are used for malicious purposes,
especially to impersonate individuals over phone calls. Detecting fake audio
is therefore crucial to maintaining societal security and safeguarding the
integrity of information. Recent research has proposed D-CAPTCHA, a system
based on a challenge-response protocol, to differentiate fake phone calls
from real ones. In this work, we study the resilience of this system and
introduce a more robust version, D-CAPTCHA++, to defend against fake calls.
Specifically, we first expose the vulnerability of the D-CAPTCHA system to
transferable imperceptible adversarial attacks. Second, we mitigate this
vulnerability by applying adversarial training to the D-CAPTCHA deepfake
detectors and task classifiers, which improves the robustness of the system.
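
The two ideas named in the abstract, a small bounded adversarial perturbation and adversarial training, can be illustrated with a toy sketch. The abstract does not say which attack algorithm or detector architecture the authors use, so everything below is an assumption made for illustration: the fast gradient sign method (FGSM) stands in for the attack, a logistic-regression classifier on synthetic 8-dimensional features stands in for a deepfake detector, and the names `make_toy_data`, `fgsm`, `train`, and `accuracy` are hypothetical.

```python
import numpy as np


def make_toy_data(seed=0, n=200, dim=8):
    """Synthetic stand-in for audio features: label 1 = fake, 0 = real (assumption)."""
    rng = np.random.default_rng(seed)
    fake = rng.normal(loc=1.0, scale=1.0, size=(n, dim))
    real = rng.normal(loc=-1.0, scale=1.0, size=(n, dim))
    x = np.vstack([fake, real])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return x, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(w, b, x, y, eps):
    """FGSM: move each input by eps in the sign of the loss gradient.

    For logistic loss the input gradient is (sigmoid(w.x + b) - y) * w,
    so the perturbation is bounded by eps in every coordinate.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)


def train(x, y, eps=0.0, epochs=200, lr=0.1):
    """Gradient-descent logistic regression.

    With eps > 0, each step first perturbs the batch against the current
    weights and then fits the perturbed batch (a basic form of
    adversarial training).
    """
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_in = fgsm(w, b, x, y, eps) if eps > 0 else x
        p = sigmoid(x_in @ w + b)
        w -= lr * x_in.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def accuracy(w, b, x, y):
    return float(np.mean((sigmoid(x @ w + b) > 0.5) == y))


# Usage: compare a standard and an adversarially trained toy detector.
x, y = make_toy_data()
eps = 0.3  # perturbation budget, chosen arbitrarily for this sketch
w_std, b_std = train(x, y)
w_adv, b_adv = train(x, y, eps=eps)
print("standard, clean inputs:   ", accuracy(w_std, b_std, x, y))
print("standard, attacked inputs:", accuracy(w_std, b_std, fgsm(w_std, b_std, x, y, eps), y))
print("adv-trained, attacked:    ", accuracy(w_adv, b_adv, fgsm(w_adv, b_adv, x, y, eps), y))
```

This sketch attacks each model with its own gradients (a white-box setting). The transferable attack described in the abstract would instead craft perturbations on a surrogate model and apply them to a different target, which this toy example does not attempt.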