Awais Khan, Khalid Mahmood Malik, James Ryan, Mikul Saravanan
Title: Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures
Journal: Artificial Intelligence Review, volume 56 1, pages 513 - 566
Publication date: 2023-06-28
Publication type: Journal Article
DOI: 10.1007/s10462-023-10539-8
Article page: https://link.springer.com/article/10.1007/s10462-023-10539-8
PDF: https://link.springer.com/content/pdf/10.1007/s10462-023-10539-8.pdf
Impact factor: 10.7 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 4
Abstract
With the advent of automated speaker verification (ASV) systems comes an equal and opposite development: malicious actors may seek to use voice spoofing attacks to fool those same systems. Various counter measures have been proposed to detect these spoofing attacks, but current offerings in this arena fall short of a unified and generalized approach applicable in real-world scenarios. For this reason, defensive measures for ASV systems produced in the last 6-7 years need to be classified, and qualitative and quantitative comparisons of state-of-the-art (SOTA) counter measures should be performed to assess the effectiveness of these systems against real-world attacks. Hence, in this work, we conduct a review of the literature on spoofing detection using hand-crafted features, deep learning, and end-to-end spoofing countermeasure solutions to detect logical access attacks, such as speech synthesis and voice conversion, and physical access attacks, i.e., replay attacks. Additionally, we review integrated and unified solutions to voice spoofing evaluation and speaker verification, and adversarial and anti-forensic attacks on both voice counter measures and ASV systems. In an extensive experimental analysis, the limitations and challenges of existing spoofing counter measures are presented, the performance of these counter measures on several datasets is reported, and cross-corpus evaluations are performed, something that is nearly absent in the existing literature, in order to assess the generalizability of existing solutions. For the experiments, we employ the ASVspoof2019, ASVspoof2021, and VSDC datasets along with GMM, SVM, CNN, and CNN-GRU classifiers. For reproducibility of the results, the code of the testbed can be found at our GitHub Repository (https://github.com/smileslab/Comparative-Analysis-Voice-Spoofing).
About the journal:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.