Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI:10.23919/APSIPAASC55919.2022.9980225

Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee

{"title":"Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification","authors":"Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee","doi":"10.23919/APSIPAASC55919.2022.9980225","DOIUrl":null,"url":null,"abstract":"Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple but difficult to detect. Hence, many methods are proposed for countermeasures against replay attacks. Most work inseparably considers voice and non-voice sections in the detection's performance. In this work, we investigate the spoof detection performances when the voice, non-voice, and both with different percentages of voice are used to obtain the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as a feature, and the ResNet-34 model is used for classification. We evaluated the proposed method using a dataset from the ASVspoof 2019 challenge. The results depict that the optimal section for replay attack detection is when 10% and 20% of voice are included in the non-voice sections. It also showed that the proposed method outperforms the baselines with a 7.52% relatively improvement or an equal error rate of 1.72%.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9980225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple but difficult to detect. Hence, many methods are proposed for countermeasures against replay attacks. Most work inseparably considers voice and non-voice sections in the detection's performance. In this work, we investigate the spoof detection performances when the voice, non-voice, and both with different percentages of voice are used to obtain the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as a feature, and the ResNet-34 model is used for classification. We evaluated the proposed method using a dataset from the ASVspoof 2019 challenge. The results depict that the optimal section for replay attack detection is when 10% and 20% of voice are included in the non-voice sections. It also showed that the proposed method outperforms the baselines with a 7.52% relatively improvement or an equal error rate of 1.72%.

查看原文本刊更多论文

基于语音和非语音片段的说话人验证重放攻击检测

声音可以代表一个人的身份。因此，它可以用于自动说话人验证(ASV)系统，以验证安全应用程序。不幸的是，现有的ASV系统很容易受到欺骗攻击。重放攻击是一种被广泛使用的欺骗技术，因为它简单但难以检测。因此，提出了许多对抗重放攻击的方法。大多数工作在检测性能中不可分割地考虑了声音和非声音部分。在这项工作中，我们研究了语音、非语音和两者在不同语音百分比下的欺骗检测性能，以获得最佳部分。我们还提出了一种利用信号的最优部分检测重放攻击的方法。从最优截面计算mel频倒谱系数作为特征，使用ResNet-34模型进行分类。我们使用来自ASVspoof 2019挑战的数据集评估了所提出的方法。结果表明，重放攻击检测的最佳部分是在非语音部分中包含10%和20%的语音。该方法的相对改进率为7.52%，错误率为1.72%，优于基线。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

自引率

0.00%

发文量