An Experimental Study on Audio Replay Attack Detection Using Deep Neural Networks

Bekir Bakar, C. Hanilçi
Published in: 2018 IEEE Spoken Language Technology Workshop (SLT)
Publication date: 2018-12-01
DOI: 10.1109/SLT.2018.8639511 (https://doi.org/10.1109/SLT.2018.8639511)

Abstract

Automatic speaker verification (ASV) systems can be easily spoofed by previously recorded speech, synthesized speech, and speech signals artificially generated by voice conversion techniques. To increase the reliability of ASV systems, detecting spoofing attacks, that is, deciding whether a given speech signal is genuine or spoofed, plays an important role. In this paper, we consider the detection of replay attacks, the most accessible attack type against ASV systems. To this end, we use a deep neural network (DNN) based classifier with features extracted from the long-term average spectrum (LTAS). The experiments are conducted on the database of the latest edition of the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017). The results are compared with the ASVspoof 2017 baseline system, which consists of a Gaussian mixture model (GMM) classifier with a constant-Q transform cepstral coefficients (CQCC) front-end, and with a GMM using standard mel-frequency cepstral coefficients (MFCC) features. Experimental results reveal that the DNN considerably outperforms the well-known and successful GMM classifier. LTAS-based features are found to be superior to CQCC and MFCC in terms of equal error rate (EER). Finally, we find that high-frequency components convey much more discriminative information for replay attack detection, independent of features and classifiers.
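To make the two quantities named in the abstract concrete, the following is a minimal sketch of an LTAS-style feature vector and an EER computation. It is not the authors' pipeline: the function names `ltas_features` and `equal_error_rate`, the Hann window, the 512-sample frame with 256-sample hop, and the use of the mean log-power spectrum alone are all assumptions made for illustration, and the DNN classifier itself is omitted.

```python
import numpy as np


def ltas_features(signal, frame_len=512, hop=256, eps=1e-10):
    """Hypothetical LTAS sketch: average the log-power spectrum over all frames.

    Frame length, hop, and window are illustrative choices, not values
    taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Power spectrum per frame, then the long-term (time) average in the log domain.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps).mean(axis=0)


def equal_error_rate(genuine_scores, spoof_scores):
    """EER sketch: the operating point where false acceptance and false
    rejection rates are (approximately) equal. Higher score = more genuine."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate for the toy signal
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 3000 * t)  # synthetic 3 kHz tone, not real speech
    feats = ltas_features(tone)
    print(feats.shape, int(np.argmax(feats)))
    print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
```

One way to probe the high-frequency finding under these assumptions would be to keep only the upper bins of the returned vector (for example `feats[128:]`) before classification; the band edges actually used in the paper are not stated in the abstract.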