Zhi Hao Lim, Xiaohai Tian, Wei Rao, Chng Eng Siong
An investigation of spectral feature partitioning for replay attacks detection
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), published 2017-12-01
DOI: 10.1109/APSIPA.2017.8282273 (https://doi.org/10.1109/APSIPA.2017.8282273)
Citations: 1
Abstract
Replay attacks from unseen utterances pose a significant challenge in anti-spoofing detection. In this paper, we propose a statistical measure based on the Rayleigh quotient to investigate a feature partition capable of discerning genuine from playback speech under unseen conditions. The Log-Magnitude Spectrum (LMS) of the utterances is used in this study. Using the proposed measure, we analyze the frequency bands of the LMS according to the amount of discriminative information between the scatter matrices of the genuine and spoofed utterances. This allows us to determine the optimal frequency bands for replay attack detection. In addition, we investigate the effects of training our models on the voiced and unvoiced portions of the utterances. We conducted our experiments on the ASVspoof 2017 database. On the development set, our partitioned LMS feature computed over the whole utterance yields an equal error rate (EER) of 3.8%. Using only the unvoiced portions of the utterances further decreases the EER to 3.27%, while our baseline, which uses Constant Q Cepstral Coefficients (CQCC) as the feature, is at 10.21%. The evaluation results also confirm the effectiveness of our approach.
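The abstract does not give the exact form of the proposed measure, so the following is only a minimal sketch of how a Rayleigh-quotient score per frequency band could be computed from the scatter matrices of the two classes. It assumes each row of `genuine` and `spoof` is an LMS frame, that bands are contiguous bin ranges, and that the score for a band is the largest value of the quotient of one class scatter over the other. The function names, the regularization constant, and the symmetrized score are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def scatter(X):
    """Scatter matrix of the rows of X, normalized by the number of rows."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc / len(X)


def max_rayleigh_quotient(A, B, reg=1e-6):
    """Largest value of (w^T A w) / (w^T B w) over non-zero w.

    Computed as the top eigenvalue of L^{-1} A L^{-T}, where
    B + reg * I = L L^T is a Cholesky factorization.
    """
    d = B.shape[0]
    L = np.linalg.cholesky(B + reg * np.eye(d))
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T
    # eigvalsh returns eigenvalues in ascending order
    return float(np.linalg.eigvalsh(M)[-1])


def band_scores(genuine, spoof, band_edges):
    """Score each band [lo, hi) by how strongly the class scatters differ.

    Assumption: the score is symmetrized so that a band ranks high
    whichever class has the larger scatter along some direction.
    """
    scores = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        S_g = scatter(genuine[:, lo:hi])
        S_s = scatter(spoof[:, lo:hi])
        scores.append(max(max_rayleigh_quotient(S_g, S_s),
                          max_rayleigh_quotient(S_s, S_g)))
    return np.array(scores)


# Synthetic stand-in for LMS frames: 8 frequency bins, two bands of 4 bins.
rng = np.random.default_rng(0)
genuine = rng.normal(size=(500, 8))
spoof = rng.normal(size=(500, 8))
spoof[:, 4:] *= 3.0  # the classes differ only in the upper band

scores = band_scores(genuine, spoof, band_edges=[0, 4, 8])
print(scores)
```

On this synthetic data the upper band, where the two classes have different variance, receives the higher score. A band-selection step in this spirit would keep the highest-scoring bands as the partitioned feature.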