Importance of Analytic Phase of the Speech Signal for Detecting Replay Attacks in Automatic Speaker Verification Systems
B. M. Rafi, K. Murty
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6306-6310, 2019-05-16. DOI: 10.1109/ICASSP.2019.8683500
This paper demonstrates the importance of the analytic phase of the speech signal for automatic speaker verification systems in the context of replay spoof attacks. Accurate detection of replay attacks requires feature representations that capture the distortion introduced by the intermediate playback and recording devices, which is convolutive in nature. Since convolutive distortion in the time domain becomes additive distortion in the phase domain, we propose using IFCC features extracted from the analytic phase of the speech signal. These IFCC features contain contributions from both the clean speech and the distortion. To highlight the distortion introduced by the playback/recording devices, the clean speech component must be subtracted. In this work, a dictionary learned from IFCCs extracted from clean speech data is used to remove the clean speech component, and the residual distortion component serves as the feature for a binary classifier that detects replay spoofs. The proposed phase-based features deliver a 9% absolute improvement over a baseline system built with magnitude-based constant-Q cepstral coefficient (CQCC) features.
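The claim that convolutive distortion becomes additive in the phase domain follows from the convolution theorem. A minimal derivation, assuming the replayed signal y is the clean signal x filtered by a single linear time-invariant response h that lumps together the playback and recording devices (noise and device nonlinearities are ignored):

```latex
\begin{aligned}
y[n] &= (x * h)[n] = \sum_{k} x[k]\, h[n-k] \\
Y(e^{j\omega}) &= X(e^{j\omega})\, H(e^{j\omega}) \\
\angle Y(e^{j\omega}) &= \angle X(e^{j\omega}) + \angle H(e^{j\omega}) \pmod{2\pi}
\end{aligned}
```

The device response therefore enters the phase as an additive term, which is why subtracting an estimate of the clean-speech contribution can expose it.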
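The abstract does not detail how IFCCs are computed from the analytic phase. The sketch below, assuming NumPy, shows only the basic ingredient: the analytic signal obtained by zeroing negative-frequency FFT bins (the standard discrete Hilbert-transform construction), and the instantaneous frequency taken as the scaled derivative of its unwrapped phase. Any filterbank, framing, or cepstral stages of a full IFCC front end are omitted, and the function names are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal x + j*H{x}, built by suppressing negative frequencies in the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin as-is
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist bin for even lengths
        weights[1:n // 2] = 2.0  # double the positive-frequency bins
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x: np.ndarray, fs: float) -> np.ndarray:
    """Instantaneous frequency in Hz: derivative of the unwrapped analytic phase.

    Returns len(x) - 1 values, one per pair of consecutive samples.
    """
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)
```

For a 440 Hz cosine sampled at 8 kHz over an integer number of cycles, `instantaneous_frequency` returns 440 Hz at every sample, up to floating-point error.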
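The abstract does not say how the dictionary is learned or how the clean-speech component is estimated. As a stand-in, the sketch below uses a truncated SVD of clean-speech feature vectors as the "dictionary" and removes each feature vector's least-squares projection onto it, keeping the residual as the distortion feature. The function names and the `n_atoms` parameter are hypothetical; a learned overcomplete dictionary would need a sparse-coding estimator instead, since a plain projection onto a complete basis leaves no residual.

```python
import numpy as np


def learn_clean_basis(clean_feats: np.ndarray, n_atoms: int) -> np.ndarray:
    """Hypothetical stand-in for dictionary learning.

    clean_feats: (num_frames, dim) array, one clean-speech feature vector per row.
    Returns a (dim, n_atoms) orthonormal basis for the dominant directions of the
    clean features (truncated SVD, i.e. PCA without mean removal).
    """
    _, _, vt = np.linalg.svd(clean_feats, full_matrices=False)
    return vt[:n_atoms].T


def residual_features(feats: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Subtract the part of each feature vector explained by the clean basis."""
    coeffs = feats @ basis  # least-squares coefficients, valid because basis is orthonormal
    return feats - coeffs @ basis.T  # what remains is treated as the distortion component
```

The residual vectors would then be the inputs to the binary genuine-versus-replay classifier, whose type the abstract does not specify.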