Importance of Analytic Phase of the Speech Signal for Detecting Replay Attacks in Automatic Speaker Verification Systems

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6306-6310. Pub Date: 2019-05-16. DOI: 10.1109/ICASSP.2019.8683500

B. M. Rafi, K. Murty


Citations: 7

Abstract


Importance of Analytic Phase of the Speech Signal for Detecting Replay Attacks in Automatic Speaker Verification Systems

This paper demonstrates the importance of the analytic phase of the speech signal for detecting replay spoof attacks in automatic speaker verification systems. Detecting replay attacks accurately requires feature representations that capture the distortion introduced by the intermediate playback and recording devices, which is convolutive in nature. Since convolutional distortion in the time domain translates to additive distortion in the phase domain, we propose to use instantaneous frequency cosine coefficient (IFCC) features extracted from the analytic phase of the speech signal. The IFCC features contain information from both the clean speech and the distortion components, so the clean speech component has to be subtracted to highlight the distortion introduced by the playback/recording devices. In this work, a dictionary learned from IFCCs extracted from clean speech data is used to remove the clean speech component. The residual distortion component is then used as a feature to build a binary classifier for replay spoof detection. The proposed phase-based features delivered a 9% absolute improvement over the baseline system built using magnitude-based constant Q cepstral coefficient (CQCC) features.
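The claim that convolutional distortion becomes additive in the phase domain can be illustrated with a small numerical check. The sketch below is not the paper's IFCC pipeline: it uses the ordinary Fourier phase of a short frame as a stand-in for the analytic phase, and circular convolution so that the DFT product rule holds exactly. The names `speech`, `device`, and `replayed`, and all sample values, are hypothetical toy data.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def wrap(angle):
    """Wrap an angle to the principal range around zero."""
    return cmath.phase(cmath.exp(1j * angle))


# Hypothetical toy data: `speech` stands in for a clean speech frame and
# `device` for the impulse response of a playback/recording chain.
speech = [1.0, 0.5, -0.3, 0.8, 0.1, -0.6, 0.4, 0.2]
device = [1.0, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]

# Replay is modelled as convolution of the speech with the device response.
replayed = circular_convolve(speech, device)

S, H, Y = dft(speech), dft(device), dft(replayed)

# Per frequency bin: phase(Y) - (phase(S) + phase(H)), wrapped to remove
# multiples of 2*pi. Additivity of phase means this is zero up to rounding.
phase_error = [
    abs(wrap(cmath.phase(Y[k]) - cmath.phase(S[k]) - cmath.phase(H[k])))
    for k in range(len(Y))
]
max_phase_error = max(phase_error)
print(f"largest deviation from phase additivity: {max_phase_error:.2e}")
```

Because the device contributes an additive phase term, subtracting an estimate of the clean-speech term leaves a residual dominated by the device. That is the motivation the abstract gives for removing the clean speech component before classification.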

