Novel Energy Separation Based Frequency Modulation Features for Spoofed Speech Classification

2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR). Pub Date: 2017-12-01. DOI: 10.1109/ICAPR.2017.8593041

Madhu R. Kamble, H. Patil


Citations: 2

Abstract


Novel Energy Separation Based Frequency Modulation Features for Spoofed Speech Classification

Speech Synthesis (SS) and Voice Conversion (VC) methods pose a serious risk to Automatic Speaker Verification (ASV) systems. In this paper, we investigate the differences between natural and spoofed speech signals using the Teager Energy Operator-based Energy Separation Algorithm (TEO-ESA). We exploit the contributions of the Amplitude Envelope (AE) and Instantaneous Frequency (IF) to the energy of each narrowband filtered signal, estimated via ESA, to capture possible changes in the temporal and spectral envelopes of machine-generated synthetic speech relative to natural speech. Furthermore, the IF is used to classify natural vs. spoofed speech, with a Gaussian Mixture Model (GMM) as the classifier. These findings may help distinguish the two kinds of speech and mitigate possible impostor attacks in voice biometrics. The experiments are performed on the ASVspoof 2015 Challenge database. We compare the proposed Energy Separation Algorithm-Instantaneous Frequency Cosine Coefficients (ESA-IFCC) with Mel Frequency Cepstral Coefficients (MFCC) features. On the development set, with 13-D static features, MFCC alone gave an Equal Error Rate (EER) of 6.98 % and ESA-IFCC gave 5.43 %. Score-level fusion of MFCC and ESA-IFCC reduced the EER to 3.45 % on the static feature vector. The EER decreases further to 2.01 % and 1.89 % for Δ and ΔΔ features, respectively. On the evaluation set, the overall average error rate over known and unknown attacks was 6.79 % for ESA-IFCC, significantly better than both MFCC (9.15 %) and their score-level fusion (7.16 %).
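The energy separation step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which ESA variant, filterbank, or frame settings the authors use, so the code below assumes the common DESA-2 form of the algorithm and applies it to a single synthetic tone rather than to a narrowband-filtered speech channel. The function names `teager_energy` and `desa2`, and the test-signal parameters, are illustrative and not taken from the paper.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples only (length len(x) - 2).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2 energy separation (an assumed variant, see the note above).

    Returns (amplitude_envelope, omega), where omega is the instantaneous
    frequency in radians per sample. Both arrays cover samples 2..N-3 of x.
    The estimate is valid for frequencies below one quarter of the
    sampling rate.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                 # symmetric difference, samples 1..N-2
    psi_y = teager_energy(y)           # samples 2..N-3
    psi_x = teager_energy(x)[1:-1]     # trimmed to the same samples 2..N-3
    # Guard against division by zero and square roots of negative values.
    psi_x = np.maximum(psi_x, eps)
    psi_y = np.maximum(psi_y, eps)
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)
    return amplitude, omega

# Hypothetical test signal: a 1 kHz tone of amplitude 0.5 sampled at 16 kHz.
fs = 16000.0
n = np.arange(400)
x = 0.5 * np.cos(2.0 * np.pi * 1000.0 * n / fs + 0.3)

amp, omega = desa2(x)
freq_hz = omega * fs / (2.0 * np.pi)   # convert rad/sample to Hz
```

For a constant-amplitude, constant-frequency tone such as this one, `amp` and `freq_hz` recover the amplitude and frequency of the tone. On real speech the same routine would be run per subband after narrowband filtering, and the resulting IF track would then feed whatever cepstral-style processing the full paper describes.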
