Novel Energy Separation Based Frequency Modulation Features for Spoofed Speech Classification
Madhu R. Kamble, H. Patil
2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR), December 2017
DOI: 10.1109/ICAPR.2017.8593041
Speech Synthesis (SS) and Voice Conversion (VC) methods pose a serious threat to Automatic Speaker Verification (ASV) systems. In this paper, we investigate differences between natural and spoofed speech signals using the Teager Energy Operator-based Energy Separation Algorithm (TEO-ESA). Using ESA, we estimate the Amplitude Envelope (AE) and Instantaneous Frequency (IF) of each narrowband-filtered signal to capture possible differences in the temporal and spectral envelopes of machine-generated (synthetic) speech relative to natural speech. The IF is then used to classify natural vs. spoofed speech with a Gaussian Mixture Model (GMM) classifier. These findings may help distinguish the two classes of speech and mitigate impostor attacks on voice biometric systems. Experiments are performed on the ASVspoof 2015 Challenge database, comparing the proposed Energy Separation Algorithm-Instantaneous Frequency Cosine Coefficients (ESA-IFCC) with Mel Frequency Cepstral Coefficients (MFCC). On the development set, with 13-D static features, MFCC alone gave an Equal Error Rate (EER) of 6.98% and ESA-IFCC gave 5.43%. Score-level fusion of MFCC and ESA-IFCC reduced the EER to 3.45% with static features, and the EER decreased further to 2.01% and 1.89% with Δ and ΔΔ features, respectively. On the evaluation set, the overall average error rate across known and unknown attacks was 6.79% for ESA-IFCC, substantially lower than that of MFCC (9.15%) and of the score-level fusion (7.16%).
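The TEO-ESA step can be illustrated with a minimal Python/NumPy sketch of the discrete Teager Energy Operator, Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and an energy separation algorithm that splits the Teager energy of a narrowband signal into an amplitude envelope and an instantaneous frequency. The abstract does not say which ESA variant the paper uses; the discrete DESA-1 variant shown here is an assumption, and the function names `teager` and `desa1` are hypothetical.

```python
import numpy as np


def teager(x):
    """Discrete Teager Energy Operator: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    The result has len(x) - 2 samples; element k corresponds to n = k + 1.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa1(x, eps=1e-12):
    """DESA-1 energy separation of a narrowband signal x (assumed variant).

    Returns (amplitude_envelope, inst_freq), with inst_freq in radians per
    sample. Both arrays have len(x) - 4 samples and align with x[2:-2].
    """
    x = np.asarray(x, dtype=float)
    y = np.diff(x)               # backward difference: y[k] = x(n) - x(n-1) at n = k + 1
    psi_x = teager(x)[1:-1]      # Psi[x](n) for n = 2 .. len(x) - 3
    psi_y = teager(y)            # psi_y[k] = Psi[y](n) at n = k + 2
    psi_y_n, psi_y_n1 = psi_y[:-1], psi_y[1:]   # Psi[y](n) and Psi[y](n + 1)

    # Guard against division by (near-)zero Teager energy.
    safe_psi_x = np.where(np.abs(psi_x) < eps, eps, psi_x)
    cos_omega = 1.0 - (psi_y_n + psi_y_n1) / (4.0 * safe_psi_x)
    cos_omega = np.clip(cos_omega, -1.0, 1.0)
    inst_freq = np.arccos(cos_omega)
    amp = np.sqrt(np.abs(psi_x) / np.maximum(1.0 - cos_omega ** 2, eps))
    return amp, inst_freq


if __name__ == "__main__":
    fs = 8000
    n = np.arange(800)
    # Slowly amplitude-modulated 1 kHz tone as a stand-in for one narrowband channel.
    x = (1.0 + 0.3 * np.cos(2 * np.pi * 5 * n / fs)) * np.cos(2 * np.pi * 1000 * n / fs)
    amp, omega = desa1(x)
    print("median IF (Hz):", np.median(omega) * fs / (2 * np.pi))
```

For a pure cosine A·cos(Ωn + φ), Ψ[x] = A² sin²Ω and the ratio inside the arccos reduces exactly to cos Ω, so DESA-1 recovers A and Ω without error; for slowly modulated narrowband signals the estimates are approximate.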
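One plausible way to turn per-subband IF tracks into 13-D static ESA-IFCC vectors, plus Δ and ΔΔ, is sketched below: average each subband's IF over short frames, apply a DCT across subbands, keep the first 13 coefficients, and append regression deltas. The abstract only states that IF from narrowband-filtered signals feeds cosine coefficients, so the 20 ms/10 ms framing, the absence of any compression before the DCT, the ±2-frame delta window, and the names `esa_ifcc`, `dct2_matrix` and `deltas` are all assumptions. The input is assumed to be the IF output of an energy separation step run on each filterbank channel.

```python
import numpy as np


def dct2_matrix(m):
    """Orthonormal DCT-II matrix (same convention as scipy.fft.dct(norm='ortho'))."""
    k = np.arange(m)[:, None]
    n = np.arange(m)[None, :]
    c = np.sqrt(2.0 / m) * np.cos(np.pi * k * (2 * n + 1) / (2 * m))
    c[0, :] = np.sqrt(1.0 / m)
    return c


def deltas(feats, width=2):
    """Regression-based delta coefficients along time (axis 0), edges replicated."""
    t = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(i * i for i in range(1, width + 1))
    d = np.zeros_like(feats)
    for i in range(1, width + 1):
        d += i * (padded[width + i:width + i + t] - padded[width - i:width - i + t])
    return d / denom


def esa_ifcc(if_tracks, fs, frame_ms=20.0, hop_ms=10.0, n_ceps=13):
    """Hypothetical cepstral-style features from per-subband IF tracks.

    if_tracks: array of shape (n_bands, n_samples), one IF track per subband,
    rows ordered by increasing centre frequency.
    Returns (n_frames, 3 * n_ceps): static, delta and delta-delta coefficients.
    """
    if_tracks = np.asarray(if_tracks, dtype=float)
    n_bands, n_samples = if_tracks.shape
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if n_ceps > n_bands or n_samples < frame_len:
        raise ValueError("need n_ceps <= n_bands and at least one full frame")

    # Average each subband's IF within every (overlapping) analysis frame.
    n_frames = 1 + (n_samples - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frame_if = if_tracks[:, idx].mean(axis=2).T          # (n_frames, n_bands)

    # DCT across subbands; keep the first n_ceps coefficients as static features.
    static = (frame_if @ dct2_matrix(n_bands).T)[:, :n_ceps]
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])
```

With 13 static coefficients this yields 39-D vectors per frame (static, Δ, ΔΔ), matching the three feature configurations the abstract reports.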
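The reported figures are EERs, some of them after score-level fusion of MFCC and ESA-IFCC. The Python/NumPy sketch below computes an EER from per-utterance scores and fuses two systems by z-normalizing each and taking a weighted sum. With a GMM back end like the one in the abstract, an utterance score is commonly the mean frame log-likelihood under a natural-speech GMM minus that under a spoofed-speech GMM; here such scores are simply taken as given. The fusion rule (z-normalization plus a weight `alpha`), the threshold-sweep EER estimate, and the names `compute_eer`, `zscore` and `fuse` are assumptions rather than the paper's stated procedure, and the demo scores are synthetic.

```python
import numpy as np


def compute_eer(natural_scores, spoof_scores):
    """Equal error rate for scores where higher means 'more likely natural'.

    Sweeps every observed score as a threshold and returns the mean of the
    false-acceptance and false-rejection rates where they are closest.
    """
    natural = np.asarray(natural_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([natural, spoof]))
    far = (spoof[None, :] >= thresholds[:, None]).mean(axis=1)   # spoof accepted
    frr = (natural[None, :] < thresholds[:, None]).mean(axis=1)  # natural rejected
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])


def zscore(scores, ref_scores):
    """Normalize scores with the mean/std of a reference (e.g. development) set."""
    ref = np.asarray(ref_scores, dtype=float)
    return (np.asarray(scores, dtype=float) - ref.mean()) / ref.std()


def fuse(scores_a, scores_b, ref_a, ref_b, alpha=0.5):
    """Weighted linear score-level fusion of two z-normalized systems (assumed rule)."""
    return alpha * zscore(scores_a, ref_a) + (1.0 - alpha) * zscore(scores_b, ref_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic per-utterance scores standing in for GMM log-likelihood ratios
    # from two feature sets; they are NOT results from the paper.
    label = np.r_[np.ones(500), np.zeros(500)]            # 1 = natural, 0 = spoof
    clean = np.where(label == 1, 1.0, -1.0)
    sys_a = clean + rng.normal(0.0, 1.0, label.size)
    sys_b = clean + rng.normal(0.0, 1.0, label.size)
    fused = fuse(sys_a, sys_b, sys_a, sys_b, alpha=0.5)
    for name, s in [("A", sys_a), ("B", sys_b), ("fused", fused)]:
        print(name, "EER = %.2f%%" % (100 * compute_eer(s[label == 1], s[label == 0])))
```

In practice `alpha` and the normalization statistics would be chosen on the development set and then applied unchanged to the evaluation scores.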