A scheme discriminating between synthetic speech and normal speech
Jilun Chen, Weiqiang Zhang, Jia Liu
2016 International Conference on Audio, Language and Image Processing (ICALIP), 2016-07-01
DOI: 10.1109/ICALIP.2016.7846613 (https://doi.org/10.1109/ICALIP.2016.7846613)
Citations: 0
Abstract
This paper develops a system that automatically distinguishes natural speech from synthetic speech, with a focus on feature selection. We consider the commonly used Mel-Frequency Cepstral Coefficients (MFCC) alongside other features, such as Relative Phase Shift (RPS) and pitch features tuned for Automatic Speech Recognition (ASR). We find that some of these features are complementary for discriminating synthetic from natural speech. A Gaussian Mixture Model-Support Vector Machine (GMM-SVM) system serves as the classifier, with its feature input modified relative to, and compared against, the input used in speaker recognition. Experiments on a data set pairing LibriSpeech with speech from online Text-to-Speech (TTS) synthesis platforms verify the effectiveness of combining these features.
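
The abstract names a GMM-SVM classifier fed with combined features but gives no implementation details. The following is a minimal sketch of one common way such a front end is built: frame-wise concatenation of feature streams, followed by MAP adaptation of a universal background model (UBM) to produce a mean supervector that a linear SVM could then classify. The fusion strategy, the relevance factor of 16, the toy UBM, and every function name here (`concat_features`, `map_supervector`, and so on) are assumptions made for illustration, not the authors' method.

```python
import math


def concat_features(*streams):
    """Frame-wise concatenation of feature streams (lists of per-frame vectors)."""
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    return [[x for vec in frame_vecs for x in vec] for frame_vecs in zip(*streams)]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Component posteriors for one frame, computed stably in the log domain."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into one vector."""
    n_comp, dim = len(means), len(means[0])
    occ = [0.0] * n_comp                           # soft frame counts
    first = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted sums
    for x in frames:
        for c, p in enumerate(posteriors(x, weights, means, variances)):
            occ[c] += p
            for d in range(dim):
                first[c][d] += p * x[d]
    supervector = []
    for c in range(n_comp):
        denom = occ[c] + relevance
        alpha = occ[c] / denom if denom > 0 else 0.0
        for d in range(dim):
            data_mean = first[c][d] / occ[c] if occ[c] > 0 else means[c][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[c][d])
    return supervector


# Toy stand-ins for real feature extraction (hypothetical values).
mfcc = [[0.1, 0.2], [0.0, -0.1], [2.1, 1.9]]
rps = [[0.5], [0.4], [1.5]]
pitch = [[1.2], [1.1], [2.1]]
frames = concat_features(mfcc, rps, pitch)

# Toy two-component UBM over the 4-dimensional fused features.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0, 0.5, 1.0], [2.0, 2.0, 1.5, 2.0]]
ubm_vars = [[1.0] * 4, [1.0] * 4]
supervector = map_supervector(frames, ubm_weights, ubm_means, ubm_vars)
```

In a full system, one supervector per utterance would be computed for both natural and synthetic training speech and passed to a linear SVM. Swapping the list of streams given to `concat_features` is one simple way to compare feature combinations, although score-level fusion is an equally plausible reading of the abstract.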