Speaker Identification Using Whispered Speech
N. P. Jawarkar, R. S. Holambe, T. Basu
2013 International Conference on Communication Systems and Network Technologies, 2013-04-06
DOI: 10.1109/CSNT.2013.167 (https://doi.org/10.1109/CSNT.2013.167)

This paper studies closed-set, text-independent speaker identification using whispered speech. A new feature, temporal Teager energy based sub-band cepstral coefficients (TTESBCC), is proposed. The work compares the performance of four feature sets: Mel-frequency cepstral coefficients (MFCC), temporal energy of sub-band cepstral coefficients (TESBCC), weighted instantaneous frequency (WIF), and TTESBCC. Next, the outputs of three classifiers are combined, and the performance of the fused system is compared with that of the individual classifiers. The speaker identification system is trained on neutral speech and tested on both neutral and whispered speech. Experiments use a database of twenty-five speakers, with utterances recorded in Marathi, an Indian language, under neutral and whispered conditions. Gaussian mixture models (GMMs) are used for classification. The performance of the system degrades drastically when it is tested on whispered utterances. Fusion of classifiers improves identification accuracy under both whispered and neutral conditions.
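
The abstract names the Teager energy operator as the basis of the proposed feature but does not describe how TTESBCC is computed. As a rough illustration only, the sketch below shows the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], applied to a synthetic tone. The function name `teager_energy` and the test signal are assumptions made for this example and are not taken from the paper; the sub-band filtering and cepstral steps of the actual feature are not reproduced here.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    lack a neighbour on one side. Inputs shorter than 3 samples give [].
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal (not from the paper): a pure tone with
# amplitude A and digital frequency omega in radians per sample.
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(200)]

psi = teager_energy(tone)

# For a pure tone the operator output is constant: A^2 * sin(omega)^2.
expected = (A * math.sin(omega)) ** 2
max_error = max(abs(v - expected) for v in psi)
```

For a pure tone the operator output depends on both amplitude and frequency, which is the property that generally motivates Teager-energy-based features over plain squared-amplitude energy. Whether and how the authors exploit this in TTESBCC would have to be checked against the full paper.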