Improvement of phone recognition accuracy using source and system features
K. Manjunath, K. S. Rao, M. G. Reddy
2015 International Conference on Signal Processing and Communication Engineering Systems, 12 March 2015
DOI: 10.1109/SPACES.2015.7058205
The goal of this work is to improve phone recognition accuracy using a combination of source and system features. Since speech is produced by exciting a time-varying vocal tract system with a time-varying excitation, we explore both the source and the system components of the speech production system for phone recognition. The excitation source information is derived by processing the linear prediction (LP) residual of the speech signal, and Mel-frequency cepstral coefficient (MFCC) features are used to capture vocal tract information. The phone recognition systems (PRSs) are built using hidden Markov models (HMMs). The proposed PRSs are developed for English and for Bengali, an Indian language, using the TIMIT corpus and the Phonetically and Prosodically Rich Transcribed speech corpus, respectively. We also develop tandem PRSs that use phone posteriors obtained from feedforward neural networks. The tandem PRSs built on the combination of excitation source and system features outperform conventional tandem systems built on system features alone.
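To make the source/system split concrete, here is a minimal sketch of LP residual extraction in Python with NumPy, assuming the standard autocorrelation method: each frame is Hamming-windowed, a predictor polynomial A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is estimated with the Levinson-Durbin recursion, and the unwindowed speech is inverse-filtered through A(z) so that what remains approximates the excitation. The function names `lp_coefficients` and `lp_residual`, the LP order of 10, and the 160-sample (20 ms at 8 kHz) non-overlapping frames are illustrative assumptions rather than values reported in the paper, and the further processing applied to the residual is not reproduced here.

```python
import numpy as np


def lp_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Predictor polynomial [1, a1, ..., ap] from the autocorrelation method,
    solved with the Levinson-Durbin recursion."""
    n = len(frame)
    # Autocorrelation lags r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy of the order-0 model
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predicted frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for order i
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction-error energy
    return a


def lp_residual(x: np.ndarray, order: int = 10, frame_len: int = 160) -> np.ndarray:
    """Frame-wise LP residual e[n] = sum_j a_j x[n - j] (inverse filtering)."""
    x = np.asarray(x, dtype=float)
    residual = np.zeros_like(x)
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        if len(frame) <= order:
            # Too short to estimate a predictor: pass samples through unchanged.
            residual[start:start + len(frame)] = frame
            continue
        # Estimate the vocal tract (system) model on a Hamming-windowed frame ...
        a = lp_coefficients(frame * np.hamming(len(frame)), order)
        # ... then inverse-filter the unwindowed samples, using up to `order`
        # samples of history so the filter does not restart from zero each frame.
        ctx = max(0, start - order)
        seg = x[ctx:start + len(frame)]
        e = np.convolve(seg, a)[: len(seg)]
        residual[start:start + len(frame)] = e[start - ctx:]
    return residual
```

Because inverse filtering removes the resonances modelled by A(z), the residual keeps mainly the excitation: for a synthetic all-pole signal driven by an impulse train, the residual is concentrated at the impulse positions and carries much less energy than the signal itself.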
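The abstract does not specify how the tandem features are formed, so the following is only a sketch of one common variant of the tandem recipe, under stated assumptions: per-frame phone posteriors from a feedforward network (in this work, one trained on source and system features) are log-compressed, decorrelated with PCA, and appended to the per-frame system features before HMM training. The function name `tandem_features`, the PCA step, and the `n_keep` parameter are hypothetical; in practice the PCA projection would be estimated on training data and reused unchanged on test data.

```python
import numpy as np


def tandem_features(posteriors: np.ndarray, system_feats: np.ndarray, n_keep: int) -> np.ndarray:
    """Hypothetical tandem front end: log posteriors -> PCA -> append to system features.

    posteriors:   (frames, phones) network outputs, each row summing to 1
    system_feats: (frames, dims) per-frame features such as MFCCs
    """
    eps = 1e-10
    # Log compression makes the skewed posterior distribution closer to Gaussian.
    logp = np.log(np.clip(posteriors, eps, 1.0))
    centered = logp - logp.mean(axis=0)
    # Right singular vectors of the centered data are the PCA directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_keep].T  # decorrelated, reduced dimensions
    return np.hstack([system_feats, projected])
```

Decorrelating the log posteriors suits HMMs whose Gaussian mixtures use diagonal covariance matrices, which is the usual motivation for the PCA step in tandem systems.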