On the use of filter-bank energies as features for robust speech recognition
K. Paliwal
ISSPA '99. Proceedings of the Fifth International Symposium on Signal Processing and its Applications (IEEE Cat. No.99EX359), 22 August 1999
DOI: 10.1109/ISSPA.1999.815754 (https://doi.org/10.1109/ISSPA.1999.815754)
Although mel-frequency cepstral coefficients (MFCCs) have been very successful in speech recognition, they have two problems: (1) they have no physical interpretation, and (2) liftering of cepstral coefficients, which was highly useful in earlier dynamic time warping-based speech recognition systems, has no effect on recognition when used with continuous-observation Gaussian-density hidden Markov models (HMMs). We propose using the filter-bank energies (FBEs) as features instead. The FBEs are physically meaningful quantities and are amenable to human auditory processing such as masking. We describe procedures to decorrelate and lifter the FBEs, and show that the FBEs perform at least as well as (and sometimes better than) the MFCCs for robust speech recognition.
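
Problem (2) above can be checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes a single diagonal-covariance Gaussian per class in place of a full HMM, synthetic feature vectors, and a sinusoidal lifter (a common choice that the abstract does not specify). All names and parameter values are hypothetical. Because liftering multiplies each coefficient by a fixed weight, re-estimating the Gaussians on liftered data scales the means and standard deviations by the same weights, so the log-likelihood ratio between classes should not change.

```python
import numpy as np

def fit_diag_gaussian(x):
    """Maximum-likelihood mean and per-dimension variance of the rows of x."""
    return x.mean(axis=0), x.var(axis=0)

def log_likelihood(x, mean, var):
    """Log density of each row of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def class_scores(train_a, train_b, test):
    """Log-likelihood ratio (class A vs. class B) for each test vector."""
    ll_a = log_likelihood(test, *fit_diag_gaussian(train_a))
    ll_b = log_likelihood(test, *fit_diag_gaussian(train_b))
    return ll_a - ll_b

# Synthetic "cepstral" vectors for two classes (illustrative data only).
rng = np.random.default_rng(0)
n_coeff = 12
train_a = rng.normal(0.0, 1.0, size=(500, n_coeff))
train_b = rng.normal(0.5, 1.5, size=(500, n_coeff))
test = rng.normal(0.2, 1.2, size=(50, n_coeff))

# Assumed sinusoidal lifter: w[n] = 1 + (L / 2) * sin(pi * n / L), n = 1..n_coeff.
L = 22
n = np.arange(1, n_coeff + 1)
lifter = 1.0 + (L / 2.0) * np.sin(np.pi * n / L)

# Score the test vectors without and with liftering; models are re-estimated each time.
plain = class_scores(train_a, train_b, test)
liftered = class_scores(train_a * lifter, train_b * lifter, test * lifter)

# True when liftering leaves every class decision score unchanged (up to rounding).
scores_match = np.allclose(plain, liftered)
print(scores_match)
```

Under these assumptions, each class log-likelihood shifts by the same constant (the negative sum of the log lifter weights), which cancels in the ratio. Distance-based matching, as in dynamic time warping, has no re-estimated variances to absorb the weights, which is why liftering can matter there.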