Hao-Teng Fan, Kuan-wei Hsieh, Chien-hao Huang, J. Hung
Published in: 2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2013-07-23. DOI: 10.1109/FSKD.2013.6816329 (https://doi.org/10.1109/FSKD.2013.6816329)
Robustifying cepstral features by mitigating the outlier effect for noisy speech recognition
Noise interference often seriously degrades the performance of automatic speech recognition (ASR) systems. Among noise-reduction techniques, cepstral mean-and-variance normalization (CMVN) is a simple yet quite effective way to process MFCC speech features. However, CMVN-processed features still contain a significant number of outliers, which very likely weaken the effect of CMVN. This paper proposes two ways of dealing with the outliers left by CMVN. The first applies a sigmoid function transformation, which places explicit lower and upper bounds on the outliers; the second uses the well-known median filter to remove impulse-like outliers from the CMVN features. On the Aurora-2 digit recognition database and task, the two proposed frameworks improve absolute accuracy by around 5% compared with CMVN, and the corresponding word error rate reduction relative to the MFCC baseline is as high as 50%.
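The two post-CMVN steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact sigmoid form, its parameters, or the median window length, so the use of tanh (a rescaled logistic sigmoid), the `slope` parameter, the default `width=3`, the edge handling, and all function names below are assumptions made for illustration. Each function operates on one cepstral coefficient's track over the frames of a single utterance.

```python
import math
from statistics import mean, median, pstdev


def cmvn(track):
    """Per-utterance CMVN: shift to zero mean and scale to unit variance."""
    mu = mean(track)
    sigma = pstdev(track)  # population standard deviation over the utterance
    if sigma == 0.0:
        return [0.0 for _ in track]  # constant track: nothing to scale
    return [(x - mu) / sigma for x in track]


def sigmoid_bound(track, slope=1.0):
    """Squash values to lie between -1 and 1 so that outliers are bounded.

    Assumption: tanh stands in for the paper's sigmoid, and `slope` is a
    hypothetical tuning parameter; neither is specified in the abstract.
    """
    return [math.tanh(slope * x) for x in track]


def median_filter(track, width=3):
    """Sliding-window median along time to suppress impulse-like outliers.

    Assumption: an odd window and edge padding by repeating the end frames.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    if not track:
        return []
    half = width // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(track))]


# Toy coefficient track with one impulse-like outlier at frame 3.
toy_track = [1.0, 1.2, 0.9, 9.0, 1.1, 1.0, 0.8]
normalized = cmvn(toy_track)
bounded = sigmoid_bound(normalized)    # first direction: bound the outliers
smoothed = median_filter(normalized)   # second direction: remove impulses
print(normalized)
print(bounded)
print(smoothed)
```

In this sketch the two directions are applied separately to the same CMVN output, which matches the abstract's description of two frameworks; whether and how they might be combined is not stated there.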