Usage of frame dropping and frame attenuation algorithms in automatic speech recognition systems
D. Vlaj, B. Kotnik, Z. Kaciv, B. Horvat
The IEEE Region 8 EUROCON 2003. Computer as a Tool. Published 2003-12-03.
DOI: 10.1109/EURCON.2003.1248170 (https://doi.org/10.1109/EURCON.2003.1248170)
This paper presents the use of frame dropping and frame attenuation algorithms in automatic speech recognition systems. Frame dropping is important because it spares the recognizer from processing noise-only parts of the input signal; on the other hand, recognition results can be better if the spectral magnitudes of noise-only frames are attenuated instead. A novel voice activity detection (VAD) approach is proposed that is based on the log filter-bank magnitudes and uses a so-called "hangover" criterion; it supplies the speech/noise decisions needed for frame dropping or frame attenuation. All tests were made on the Slovenian, German, and Spanish fixed-telephone SpeechDat II databases with the HTK speech recognition toolkit. The results show that a low word error rate can be achieved with a small number of Gaussian mixtures if either the frame dropping or the frame attenuation algorithm is used.
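
The abstract does not give the details of the proposed VAD, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a frame counts as speech when the mean of its log filter-bank magnitudes exceeds a fixed threshold, and that the hangover criterion keeps a fixed number of frames after the last detected speech frame. The names `threshold`, `hangover`, and `attenuation`, and all three functions, are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # log filter-bank magnitudes of one frame


def vad_with_hangover(log_fbank: Sequence[Frame],
                      threshold: float,
                      hangover: int) -> List[bool]:
    """Return one speech (True) / noise-only (False) flag per frame.

    Assumed decision rule: mean log filter-bank magnitude > threshold.
    After the last frame that passes the rule, the next `hangover`
    frames are still flagged as speech.
    """
    flags: List[bool] = []
    remaining = 0  # hangover frames still to be flagged as speech
    for frame in log_fbank:
        mean_log_mag = sum(frame) / len(frame)
        if mean_log_mag > threshold:
            flags.append(True)
            remaining = hangover
        elif remaining > 0:
            flags.append(True)
            remaining -= 1
        else:
            flags.append(False)
    return flags


def drop_frames(log_fbank: Sequence[Frame],
                flags: Sequence[bool]) -> List[List[float]]:
    """Frame dropping: keep only the frames flagged as speech."""
    return [list(f) for f, keep in zip(log_fbank, flags) if keep]


def attenuate_frames(log_fbank: Sequence[Frame],
                     flags: Sequence[bool],
                     attenuation: float) -> List[List[float]]:
    """Frame attenuation: keep every frame, but lower noise-only frames.

    Subtracting a constant in the log domain corresponds to scaling
    the spectral magnitudes by a constant factor.
    """
    return [list(f) if keep else [v - attenuation for v in f]
            for f, keep in zip(log_fbank, flags)]


if __name__ == "__main__":
    # Toy input: five frames with two filter-bank channels each.
    frames = [[1.0, 1.0], [5.0, 5.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    flags = vad_with_hangover(frames, threshold=3.0, hangover=1)
    print(flags)                                  # [False, True, True, False, False]
    print(drop_frames(frames, flags))             # [[5.0, 5.0], [1.0, 1.0]]
    print(attenuate_frames(frames, flags, 0.5))
```

In this toy run the hangover keeps the frame right after the single high-energy frame, which illustrates how such a criterion can protect low-energy speech tails from being dropped or attenuated. Frame dropping shortens the feature sequence, while frame attenuation preserves its length.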