Turbo fusion of magnitude and phase information for DNN-based phoneme recognition
Timo Lohrenz, T. Fingscheidt
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), published 2017-12-01
DOI: 10.1109/ASRU.2017.8268925 (https://doi.org/10.1109/ASRU.2017.8268925)
In this work we propose turbo fusion as a competitive method for fusing Mel-filterbank magnitude and phase feature streams in automatic speech recognition (ASR). Building on the recently introduced turbo ASR paradigm, our contribution is fourfold. First, we introduce DNN-based acoustic modeling into turbo ASR. Second and third, we take steps towards large-vocabulary continuous speech recognition (LVCSR) by omitting the costly state space transform and by investigating the classical TIMIT phoneme recognition task. Fourth, in place of the stream weighting typical of fusion methods, we introduce a new dynamic range limitation of the posteriors exchanged between the magnitude and phase recognizers, which results in a smoother information exchange. The proposed turbo fusion outperforms classical benchmarks on the TIMIT dataset both with and without dropout in DNN training, and it also ranks first when compared with several state-of-the-art reference fusion methods.
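The abstract does not give the formula for the dynamic range limitation, so the following is only a minimal sketch of one plausible reading: before a posterior vector is passed from one recognizer to the other, every entry is floored at a fixed fraction of the largest entry and the vector is renormalized. The names `limit_dynamic_range`, `apply_as_prior`, and the parameter `max_ratio` are hypothetical and do not come from the paper, and the single frame-wise exchange shown here ignores the HMM decoding and the iterations of the actual turbo scheme.

```python
from typing import List


def limit_dynamic_range(posterior: List[float], max_ratio: float) -> List[float]:
    """Floor each entry at max(posterior) / max_ratio, then renormalize.

    ASSUMPTION: this flooring rule is an illustrative guess, not the
    limitation defined in the paper.
    """
    floor = max(posterior) / max_ratio
    clipped = [max(p, floor) for p in posterior]
    total = sum(clipped)
    return [p / total for p in clipped]


def apply_as_prior(likelihood: List[float], prior: List[float]) -> List[float]:
    """Combine one stream's per-class likelihoods with a prior from the other stream."""
    scores = [l * p for l, p in zip(likelihood, prior)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical single-frame example with three phoneme classes.
magnitude_posterior = [0.98, 0.01, 0.01]  # very confident magnitude recognizer
phase_likelihood = [0.30, 0.60, 0.10]     # phase recognizer prefers class 1

# Limit the exchanged posterior so it cannot fully dominate the phase stream.
limited = limit_dynamic_range(magnitude_posterior, max_ratio=10.0)
fused = apply_as_prior(phase_likelihood, limited)
print(limited)
print(fused)
```

In this sketch a smaller `max_ratio` makes the exchanged posterior flatter, so the receiving recognizer relies more on its own evidence. This is the sense in which such a limit could play the role that stream weights play in other fusion methods.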