{"title":"会话语音识别中特征级上下文建模的一种新型瓶颈- blstm前端","authors":"M. Wöllmer, Björn Schuller, G. Rigoll","doi":"10.1109/ASRU.2011.6163902","DOIUrl":null,"url":null,"abstract":"We present a novel automatic speech recognition (ASR) front-end that unites Long Short-Term Memory context modeling, bidirectional speech processing, and bottleneck (BN) networks for enhanced Tandem speech feature generation. Bidirectional Long Short-Term Memory (BLSTM) networks were shown to be well suited for phoneme recognition and probabilistic feature extraction since they efficiently incorporate a flexible amount of long-range temporal context, leading to better ASR results than conventional recurrent networks or multi-layer perceptrons. Combining BLSTM modeling and bottleneck feature generation allows us to produce feature vectors of arbitrary size, independent of the network training targets. Experiments on the COSINE and the Buckeye corpora containing spontaneous, conversational speech show that the proposed BN-BLSTM front-end leads to better ASR accuracies than previously proposed BLSTM-based Tandem and multi-stream systems.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"A novel bottleneck-BLSTM front-end for feature-level context modeling in conversational speech recognition\",\"authors\":\"M. Wöllmer, Björn Schuller, G. Rigoll\",\"doi\":\"10.1109/ASRU.2011.6163902\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a novel automatic speech recognition (ASR) front-end that unites Long Short-Term Memory context modeling, bidirectional speech processing, and bottleneck (BN) networks for enhanced Tandem speech feature generation. 
Bidirectional Long Short-Term Memory (BLSTM) networks were shown to be well suited for phoneme recognition and probabilistic feature extraction since they efficiently incorporate a flexible amount of long-range temporal context, leading to better ASR results than conventional recurrent networks or multi-layer perceptrons. Combining BLSTM modeling and bottleneck feature generation allows us to produce feature vectors of arbitrary size, independent of the network training targets. Experiments on the COSINE and the Buckeye corpora containing spontaneous, conversational speech show that the proposed BN-BLSTM front-end leads to better ASR accuracies than previously proposed BLSTM-based Tandem and multi-stream systems.\",\"PeriodicalId\":338241,\"journal\":{\"name\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE Workshop on Automatic Speech Recognition & Understanding\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2011.6163902\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2011.6163902","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We present a novel automatic speech recognition (ASR) front-end that unites Long Short-Term Memory context modeling, bidirectional speech processing, and bottleneck (BN) networks for enhanced Tandem speech feature generation. Bidirectional Long Short-Term Memory (BLSTM) networks were shown to be well suited for phoneme recognition and probabilistic feature extraction since they efficiently incorporate a flexible amount of long-range temporal context, leading to better ASR results than conventional recurrent networks or multi-layer perceptrons. Combining BLSTM modeling and bottleneck feature generation allows us to produce feature vectors of arbitrary size, independent of the network training targets. Experiments on the COSINE and the Buckeye corpora containing spontaneous, conversational speech show that the proposed BN-BLSTM front-end leads to better ASR accuracies than previously proposed BLSTM-based Tandem and multi-stream systems.
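The paper does not reproduce its network configuration in this abstract, but the core idea — running forward and backward LSTMs over the acoustic frames, then projecting their concatenated states through a narrow bottleneck layer whose activations are appended to the acoustic features as a Tandem vector — can be sketched in plain numpy. All dimensions below (39-dim MFCC-like input, 32 hidden units, 8-dim bottleneck) are illustrative assumptions, not the authors' settings, and the weights are random rather than trained on phoneme targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_params(n_in, n_hidden):
    # One stacked weight matrix covering the four gates
    # (input, forget, cell candidate, output).
    return {
        "W": rng.standard_normal((4 * n_hidden, n_in + n_hidden)) * 0.1,
        "b": np.zeros(4 * n_hidden),
    }

def lstm_forward(x_seq, params, n_hidden):
    """Run a unidirectional LSTM over x_seq of shape (T, n_in);
    return the hidden-state sequence of shape (T, n_hidden)."""
    T = x_seq.shape[0]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    out = np.zeros((T, n_hidden))
    for t in range(T):
        z = params["W"] @ np.concatenate([x_seq[t], h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

def blstm_bottleneck_features(x_seq, fwd, bwd, W_bn, n_hidden):
    """Bidirectional pass: forward LSTM plus an LSTM over the reversed
    sequence (re-reversed to align in time), concatenated and squeezed
    through a narrow bottleneck projection of arbitrary size."""
    h_fwd = lstm_forward(x_seq, fwd, n_hidden)
    h_bwd = lstm_forward(x_seq[::-1], bwd, n_hidden)[::-1]
    h = np.concatenate([h_fwd, h_bwd], axis=1)   # (T, 2 * n_hidden)
    return np.tanh(h @ W_bn)                     # (T, n_bn)

# Toy dimensions (assumed for illustration only).
n_in, n_hidden, n_bn, T = 39, 32, 8, 50
x = rng.standard_normal((T, n_in))               # stand-in for MFCC frames
fwd = lstm_cell_params(n_in, n_hidden)
bwd = lstm_cell_params(n_in, n_hidden)
W_bn = rng.standard_normal((2 * n_hidden, n_bn)) * 0.1

bn = blstm_bottleneck_features(x, fwd, bwd, W_bn, n_hidden)
tandem = np.concatenate([x, bn], axis=1)         # Tandem: acoustic + BN features
print(tandem.shape)
```

The bottleneck width `n_bn` is the point of the BN design: because the feature dimension is set by the projection rather than by the number of phoneme training targets, the Tandem feature vector can be made any size the downstream HMM system prefers.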