{"title":"线性增强深度神经网络","authors":"Pegah Ghahremani, J. Droppo, M. Seltzer","doi":"10.1109/ICASSP.2016.7472646","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNN) are a powerful tool for many large vocabulary continuous speech recognition (LVCSR) tasks. Training a very deep network is a challenging problem and pre-training techniques are needed in order to achieve the best results. In this paper, we propose a new type of network architecture, Linear Augmented Deep Neural Network (LA-DNN). This type of network augments each non-linear layer with a linear connection from layer input to layer output. The resulting LA-DNN model eliminates the need for pre-training, addresses the gradient vanishing problem for deep networks, has higher capacity in modeling linear transformations, trains significantly faster than normal DNN, and produces better acoustic models. The proposed model has been evaluated on TIMIT phoneme recognition and AMI speech recognition tasks. Experimental results show that the LA-DNN models can have 70% fewer parameters than a DNN, while still improving accuracy. On the TIMIT phoneme recognition task, the smaller LA-DNN model improves TIMIT phone accuracy by 2% absolute, and AMI word accuracy by 1.7% absolute.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Linearly augmented deep neural network\",\"authors\":\"Pegah Ghahremani, J. Droppo, M. Seltzer\",\"doi\":\"10.1109/ICASSP.2016.7472646\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNN) are a powerful tool for many large vocabulary continuous speech recognition (LVCSR) tasks. Training a very deep network is a challenging problem and pre-training techniques are needed in order to achieve the best results. In this paper, we propose a new type of network architecture, Linear Augmented Deep Neural Network (LA-DNN). This type of network augments each non-linear layer with a linear connection from layer input to layer output. The resulting LA-DNN model eliminates the need for pre-training, addresses the gradient vanishing problem for deep networks, has higher capacity in modeling linear transformations, trains significantly faster than normal DNN, and produces better acoustic models. The proposed model has been evaluated on TIMIT phoneme recognition and AMI speech recognition tasks. Experimental results show that the LA-DNN models can have 70% fewer parameters than a DNN, while still improving accuracy. 
On the TIMIT phoneme recognition task, the smaller LA-DNN model improves TIMIT phone accuracy by 2% absolute, and AMI word accuracy by 1.7% absolute.\",\"PeriodicalId\":165321,\"journal\":{\"name\":\"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2016.7472646\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2016.7472646","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
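To make the architecture concrete, below is a minimal numpy sketch of the layer the abstract describes: each non-linear layer is augmented with a linear connection from the layer input to the layer output. The class name, the sigmoid non-linearity, the full-rank matrix T, and the initialization are illustrative assumptions, not the paper's exact parameterization; the paper may constrain the linear term (for example, to be low-rank) or use a different non-linearity.

```python
import numpy as np

def sigmoid(z):
    """Logistic non-linearity."""
    return 1.0 / (1.0 + np.exp(-z))

class LinearlyAugmentedLayer:
    """One linearly augmented layer: y = sigmoid(W @ x + b) + T @ x.

    A sketch of the idea in the abstract; W, b, and T are illustrative
    names, and T is an unconstrained linear map here by assumption.
    """

    def __init__(self, d_in, d_out, rng):
        self.W = rng.normal(0.0, 0.1, size=(d_out, d_in))  # non-linear path weights
        self.b = np.zeros(d_out)                           # non-linear path bias
        self.T = rng.normal(0.0, 0.1, size=(d_out, d_in))  # linear connection, input to output

    def forward(self, x):
        # Layer output is the usual non-linear transform plus a linear
        # map of the layer input, so information (and back-propagated
        # gradients) also have a direct linear path through the layer.
        return sigmoid(self.W @ x + self.b) + self.T @ x

# Usage: stack a few layers and push a feature vector through.
rng = np.random.default_rng(0)
layers = [LinearlyAugmentedLayer(40, 40, rng) for _ in range(6)]
x = rng.normal(size=40)
for layer in layers:
    x = layer.forward(x)
print(x.shape)  # (40,)
```

The additive term T @ x gives gradients a path around the saturating non-linearity, which is consistent with the abstract's claims that the LA-DNN avoids vanishing gradients and trains well without pre-training.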