Incremental training and constructing the very deep convolutional residual network acoustic models

Sheng Li, Xugang Lu, Peng Shen, R. Takashima, Tatsuya Kawahara, H. Kawai

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017. DOI: 10.1109/ASRU.2017.8268939
Abstract: Inspired by its successful applications in image recognition, the very deep convolutional residual network (ResNet) has been applied to automatic speech recognition (ASR). However, training such a ResNet on a large quantity of data imposes a heavy computational load. In this paper, we propose an incremental model training framework to accelerate the training of the ResNet. The framework is based on the observation that the layers and connections of the ResNet are of unequal importance. The modules containing the important layers and connections are regarded as a skeleton model, while the remaining ones are regarded as an auxiliary model. The total depth of the skeleton model is quite shallow compared with the very deep full network. In our incremental training, the skeleton model is first trained on the full training data set; the layers and connections belonging to the auxiliary model are then gradually attached to the skeleton model and tuned. Our experiments showed that the proposed incremental training achieved comparable performance and faster training speed than training the model as a whole without considering the different importance of each layer.
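The paper's implementation is not included in this entry; the sketch below is a minimal, hypothetical illustration of the staged procedure the abstract describes, written in PyTorch. The block structure, feature dimensions, data, and the attach-one-block-then-tune schedule are all assumptions made for illustration, not the authors' actual configuration.

```python
# Minimal sketch of incremental ResNet training, assuming a toy PyTorch
# setup; every size, name, and schedule here is illustrative, not the
# configuration used in the paper.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One residual block y = x + F(x); inactive blocks act as the identity."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
        )
        self.active = False  # auxiliary blocks start detached

    def forward(self, x):
        return x + self.body(x) if self.active else x


class IncrementalResNet(nn.Module):
    """Skeleton blocks are always active; auxiliary blocks are attached later."""
    def __init__(self, feat_dim=40, dim=32, n_skeleton=2, n_auxiliary=4, n_classes=10):
        super().__init__()
        self.stem = nn.Conv1d(feat_dim, dim, kernel_size=3, padding=1)
        self.skeleton = nn.ModuleList([ResidualBlock(dim) for _ in range(n_skeleton)])
        self.auxiliary = nn.ModuleList([ResidualBlock(dim) for _ in range(n_auxiliary)])
        for blk in self.skeleton:
            blk.active = True
        self.head = nn.Linear(dim, n_classes)

    def attach_next_auxiliary(self):
        """Activate one more auxiliary block; returns False when all are attached."""
        for blk in self.auxiliary:
            if not blk.active:
                blk.active = True
                return True
        return False

    def forward(self, x):  # x: (batch, feat_dim, frames)
        h = self.stem(x)
        # Placement of auxiliary blocks relative to the skeleton is a design
        # choice; here they simply follow the skeleton blocks.
        for blk in list(self.skeleton) + list(self.auxiliary):
            h = blk(h)
        return self.head(h.mean(dim=-1))  # pool over frames, then classify


def train_steps(model, steps=50):
    """Tiny training loop on random tensors, standing in for the real corpus."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(8, 40, 100)     # fake batch of feature frames
        y = torch.randint(0, 10, (8,))  # fake frame-level targets
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


model = IncrementalResNet()
train_steps(model)                   # stage 1: train the shallow skeleton on all data
while model.attach_next_auxiliary():
    train_steps(model)               # later stages: attach one auxiliary block, then tune
```

Because an inactive block is an exact identity, attaching one perturbs the trained network only by the newly added residual branch, so each stage fine-tunes a slightly deeper model instead of restarting from scratch; this is the intuition behind the speed-up the abstract reports, though the paper's actual attachment order and tuning schedule may differ.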