Incremental training and constructing the very deep convolutional residual network acoustic models

Sheng Li, Xugang Lu, Peng Shen, R. Takashima, Tatsuya Kawahara, H. Kawai

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017. DOI: 10.1109/ASRU.2017.8268939
Abstract: Inspired by its successful applications in image recognition, the very deep convolutional residual network (ResNet) has been applied to automatic speech recognition (ASR). However, training such a ResNet on a large quantity of data imposes a heavy computational load. In this paper, we propose an incremental model training framework to accelerate the training of the ResNet. The framework is based on the observation that the layers and connections of the ResNet are of unequal importance. The modules containing the important layers and connections are regarded as a skeleton model, while the remaining ones are regarded as an auxiliary model. The total depth of the skeleton model is quite shallow compared with the very deep full network. In our incremental training, the skeleton model is first trained on the full training data set; the layers and connections belonging to the auxiliary model are then gradually attached to the skeleton model and tuned. Our experiments showed that the proposed incremental training achieved comparable performance and faster training speed than training the model as a whole without considering the different importance of each layer.
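The paper's implementation is not included in this entry; the sketch below is a minimal, hypothetical illustration of the staged procedure the abstract describes, written in PyTorch. The block structure, feature dimensions, data, and the attach-one-block-then-tune schedule are all assumptions made for illustration, not the authors' actual configuration.

```python
# Minimal sketch of incremental ResNet training, assuming a toy PyTorch
# setup; every size, name, and schedule here is illustrative, not the
# configuration used in the paper.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One residual block y = x + F(x); inactive blocks act as the identity."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
        )
        self.active = False  # auxiliary blocks start detached

    def forward(self, x):
        return x + self.body(x) if self.active else x


class IncrementalResNet(nn.Module):
    """Skeleton blocks are always active; auxiliary blocks are attached later."""
    def __init__(self, feat_dim=40, dim=32, n_skeleton=2, n_auxiliary=4, n_classes=10):
        super().__init__()
        self.stem = nn.Conv1d(feat_dim, dim, kernel_size=3, padding=1)
        self.skeleton = nn.ModuleList([ResidualBlock(dim) for _ in range(n_skeleton)])
        self.auxiliary = nn.ModuleList([ResidualBlock(dim) for _ in range(n_auxiliary)])
        for blk in self.skeleton:
            blk.active = True
        self.head = nn.Linear(dim, n_classes)

    def attach_next_auxiliary(self):
        """Activate one more auxiliary block; returns False when all are attached."""
        for blk in self.auxiliary:
            if not blk.active:
                blk.active = True
                return True
        return False

    def forward(self, x):  # x: (batch, feat_dim, frames)
        h = self.stem(x)
        # Placement of auxiliary blocks relative to the skeleton is a design
        # choice; here they simply follow the skeleton blocks.
        for blk in list(self.skeleton) + list(self.auxiliary):
            h = blk(h)
        return self.head(h.mean(dim=-1))  # pool over frames, then classify


def train_steps(model, steps=50):
    """Tiny training loop on random tensors, standing in for the real corpus."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(8, 40, 100)     # fake batch of feature frames
        y = torch.randint(0, 10, (8,))  # fake frame-level targets
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


model = IncrementalResNet()
train_steps(model)                   # stage 1: train the shallow skeleton on all data
while model.attach_next_auxiliary():
    train_steps(model)               # later stages: attach one auxiliary block, then tune
```

Because an inactive block is an exact identity, attaching one perturbs the trained network only by the newly added residual branch, so each stage fine-tunes a slightly deeper model instead of restarting from scratch; this is the intuition behind the speed-up the abstract reports, though the paper's actual attachment order and tuning schedule may differ.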