Decomposition and composition of deep convolutional neural networks and training acceleration via sub-network transfer learning

Linyan Gu, Wei Zhang, Jia Liu, X. Cai
{"title":"Decomposition and composition of deep convolutional neural networks and training acceleration via sub-network transfer learning","authors":"Linyan Gu, Wei Zhang, Jia Liu, X. Cai","doi":"10.1553/etna_vol56s157","DOIUrl":null,"url":null,"abstract":"Deep convolutional neural network (DCNN) has led to significant breakthroughs in deep learning. However, larger models and larger datasets result in longer training times slowing down the development progress of deep learning. In this paper, following the idea of domain decomposition methods, we propose and study a new method to parallelize the training of DCNNs by decomposing and composing DCNNs. First, a global network is decomposed into several sub-networks by partitioning the width of the network (i.e., along the channel dimension) while keeping the depth constant. All the sub-networks are individually trained, in parallel without any interprocessor communication, with the corresponding decomposed samples from the input data. Then, following the idea of nonlinear preconditioning, we propose a sub-network transfer learning strategy in which the weights of the trained sub-networks are recomposed to initialize the global network, which is then trained to further adapt the parameters. Some theoretical analyses are provided to show the effectiveness of the sub-network transfer learning strategy. More precisely speaking, we prove that (1) the initialized global network can extract the feature maps learned by the sub-networks; (2) the initialization of the global network can provide an upper bound and a lower bound for the cost function and the classification accuracy with the corresponding values of the trained sub-networks. Some experiments are provided to evaluate the proposed methods. The results show that the sub-network transfer learning strategy can indeed provide good initialization and accelerate the training of the global network. Additionally, after further training, the transfer learning strategy shows almost no loss of accuracy and sometimes the accuracy is higher than if the network is initialized randomly.","PeriodicalId":282695,"journal":{"name":"ETNA - Electronic Transactions on Numerical Analysis","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ETNA - Electronic Transactions on Numerical Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1553/etna_vol56s157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Deep convolutional neural networks (DCNNs) have led to significant breakthroughs in deep learning. However, larger models and larger datasets result in longer training times, slowing down the development of deep learning. In this paper, following the idea of domain decomposition methods, we propose and study a new method to parallelize the training of DCNNs by decomposing and composing them. First, a global network is decomposed into several sub-networks by partitioning the width of the network (i.e., along the channel dimension) while keeping the depth constant. All the sub-networks are trained individually, in parallel and without any interprocessor communication, on the correspondingly decomposed samples of the input data. Then, following the idea of nonlinear preconditioning, we propose a sub-network transfer learning strategy in which the weights of the trained sub-networks are recomposed to initialize the global network, which is then trained further to adapt its parameters. Theoretical analyses are provided to show the effectiveness of the sub-network transfer learning strategy. More precisely, we prove that (1) the initialized global network can extract the feature maps learned by the sub-networks, and (2) the initialization of the global network provides an upper bound and a lower bound on the cost function and the classification accuracy in terms of the corresponding values of the trained sub-networks. Experiments are provided to evaluate the proposed methods. The results show that the sub-network transfer learning strategy can indeed provide a good initialization and accelerate the training of the global network. Additionally, after further training, the transfer learning strategy shows almost no loss of accuracy, and the accuracy is sometimes higher than when the network is initialized randomly.
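To make the decomposition and recomposition steps concrete, the following is a minimal PyTorch sketch under simplifying assumptions: a toy two-block CNN on 28x28 grayscale inputs, zero initialization of the convolution weights connecting channels owned by different sub-networks, classifier weights placed into matching column blocks, and each sub-network reading the full (undecomposed) input. The helper names make_cnn and recompose are illustrative and not taken from the paper, and the exact recomposition rule used by the authors may differ.

```python
# Sketch only: decompose a CNN along the channel dimension into narrower
# sub-networks, then recompose their trained weights to initialize the
# global network (the "sub-network transfer learning" idea).
import torch
import torch.nn as nn


def make_cnn(width: int, num_classes: int = 10) -> nn.Sequential:
    """Two-block CNN whose first block has `width` channels."""
    return nn.Sequential(
        nn.Conv2d(1, width, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(width, 2 * width, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(2 * width * 7 * 7, num_classes),  # 28x28 -> 7x7 after two poolings
    )


def recompose(subnets, global_net):
    """Initialize the global network from independently trained sub-networks.

    Sub-network convolution weights are copied into disjoint channel blocks of
    the global convolutions (block-diagonal placement); weights between
    channels owned by different sub-networks are zeroed (an assumption).
    Classifier weights are placed into matching column blocks and the biases
    are averaged, so the global logits start as a combination of sub-network logits.
    """
    n = len(subnets)
    with torch.no_grad():
        for g_layer, *s_layers in zip(global_net, *subnets):
            if isinstance(g_layer, nn.Conv2d):
                g_layer.weight.zero_()
                g_layer.bias.zero_()
                out_off, in_off = 0, 0
                for s in s_layers:
                    o, i = s.weight.shape[:2]
                    if g_layer.in_channels == s.in_channels:
                        # first layer: every sub-network reads the full input
                        g_layer.weight[out_off:out_off + o] = s.weight
                    else:
                        # deeper layers: block-diagonal placement along channels
                        g_layer.weight[out_off:out_off + o,
                                       in_off:in_off + i] = s.weight
                        in_off += i
                    g_layer.bias[out_off:out_off + o] = s.bias
                    out_off += o
            elif isinstance(g_layer, nn.Linear):
                g_layer.weight.zero_()
                g_layer.bias.zero_()
                col_off = 0
                for s in s_layers:
                    cols = s.weight.shape[1]
                    g_layer.weight[:, col_off:col_off + cols] = s.weight
                    g_layer.bias += s.bias / n  # average the classifier biases
                    col_off += cols


# Usage: a width-16 global network decomposed into two width-8 sub-networks.
n_sub = 2
global_net = make_cnn(width=16)
subnets = [make_cnn(width=16 // n_sub) for _ in range(n_sub)]
# ... train each sub-network independently (and in parallel) here ...
recompose(subnets, global_net)                  # sub-network transfer learning init
logits = global_net(torch.randn(4, 1, 28, 28))  # further training of global_net follows
print(logits.shape)                             # torch.Size([4, 10])
```

Because the recomposed convolutions are block-diagonal along the channel dimension, the initialized global network reproduces each sub-network's feature maps within its own channel block, mirroring the abstract's first claim that the initialized global network can extract the feature maps learned by the sub-networks.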