Highway Network Block with Gates Constraints for Training Very Deep Networks

O. Oyedotun, Abd El Rahman Shabayek, Djamila Aouada, B. Ottersten
{"title":"Highway Network Block with Gates Constraints for Training Very Deep Networks","authors":"O. Oyedotun, Abd El Rahman Shabayek, Djamila Aouada, B. Ottersten","doi":"10.1109/CVPRW.2018.00217","DOIUrl":null,"url":null,"abstract":"In this paper, we propose to reformulate the learning of the highway network block to realize both early optimization and improved generalization of very deep networks while preserving the network depth. Gate constraints are duly employed to improve optimization, latent representations and parameterization usage in order to efficiently learn hierarchical feature transformations which are crucial for the success of any deep network. One of the earliest very deep models with over 30 layers that was successfully trained relied on highway network blocks. Although, highway blocks suffice for alleviating optimization problem via improved information flow, we show for the first time that further in training such highway blocks may result into learning mostly untransformed features and therefore a reduction in the effective depth of the model; this could negatively impact model generalization performance. Using the proposed approach, 15-layer and 20-layer models are successfully trained with one gate and a 32-layer model using three gates. This leads to a drastic reduction of model parameters as compared to the original highway network. Extensive experiments on CIFAR-10, CIFAR-100, Fashion-MNIST and USPS datasets are performed to validate the effectiveness of the proposed approach. Particularly, we outperform the original highway network and many state-of-the-art results. To the best our knowledge, on the Fashion-MNIST and USPS datasets, the achieved results are the best reported in literature.","PeriodicalId":150600,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW.2018.00217","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In this paper, we propose to reformulate the learning of the highway network block to realize both early optimization and improved generalization of very deep networks while preserving the network depth. Gate constraints are employed to improve optimization, latent representations, and parameterization usage in order to efficiently learn the hierarchical feature transformations that are crucial for the success of any deep network. One of the earliest very deep models with over 30 layers that was successfully trained relied on highway network blocks. Although highway blocks suffice for alleviating the optimization problem via improved information flow, we show for the first time that further training of such highway blocks may result in learning mostly untransformed features and therefore a reduction in the effective depth of the model; this could negatively impact model generalization performance. Using the proposed approach, 15-layer and 20-layer models are successfully trained with one gate, and a 32-layer model with three gates. This leads to a drastic reduction in model parameters compared to the original highway network. Extensive experiments on the CIFAR-10, CIFAR-100, Fashion-MNIST, and USPS datasets validate the effectiveness of the proposed approach. In particular, we outperform the original highway network and many state-of-the-art results. To the best of our knowledge, the results achieved on the Fashion-MNIST and USPS datasets are the best reported in the literature.
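For context, a minimal sketch of the standard highway block the abstract builds on is given below, with an illustrative gate-activity penalty of the kind the abstract motivates (gates drifting toward the identity path reduce effective depth). The paper itself includes no code: the `HighwayBlock` class, the `gate_constraint_penalty` function, and the `target` value are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighwayBlock(nn.Module):
    """Standard highway block: y = T(x) * H(x) + (1 - T(x)) * x."""

    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # H(x): nonlinear transform path
        self.gate = nn.Linear(dim, dim)       # T(x): transform gate
        # Negative gate bias initially favors carrying the input through,
        # which is the usual trick for easing early optimization of deep stacks.
        nn.init.constant_(self.gate.bias, -1.0)

    def forward(self, x):
        h = F.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        y = t * h + (1.0 - t) * x
        return y, t  # return the gate so a constraint can be applied to it

def gate_constraint_penalty(gate_values, target=0.7):
    # Hypothetical regularizer: push the mean gate activation toward `target`
    # so layers keep transforming features instead of collapsing onto the
    # identity (carry) path -- the effective-depth reduction the paper warns
    # about. The paper's actual gate constraints may differ.
    return (gate_values.mean() - target) ** 2

# Illustrative usage: add the gate penalty to a (dummy) task loss.
block = HighwayBlock(dim=64)
x = torch.randn(8, 64)
y, t = block(x)
loss = y.pow(2).mean() + 0.1 * gate_constraint_penalty(t)
loss.backward()
```

The design choice of returning the gate activations alongside the output is an assumption made here so that a constraint term can be computed per block; any mechanism that exposes the gates to the loss would serve the same purpose.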