Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks

A. Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Y. Sakata, A. Tanizawa
{"title":"Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks","authors":"A. Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Y. Sakata, A. Tanizawa","doi":"10.1109/ICMLA.2018.00054","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural networks (DNNs) have been applied to various machine leaning tasks, including image recognition, speech recognition, and machine translation. However, large DNN models are needed to achieve state-of-the-art performance, exceeding the capabilities of edge devices. Model reduction is thus needed for practical use. In this paper, we point out that deep learning automatically induces group sparsity of weights, in which all weights connected to an output channel (node) are zero, when training DNNs under the following three conditions: (1) rectified-linear-unit (ReLU) activations, (2) an L2-regularized objective function, and (3) the Adam optimizer. Next, we analyze this behavior both theoretically and experimentally, and propose a simple model reduction method: eliminate the zero weights after training the DNN. In experiments on MNIST and CIFAR-10 datasets, we demonstrate the sparsity with various training setups. Finally, we show that our method can efficiently reduce the model size and performs well relative to methods that use a sparsity-inducing regularizer.","PeriodicalId":6533,"journal":{"name":"2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"37 1","pages":"318-325"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2018.00054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation. However, large DNN models are needed to achieve state-of-the-art performance, exceeding the capabilities of edge devices. Model reduction is thus needed for practical use. In this paper, we point out that deep learning automatically induces group sparsity of weights, in which all weights connected to an output channel (node) are zero, when training DNNs under the following three conditions: (1) rectified-linear-unit (ReLU) activations, (2) an L2-regularized objective function, and (3) the Adam optimizer. Next, we analyze this behavior both theoretically and experimentally, and propose a simple model reduction method: eliminate the zero weights after training the DNN. In experiments on MNIST and CIFAR-10 datasets, we demonstrate the sparsity with various training setups. Finally, we show that our method can efficiently reduce the model size and performs well relative to methods that use a sparsity-inducing regularizer.
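To make the three conditions concrete, the following PyTorch sketch (not the authors' code; the toy data, layer sizes, learning rate, regularization strength, and pruning threshold are illustrative assumptions) trains a small ReLU network with Adam and an L2 penalty supplied via Adam's weight_decay term, then counts hidden units whose incoming weights have collapsed to zero and removes them, in the spirit of the model-reduction step described in the abstract.

```python
# Minimal sketch of the reported effect, assuming a toy regression task in
# place of MNIST/CIFAR-10. All hyperparameters are illustrative, not the
# paper's settings.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data standing in for the image datasets used in the paper.
X = torch.randn(1024, 64)
y = torch.randn(1024, 10)

# Condition (1): ReLU activations.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
# Condition (2): L2-regularized objective (Adam's coupled weight_decay).
# Condition (3): the Adam optimizer.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(3000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Group sparsity check: a hidden unit (output channel of layer 1) is "dead"
# if all its incoming weights and its bias are numerically zero.
w1, b1 = model[0].weight.detach(), model[0].bias.detach()
alive = (w1.abs().max(dim=1).values + b1.abs()) > 1e-6  # illustrative threshold
print(f"zeroed hidden units: {(~alive).sum().item()} / {alive.numel()}")

# Model reduction: drop the zero rows of layer 1 and the matching columns of
# layer 2. Units removed this way contribute nothing through the ReLU, so the
# pruned network computes (numerically) the same function.
n_alive = int(alive.sum())
pruned = nn.Sequential(nn.Linear(64, n_alive), nn.ReLU(), nn.Linear(n_alive, 10))
with torch.no_grad():
    pruned[0].weight.copy_(w1[alive])
    pruned[0].bias.copy_(b1[alive])
    pruned[2].weight.copy_(model[2].weight.detach()[:, alive])
    pruned[2].bias.copy_(model[2].bias.detach())

print("max output difference after pruning:",
      (model(X) - pruned(X)).abs().max().item())
```

Note that the weight_decay argument of torch.optim.Adam adds the L2 term to the gradient (coupled decay), which matches the L2-regularized-objective condition above; how many units actually reach zero in this toy run depends on the assumed hyperparameters.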