Learning the Group Structure of Deep Neural Networks with an Expectation Maximization Method

2018 IEEE International Conference on Data Mining Workshops (ICDMW). Pub Date: 2018-11-01. DOI: 10.1109/ICDMW.2018.00106

Subin Yi, Jaesik Choi



Abstract

Much recent deep learning research uses very deep neural networks with a huge number of parameters. This gives the networks strong expressive power, but it also brings problems: overfitting to the training data, a growing memory burden, and excessive computation. In this paper, we propose an expectation maximization method that learns the group structure of deep neural networks under a group regularization principle to resolve these issues. Our method uses a mixture model to cluster the neurons in a layer according to how they are connected to the neurons in the next layer, and it clusters the neurons in the next layer according to which group in the current layer they are most strongly connected to. The expectation maximization procedure uses a Gaussian mixture model to keep the most salient connections and remove the others, yielding a grouped weight matrix in block diagonal form. We further refine the method to cluster the kernels of convolutional neural networks (CNNs): we define a representative value for each kernel and build a representative matrix, group that matrix, and prune the kernels according to its group structure. In experiments, we applied our method to fully-connected networks, 1-dimensional CNNs, and 2-dimensional CNNs, and compared it with baseline deep neural networks on the MNIST, CIFAR-10, and United States groundwater datasets with respect to the number of parameters and to classification and regression accuracy. We show that our method can reduce the number of parameters significantly without loss of accuracy and can outperform the baseline models.
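The grouping idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact mixture model, features, or update rules, so the code below assumes a spherical Gaussian mixture fitted by EM on the absolute weights of each row, a deterministic farthest-point initialization, and hypothetical function names (`em_row_groups`, `group_weights`). It only shows how row groups, column groups, and a block diagonal mask might fit together.

```python
import numpy as np

def em_row_groups(features, k, n_iter=50):
    """Cluster rows with a spherical Gaussian mixture fitted by EM (assumed model)."""
    n, d = features.shape
    # Deterministic farthest-point initialization of the means (an assumption).
    means = [features[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((features - m) ** 2, axis=1) for m in means], axis=0)
        means.append(features[np.argmax(dist)])
    means = np.array(means)
    var = np.full(k, features.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities computed from log densities for stability.
        sq = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and per-component variances.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ features) / nk[:, None]
        sq = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (nk * d) + 1e-6
        pi = nk / n
    return resp.argmax(axis=1)

def group_weights(W, k):
    """Return a block-structured copy of W plus row and column group labels."""
    # Rows index neurons of the current layer, columns the next layer.
    row_group = em_row_groups(np.abs(W), k)
    # strength[g, j]: total |weight| from group g to next-layer neuron j.
    strength = np.stack([np.abs(W[row_group == g]).sum(axis=0) for g in range(k)])
    # Each next-layer neuron joins the group it is most strongly connected to.
    col_group = strength.argmax(axis=0)
    # Keep only connections whose two endpoints share a group.
    mask = row_group[:, None] == col_group[None, :]
    return W * mask, row_group, col_group

# Toy weight matrix with two planted groups plus small noise.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4))
W[:3, :2] += 1.0
W[3:, 2:] += 1.0
pruned, rows, cols = group_weights(W, k=2)
```

On this toy matrix the cross-group entries of `pruned` are zeroed, so half of the 24 weights remain. With real networks the rows and columns would generally need to be permuted by group label before the block diagonal form becomes visible.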


