A Scale Invariant Measure of Flatness for Deep Network Minima

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2021-06-06. DOI: 10.1109/ICASSP39728.2021.9413771

Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, D. Phan, S. Chin, T. Tran


Citations: 0

Abstract

It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of flatness are not invariant to rescaling of the network parameters. This means that the measure of flatness can be made arbitrarily small or arbitrarily large through rescaling, rendering the quantitative measures meaningless. In this paper, we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using an appropriate Riemannian metric, we propose a Hessian-based measure of flatness that is invariant to rescaling, and we perform simulations to empirically verify this claim. Finally, we perform experiments to verify that our flatness measure correlates with generalization, using minibatch stochastic gradient descent with different batch sizes to find deep network minima with different generalization properties.
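The rescaling problem described in the abstract can be illustrated with a small numerical sketch. This is not the paper's proposed measure or its experimental setup; it is a minimal example that assumes a bias-free two-layer ReLU network, a squared loss, and the trace of the Hessian block with respect to the output weights as a naive flatness proxy. All names (`forward`, `loss`, `hessian_trace_w2`, `alpha`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(W1, w2, X):
    # Bias-free two-layer ReLU network: f(x) = w2 . relu(W1 x)
    return relu(X @ W1.T) @ w2


def loss(W1, w2, X, y):
    # Mean squared error with a 1/2 factor
    residual = forward(W1, w2, X) - y
    return 0.5 * np.mean(residual ** 2)


def hessian_trace_w2(W1, X):
    # For this loss, the Hessian block w.r.t. w2 is (1/n) * H^T H,
    # where H = relu(X W1^T) holds the hidden activations.
    H = relu(X @ W1.T)
    return np.trace(H.T @ H) / X.shape[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))   # synthetic inputs (assumption)
y = rng.normal(size=64)        # synthetic targets (assumption)
W1 = rng.normal(size=(8, 5))
w2 = rng.normal(size=8)

# Rescale the layers by alpha and 1/alpha. Because ReLU is positively
# homogeneous, relu(alpha * z) = alpha * relu(z) for alpha > 0, so the
# network function is unchanged.
alpha = 10.0
W1_s, w2_s = alpha * W1, w2 / alpha

loss_orig = loss(W1, w2, X, y)
loss_scaled = loss(W1_s, w2_s, X, y)
trace_orig = hessian_trace_w2(W1, X)
trace_scaled = hessian_trace_w2(W1_s, X)

print("loss:", loss_orig, loss_scaled)
print("Hessian-block trace:", trace_orig, trace_scaled)
```

In this sketch the two parameter settings compute the same function and have the same loss, yet the naive Hessian-based proxy differs by a factor of `alpha ** 2`, so it can be pushed up or down at will by choosing `alpha`. This is the ambiguity that a rescaling-invariant measure is meant to remove.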

