{"title":"BatchNorm中位移和尺度参数的实证分析","authors":"Y. Peerthum, M. Stamp","doi":"10.48550/arXiv.2303.12818","DOIUrl":null,"url":null,"abstract":"Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNN). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for such improvements are unclear. BatchNorm includes a normalization step as well as trainable shift and scale parameters. In this paper, we empirically examine the relative contribution to the success of BatchNorm of the normalization step, as compared to the re-parameterization via shifting and scaling. To conduct our experiments, we implement two new optimizers in PyTorch, namely, a version of BatchNorm that we refer to as AffineLayer, which includes the re-parameterization step without normalization, and a version with just the normalization step, that we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.","PeriodicalId":13641,"journal":{"name":"Inf. Sci.","volume":"104 1","pages":"118951"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An Empirical Analysis of the Shift and Scale Parameters in BatchNorm\",\"authors\":\"Y. Peerthum, M. Stamp\",\"doi\":\"10.48550/arXiv.2303.12818\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNN). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for such improvements are unclear. BatchNorm includes a normalization step as well as trainable shift and scale parameters. In this paper, we empirically examine the relative contribution to the success of BatchNorm of the normalization step, as compared to the re-parameterization via shifting and scaling. To conduct our experiments, we implement two new optimizers in PyTorch, namely, a version of BatchNorm that we refer to as AffineLayer, which includes the re-parameterization step without normalization, and a version with just the normalization step, that we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.\",\"PeriodicalId\":13641,\"journal\":{\"name\":\"Inf. 
Sci.\",\"volume\":\"104 1\",\"pages\":\"118951\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Inf. Sci.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2303.12818\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Inf. Sci.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2303.12818","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Empirical Analysis of the Shift and Scale Parameters in BatchNorm
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNNs). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for these improvements are unclear. BatchNorm consists of a normalization step followed by trainable shift and scale parameters. In this paper, we empirically examine the relative contributions of the normalization step and of the re-parameterization via shifting and scaling to the success of BatchNorm. To conduct our experiments, we implement two new layers in PyTorch: a variant of BatchNorm that we call AffineLayer, which applies the shift-and-scale re-parameterization without normalization, and a variant with only the normalization step, which we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) on a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.
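To make the two variants concrete, the following is a minimal PyTorch sketch, assuming standard 4-D convolutional activations of shape (N, C, H, W). The class names mirror the terminology above, but the bodies are illustrative simplifications (for example, the BatchNormMinus sketch does not track running statistics for inference) and should not be read as the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AffineLayer(nn.Module):
    """Sketch: per-channel learnable scale (gamma) and shift (beta), with no normalization."""

    def __init__(self, num_features):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # Broadcast over (N, C, H, W): scale and shift each channel, skipping normalization.
        return x * self.gamma.view(1, -1, 1, 1) + self.beta.view(1, -1, 1, 1)


class BatchNormMinus(nn.Module):
    """Sketch: normalize with batch statistics only; the trainable gamma/beta are omitted.

    Simplification: no running mean/variance are kept, so evaluation mode is not handled.
    """

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        # Per-channel mean and variance over the batch and spatial dimensions.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + self.eps)
```

In this sketch, standard BatchNorm corresponds to applying BatchNormMinus followed by AffineLayer, which is exactly the decomposition the experiments isolate: normalization alone versus shift-and-scale alone.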