Large-scale self-normalizing neural networks

Zhaodong Chen, Weiqin Zhao, Lei Deng, Yufei Ding, Qinghao Wen, Guoqi Li, Yuan Xie
{"title":"大规模自规范化神经网络","authors":"Zhaodong Chen ,&nbsp;Weiqin Zhao ,&nbsp;Lei Deng ,&nbsp;Yufei Ding ,&nbsp;Qinghao Wen ,&nbsp;Guoqi Li ,&nbsp;Yuan Xie","doi":"10.1016/j.jai.2024.05.001","DOIUrl":null,"url":null,"abstract":"<div><p>Self-normalizing neural networks (SNN) regulate the activation and gradient flows through activation functions with the self-normalization property. As SNNs do not rely on norms computed from minibatches, they are more friendly to data parallelism, kernel fusion, and emerging architectures such as ReRAM-based accelerators. However, existing SNNs have mainly demonstrated their effectiveness on toy datasets and fall short in accuracy when dealing with large-scale tasks like ImageNet. They lack the strong normalization, regularization, and expression power required for wider, deeper models and larger-scale tasks. To enhance the normalization strength, this paper introduces a comprehensive and practical definition of the self-normalization property in terms of the stability and attractiveness of the statistical fixed points. It is comprehensive as it jointly considers all the fixed points used by existing studies: the first and second moment of forward activation and the expected Frobenius norm of backward gradient. The practicality comes from the analytical equations provided by our paper to assess the stability and attractiveness of each fixed point, which are derived from theoretical analysis of the forward and backward signals. The proposed definition is applied to a meta activation function inspired by prior research, leading to a stronger self-normalizing activation function named “bi-scaled exponential linear unit with backward standardized” (bSELU-BSTD). We provide both theoretical and empirical evidence to show that it is superior to existing studies. To enhance the regularization and expression power, we further propose scaled-Mixup and channel-wise scale &amp; shift. With these three techniques, our approach achieves <strong>75.23%</strong> top-1 accuracy on the ImageNet with Conv MobileNet V1, surpassing the performance of existing self-normalizing activation functions. To the best of our knowledge, this is the first SNN that achieves comparable accuracy to batch normalization on ImageNet.</p></div>","PeriodicalId":100755,"journal":{"name":"Journal of Automation and Intelligence","volume":"3 2","pages":"Pages 101-110"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949855424000194/pdfft?md5=336e35816fb9708e5d5bbfc6ba5ac066&pid=1-s2.0-S2949855424000194-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Large-scale self-normalizing neural networks\",\"authors\":\"Zhaodong Chen ,&nbsp;Weiqin Zhao ,&nbsp;Lei Deng ,&nbsp;Yufei Ding ,&nbsp;Qinghao Wen ,&nbsp;Guoqi Li ,&nbsp;Yuan Xie\",\"doi\":\"10.1016/j.jai.2024.05.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Self-normalizing neural networks (SNN) regulate the activation and gradient flows through activation functions with the self-normalization property. As SNNs do not rely on norms computed from minibatches, they are more friendly to data parallelism, kernel fusion, and emerging architectures such as ReRAM-based accelerators. However, existing SNNs have mainly demonstrated their effectiveness on toy datasets and fall short in accuracy when dealing with large-scale tasks like ImageNet. 
They lack the strong normalization, regularization, and expression power required for wider, deeper models and larger-scale tasks. To enhance the normalization strength, this paper introduces a comprehensive and practical definition of the self-normalization property in terms of the stability and attractiveness of the statistical fixed points. It is comprehensive as it jointly considers all the fixed points used by existing studies: the first and second moment of forward activation and the expected Frobenius norm of backward gradient. The practicality comes from the analytical equations provided by our paper to assess the stability and attractiveness of each fixed point, which are derived from theoretical analysis of the forward and backward signals. The proposed definition is applied to a meta activation function inspired by prior research, leading to a stronger self-normalizing activation function named “bi-scaled exponential linear unit with backward standardized” (bSELU-BSTD). We provide both theoretical and empirical evidence to show that it is superior to existing studies. To enhance the regularization and expression power, we further propose scaled-Mixup and channel-wise scale &amp; shift. With these three techniques, our approach achieves <strong>75.23%</strong> top-1 accuracy on the ImageNet with Conv MobileNet V1, surpassing the performance of existing self-normalizing activation functions. To the best of our knowledge, this is the first SNN that achieves comparable accuracy to batch normalization on ImageNet.</p></div>\",\"PeriodicalId\":100755,\"journal\":{\"name\":\"Journal of Automation and Intelligence\",\"volume\":\"3 2\",\"pages\":\"Pages 101-110\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2949855424000194/pdfft?md5=336e35816fb9708e5d5bbfc6ba5ac066&pid=1-s2.0-S2949855424000194-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Automation and Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949855424000194\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Automation and Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949855424000194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract


Self-normalizing neural networks (SNNs) regulate the activation and gradient flows through activation functions with the self-normalization property. As SNNs do not rely on norms computed from minibatches, they are friendlier to data parallelism, kernel fusion, and emerging architectures such as ReRAM-based accelerators. However, existing SNNs have mainly demonstrated their effectiveness on toy datasets and fall short in accuracy when dealing with large-scale tasks like ImageNet. They lack the strong normalization, regularization, and expressive power required for wider, deeper models and larger-scale tasks. To enhance the normalization strength, this paper introduces a comprehensive and practical definition of the self-normalization property in terms of the stability and attractiveness of the statistical fixed points. It is comprehensive as it jointly considers all the fixed points used by existing studies: the first and second moments of the forward activation and the expected Frobenius norm of the backward gradient. The practicality comes from the analytical equations provided in this paper to assess the stability and attractiveness of each fixed point, which are derived from theoretical analysis of the forward and backward signals. The proposed definition is applied to a meta activation function inspired by prior research, leading to a stronger self-normalizing activation function named “bi-scaled exponential linear unit with backward standardized” (bSELU-BSTD). We provide both theoretical and empirical evidence to show that it is superior to existing studies. To enhance the regularization and expressive power, we further propose scaled-Mixup and channel-wise scale & shift. With these three techniques, our approach achieves 75.23% top-1 accuracy on ImageNet with Conv MobileNet V1, surpassing the performance of existing self-normalizing activation functions. To the best of our knowledge, this is the first SNN that achieves accuracy comparable to batch normalization on ImageNet.
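For intuition about the statistical fixed points referred to above, the short NumPy sketch below propagates perturbed input statistics through a toy fully connected network using the classic SELU activation, whose fixed point (zero mean, unit second moment) is known in closed form. It is only an illustration of the stability/attractiveness idea under an assumed LeCun-style weight initialization; it is not the paper's bSELU-BSTD activation or its analytical assessment equations, and the function and parameter names are hypothetical.

import numpy as np

# Constants from the SELU paper (Klambauer et al., 2017).
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    # lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise.
    # Clipping the negative branch avoids overflow warnings, since np.where
    # evaluates expm1 on the positive inputs as well.
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(np.minimum(x, 0.0)))

def propagate(mean, var, n_layers=20, width=1024, samples=2000, seed=0):
    # Push samples drawn from N(mean, var) through a toy fully connected net
    # with LeCun-initialized weights and report the activation statistics.
    rng = np.random.default_rng(seed)
    x = rng.normal(mean, np.sqrt(var), size=(samples, width))
    for layer in range(n_layers):
        w = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, width))
        x = selu(x @ w)
        print(f"layer {layer + 1:2d}: mean = {x.mean():+.4f}, "
              f"second moment = {np.mean(x ** 2):.4f}")

# Start away from the fixed point: the moments are pulled back toward
# roughly (0, 1) as depth increases, illustrating attractiveness.
propagate(mean=0.5, var=2.0)

Running the sketch shows the mean and second moment drifting toward roughly 0 and 1 with depth; the paper's definition makes this notion of stability and attractiveness precise for both the forward activation moments and the expected Frobenius norm of the backward gradient.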
