Oscillating activation functions can improve the performance of convolutional neural networks

IF 7.2 · CAS Quartile 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Mathew Mithra Noel, Arunkumar L., Advait Trivedi, Praneet Dutta
{"title":"振荡激活函数可以提高卷积神经网络的性能","authors":"Mathew Mithra Noel ,&nbsp;Arunkumar L. ,&nbsp;Advait Trivedi ,&nbsp;Praneet Dutta","doi":"10.1016/j.asoc.2025.113077","DOIUrl":null,"url":null,"abstract":"<div><div>Convolutional neural networks have been successful in solving many socially important and economically significant problems. Their ability to learn complex high-dimensional functions hierarchically can be attributed to the use of nonlinear activation functions. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function to alleviate the vanishing gradient problem caused by using saturating activation functions. Since then, many improved variants of the ReLU activation have been proposed. However, a majority of activation functions used today are non-oscillatory and monotonically increasing due to their biological plausibility. This paper demonstrates that oscillatory activation functions can improve gradient flow and reduce network size. Two theorems on limits of non-oscillatory activation functions are presented. A new oscillatory activation function called Growing Cosine Unit(GCU) defined as <span><math><mrow><mi>C</mi><mrow><mo>(</mo><mi>z</mi><mo>)</mo></mrow><mo>=</mo><mi>z</mi><mi>⋅</mi><mo>cos</mo><mi>z</mi></mrow></math></span> that outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures and benchmarks is presented. The GCU activation has multiple zeros enabling single GCU neurons to have multiple hyperplanes in the decision boundary. This allows single GCU neurons to learn the XOR function without feature engineering. Extensive experimental comparison with 16 popular activation functions indicate that the GCU activation function significantly improves performance on CIFAR-10, CIFAR-100, Imagenette and the 1000 class ImageNet benchmarks.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"175 ","pages":"Article 113077"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Oscillating activation functions can improve the performance of convolutional neural networks\",\"authors\":\"Mathew Mithra Noel ,&nbsp;Arunkumar L. ,&nbsp;Advait Trivedi ,&nbsp;Praneet Dutta\",\"doi\":\"10.1016/j.asoc.2025.113077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Convolutional neural networks have been successful in solving many socially important and economically significant problems. Their ability to learn complex high-dimensional functions hierarchically can be attributed to the use of nonlinear activation functions. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function to alleviate the vanishing gradient problem caused by using saturating activation functions. Since then, many improved variants of the ReLU activation have been proposed. However, a majority of activation functions used today are non-oscillatory and monotonically increasing due to their biological plausibility. This paper demonstrates that oscillatory activation functions can improve gradient flow and reduce network size. Two theorems on limits of non-oscillatory activation functions are presented. 
A new oscillatory activation function called Growing Cosine Unit(GCU) defined as <span><math><mrow><mi>C</mi><mrow><mo>(</mo><mi>z</mi><mo>)</mo></mrow><mo>=</mo><mi>z</mi><mi>⋅</mi><mo>cos</mo><mi>z</mi></mrow></math></span> that outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures and benchmarks is presented. The GCU activation has multiple zeros enabling single GCU neurons to have multiple hyperplanes in the decision boundary. This allows single GCU neurons to learn the XOR function without feature engineering. Extensive experimental comparison with 16 popular activation functions indicate that the GCU activation function significantly improves performance on CIFAR-10, CIFAR-100, Imagenette and the 1000 class ImageNet benchmarks.</div></div>\",\"PeriodicalId\":50737,\"journal\":{\"name\":\"Applied Soft Computing\",\"volume\":\"175 \",\"pages\":\"Article 113077\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2025-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Soft Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1568494625003886\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625003886","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Convolutional neural networks have been successful in solving many socially important and economically significant problems. Their ability to learn complex high-dimensional functions hierarchically can be attributed to the use of nonlinear activation functions. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function to alleviate the vanishing gradient problem caused by using saturating activation functions. Since then, many improved variants of the ReLU activation have been proposed. However, a majority of activation functions used today are non-oscillatory and monotonically increasing due to their biological plausibility. This paper demonstrates that oscillatory activation functions can improve gradient flow and reduce network size. Two theorems on the limits of non-oscillatory activation functions are presented. A new oscillatory activation function called the Growing Cosine Unit (GCU), defined as C(z) = z·cos(z), is presented and outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures and benchmarks. The GCU activation has multiple zeros, enabling a single GCU neuron to have multiple hyperplanes in its decision boundary. This allows a single GCU neuron to learn the XOR function without feature engineering. Extensive experimental comparison with 16 popular activation functions indicates that the GCU activation function significantly improves performance on the CIFAR-10, CIFAR-100, Imagenette and 1000-class ImageNet benchmarks.
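As an illustration of the abstract's two central claims, here is a minimal NumPy sketch (not the authors' code) of the GCU activation C(z) = z·cos(z), together with a single-neuron XOR check; the weights w = (π, π) and bias b = π are a hand-picked illustrative choice, not values taken from the paper.

```python
# Minimal sketch of the Growing Cosine Unit (GCU) activation and a
# single-neuron XOR check. Weights/bias below are hand-picked for
# illustration only; they are not from the paper.
import numpy as np

def gcu(z):
    """Growing Cosine Unit: C(z) = z * cos(z)."""
    return z * np.cos(z)

# Single neuron: y = gcu(w1*x1 + w2*x2 + b), classified by the sign of y.
w = np.array([np.pi, np.pi])
b = np.pi

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_true = np.array([0, 1, 1, 0])  # XOR labels

z = X @ w + b                     # pre-activations: pi, 2*pi, 2*pi, 3*pi
y_pred = (gcu(z) > 0).astype(int) # sign of GCU output gives the class

print(y_pred)                     # [0 1 1 0] -> matches XOR
assert np.array_equal(y_pred, y_true)
```

Because cos(z) changes sign repeatedly, a single GCU neuron's output crosses zero on several parallel hyperplanes of the input space, which is what lets one neuron carve the XOR classes apart; a monotonic activation such as ReLU or a sigmoid yields only one such boundary.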
Source journal
Applied Soft Computing (Engineering & Technology; Computer Science: Interdisciplinary Applications)
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.