{"title":"Effect of continuous S-shaped rectified linear function on deep convolutional neural network","authors":"Anahita Ghazvini, Siti Norul Huda Sheikh Abdullah, Masri Ayob","doi":"10.1007/s10489-025-06399-0","DOIUrl":null,"url":null,"abstract":"<p>The vanishing gradient issue in convolutional neural networks (CNNs) is often addressed by improving activation functions, such as the S-shaped rectified linear activation unit (SReLU). However, SReLU can pose challenges in updating training parameters effectively. To mitigate this, we propose applying the Aggregation Fischer–Burmeister (AFB) function to SReLU, which smooths the secant line slope of the function from both sides. However, direct application of AFB to SReLU can intensify the vanishing gradient issue due to irregular function behavior. To address this concern, we introduce a regulated version of AFB (ReAFB) that ensures proper gradient and mean activation output conditions when applied to SReLU (ReAFBSReLU). We evaluate the performance of CNNs using ReAFBSReLU on three benchmark datasets: MNIST, CIFAR-10 (with and without data augmentation), and CIFAR-100. Specifically, we implement Network in Network (NIN) for MNIST and CIFAR-10, and LeNet for CIFAR-100 dataset. Additionally, we utilize SqueezeNet exclusively to compare the performance of CNNs using the proposed ReAFBSReLU activation function against state-of-the-art activation functions. Our results demonstrate that ReAFBSReLU outperforms other activation functions tested in this study, indicating its efficacy in enhancing training parameter updates and subsequently improving accuracy.</p>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 6","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06399-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The vanishing gradient issue in convolutional neural networks (CNNs) is often addressed by improving activation functions, such as the S-shaped rectified linear activation unit (SReLU). However, SReLU can pose challenges in updating training parameters effectively. To mitigate this, we propose applying the Aggregation Fischer–Burmeister (AFB) function to SReLU, which smooths the secant line slope of the function from both sides. However, direct application of AFB to SReLU can intensify the vanishing gradient issue due to irregular function behavior. To address this concern, we introduce a regulated version of AFB (ReAFB) that ensures proper gradient and mean activation output conditions when applied to SReLU (ReAFBSReLU). We evaluate the performance of CNNs using ReAFBSReLU on three benchmark datasets: MNIST, CIFAR-10 (with and without data augmentation), and CIFAR-100. Specifically, we implement Network in Network (NIN) for MNIST and CIFAR-10, and LeNet for the CIFAR-100 dataset. Additionally, we use SqueezeNet exclusively to compare the performance of CNNs using the proposed ReAFBSReLU activation function against state-of-the-art activation functions. Our results demonstrate that ReAFBSReLU outperforms the other activation functions tested in this study, indicating its efficacy in enhancing training parameter updates and subsequently improving accuracy.
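For readers unfamiliar with the two building blocks named in the abstract, the sketch below shows the standard SReLU activation and the classical (optionally smoothed) Fischer–Burmeister function in plain NumPy. The threshold/slope defaults, the smoothing parameter `mu`, and the function names are illustrative assumptions; how the paper aggregates and regulates these pieces into ReAFBSReLU is not reproduced here.

```python
import numpy as np

def srelu(x, t_r=0.4, a_r=0.2, t_l=-0.4, a_l=0.2):
    # Standard S-shaped ReLU (Jin et al., 2016): piecewise linear with
    # right threshold/slope (t_r, a_r) and left threshold/slope (t_l, a_l),
    # which are normally learnable. The defaults here are illustrative only.
    return np.where(x >= t_r, t_r + a_r * (x - t_r),
           np.where(x <= t_l, t_l + a_l * (x - t_l), x))

def fischer_burmeister(a, b, mu=0.0):
    # Classical Fischer-Burmeister function phi(a, b) = a + b - sqrt(a^2 + b^2),
    # with an optional smoothing term mu giving the standard smoothed variant
    # phi_mu(a, b) = a + b - sqrt(a^2 + b^2 + 2*mu), which removes the kink at
    # the origin. The paper's Aggregation FB (AFB) and its regulated version
    # (ReAFB) are not reproduced here.
    return a + b - np.sqrt(a**2 + b**2 + 2.0 * mu)

if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 9)
    print("SReLU(x):         ", np.round(srelu(x), 3))
    print("FB(x, 0), mu=0:   ", np.round(fischer_burmeister(x, 0.0), 3))
    print("FB(x, 0), mu=0.05:", np.round(fischer_burmeister(x, 0.0, mu=0.05), 3))
```

The Fischer–Burmeister function is widely used as a differentiable substitute for the non-smooth complementarity condition min(a, b) = 0, which is the general smoothing role the abstract attributes to AFB when it is applied to the piecewise-linear SReLU.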
Journal Introduction:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions to real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments that address real, complex, and difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.