Leveraging Continuously Differentiable Activation for Learning in Analog and Quantized Noisy Environments

Vivswan Shah; Nathan Youngblood

IEEE Journal of Selected Topics in Quantum Electronics, vol. 31, issue 3 (AI/ML Integrated Opto-electronics), pp. 1-9
DOI: 10.1109/JSTQE.2025.3534636 (https://ieeexplore.ieee.org/document/10854609/)
Published: 27 January 2025
Impact Factor: 4.3 | JCR: Q1, Engineering, Electrical & Electronic | CAS: Tier 2, Engineering & Technology
Citations: 0

Abstract

Real-world analog systems, such as photonic neural networks, intrinsically suffer from noise that can impede model convergence and accuracy for a variety of deep learning models. In the presence of noise, some activation functions behave erratically or even amplify the noise. Specifically, ReLU, an activation function used ubiquitously in digital deep learning systems, is not only challenging to implement in analog hardware but has also been shown to perform worse than continuously differentiable activation functions. In this paper, we demonstrate that GELU and SiLU enable robust propagation of gradients in analog hardware because they are continuously differentiable functions. To analyze the cause of these activation differences in the presence of noise, we used functional interpolation between ReLU and GELU/SiLU to perform analysis and training of convolutional, linear, and transformer networks on simulated analog hardware with different interpolated activation functions. We find that with ReLU, errors in the gradient due to noise are amplified during backpropagation, leading to a significant reduction in model performance. However, we observe that error amplification decreases as we move toward GELU/SiLU, until it is non-existent at GELU/SiLU, demonstrating that continuously differentiable activation functions are $\sim 100\times$ more noise-resistant than conventional rectified activations for inputs near zero. Our findings provide guidance in selecting the appropriate activations to realize reliable and performant photonic and other analog hardware accelerators in several domains of machine learning, such as computer vision, signal processing, and beyond.
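The abstract describes functional interpolation between ReLU and GELU/SiLU under injected noise, but gives no implementation details. The snippet below is a minimal, illustrative PyTorch sketch of one way such an interpolation and noise injection could look; the names (`interpolated_activation`, `noisy_forward`, `alpha`, `noise_std`) and the simple linear blend are assumptions made for this example, not the authors' code, and the paper's actual interpolation scheme may differ.

```python
# Illustrative sketch only (not the authors' implementation): linearly blend
# ReLU and GELU, and add Gaussian noise to stand in for analog hardware noise.
import torch
import torch.nn.functional as F


def interpolated_activation(x: torch.Tensor, alpha: float) -> torch.Tensor:
    """Blend ReLU (alpha = 0) and GELU (alpha = 1) pointwise."""
    return (1.0 - alpha) * F.relu(x) + alpha * F.gelu(x)


def noisy_forward(x: torch.Tensor, alpha: float, noise_std: float = 0.1) -> torch.Tensor:
    """Apply the blended activation to an input perturbed by additive
    Gaussian noise, mimicking a noisy analog accelerator."""
    noisy_x = x + noise_std * torch.randn_like(x)
    return interpolated_activation(noisy_x, alpha)


if __name__ == "__main__":
    # Inspect gradients for inputs near zero, where the paper reports the
    # largest difference between rectified and smooth activations.
    x = torch.linspace(-1.0, 1.0, steps=5, requires_grad=True)
    for alpha in (0.0, 0.5, 1.0):  # ReLU -> halfway -> GELU
        y = noisy_forward(x, alpha)
        y.sum().backward()
        print(f"alpha={alpha:.1f}  grad: {x.grad.detach()}")
        x.grad = None  # reset between runs
```

Sweeping `alpha` from 0 (pure ReLU) toward 1 (pure GELU) in a sketch like this mirrors the experiment the abstract describes: observing how noise-induced gradient errors change along the interpolation path between the two activation families.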
Source Journal

IEEE Journal of Selected Topics in Quantum Electronics (Engineering, Electrical & Electronic)
CiteScore: 10.60
Self-citation rate: 2.00%
Articles per year: 212
Review time: 3 months
Journal Description: Papers published in the IEEE Journal of Selected Topics in Quantum Electronics fall within the broad field of science and technology of quantum electronics of a device, subsystem, or system-oriented nature. Each issue is devoted to a specific topic within this broad spectrum. Announcements of the topical areas planned for future issues, along with deadlines for receipt of manuscripts, are published in this Journal and in the IEEE Journal of Quantum Electronics. Generally, the scope of manuscripts appropriate to this Journal is the same as that for the IEEE Journal of Quantum Electronics.

Manuscripts are published that report original theoretical and/or experimental research results that advance the scientific and technological base of quantum electronics devices, systems, or applications. The Journal is dedicated toward publishing research results that advance the state of the art or add to the understanding of the generation, amplification, modulation, detection, waveguiding, or propagation characteristics of coherent electromagnetic radiation having sub-millimeter and shorter wavelengths.

In order to be suitable for publication in this Journal, the content of manuscripts concerned with subject-related research must have a potential impact on advancing the technological base of quantum electronic devices, systems, and/or applications. Potential authors of subject-related research have the responsibility of pointing out this potential impact. System-oriented manuscripts must be concerned with systems that perform a function previously unavailable or that outperform previously established systems that did not use quantum electronic components or concepts. Tutorial and review papers are by invitation only.