Leveraging Continuously Differentiable Activation for Learning in Analog and Quantized Noisy Environments

Vivswan Shah; Nathan Youngblood

IEEE Journal of Selected Topics in Quantum Electronics, vol. 31, issue 3 (AI/ML Integrated Opto-electronics), pp. 1-9
DOI: 10.1109/JSTQE.2025.3534636 (https://ieeexplore.ieee.org/document/10854609/)
Published: 27 January 2025
Impact Factor: 4.3 | JCR: Q1, Engineering, Electrical & Electronic | CAS: Tier 2, Engineering & Technology
Citations: 0

Abstract

Real-world analog systems, such as photonic neural networks, intrinsically suffer from noise that can impede model convergence and accuracy for a variety of deep learning models. In the presence of noise, some activation functions behave erratically or even amplify the noise. Specifically, ReLU, an activation function used ubiquitously in digital deep learning systems, is not only challenging to implement in analog hardware but has also been shown to perform worse than continuously differentiable activation functions. In this paper, we demonstrate that GELU and SiLU enable robust propagation of gradients in analog hardware because they are continuously differentiable functions. To analyze the cause of these activation differences in the presence of noise, we used functional interpolation between ReLU and GELU/SiLU to perform analysis and training of convolutional, linear, and transformer networks on simulated analog hardware with different interpolated activation functions. We find that with ReLU, errors in the gradient due to noise are amplified during backpropagation, leading to a significant reduction in model performance. However, we observe that error amplification decreases as we move toward GELU/SiLU, until it is non-existent at GELU/SiLU, demonstrating that continuously differentiable activation functions are $\sim 100\times$ more noise-resistant than conventional rectified activations for inputs near zero. Our findings provide guidance in selecting the appropriate activations to realize reliable and performant photonic and other analog hardware accelerators in several domains of machine learning, such as computer vision, signal processing, and beyond.
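The abstract describes functional interpolation between ReLU and GELU/SiLU under injected noise, but gives no implementation details. The snippet below is a minimal, illustrative PyTorch sketch of one way such an interpolation and noise injection could look; the names (`interpolated_activation`, `noisy_forward`, `alpha`, `noise_std`) and the simple linear blend are assumptions made for this example, not the authors' code, and the paper's actual interpolation scheme may differ.

```python
# Illustrative sketch only (not the authors' implementation): linearly blend
# ReLU and GELU, and add Gaussian noise to stand in for analog hardware noise.
import torch
import torch.nn.functional as F


def interpolated_activation(x: torch.Tensor, alpha: float) -> torch.Tensor:
    """Blend ReLU (alpha = 0) and GELU (alpha = 1) pointwise."""
    return (1.0 - alpha) * F.relu(x) + alpha * F.gelu(x)


def noisy_forward(x: torch.Tensor, alpha: float, noise_std: float = 0.1) -> torch.Tensor:
    """Apply the blended activation to an input perturbed by additive
    Gaussian noise, mimicking a noisy analog accelerator."""
    noisy_x = x + noise_std * torch.randn_like(x)
    return interpolated_activation(noisy_x, alpha)


if __name__ == "__main__":
    # Inspect gradients for inputs near zero, where the paper reports the
    # largest difference between rectified and smooth activations.
    x = torch.linspace(-1.0, 1.0, steps=5, requires_grad=True)
    for alpha in (0.0, 0.5, 1.0):  # ReLU -> halfway -> GELU
        y = noisy_forward(x, alpha)
        y.sum().backward()
        print(f"alpha={alpha:.1f}  grad: {x.grad.detach()}")
        x.grad = None  # reset between runs
```

Sweeping `alpha` from 0 (pure ReLU) toward 1 (pure GELU) in a sketch like this mirrors the experiment the abstract describes: observing how noise-induced gradient errors change along the interpolation path between the two activation families.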
Source Journal

IEEE Journal of Selected Topics in Quantum Electronics (Engineering, Electrical & Electronic)
CiteScore: 10.60
Self-citation rate: 2.00%
Articles per year: 212
Review time: 3 months
Journal Description: Papers published in the IEEE Journal of Selected Topics in Quantum Electronics fall within the broad field of science and technology of quantum electronics of a device, subsystem, or system-oriented nature. Each issue is devoted to a specific topic within this broad spectrum. Announcements of the topical areas planned for future issues, along with deadlines for receipt of manuscripts, are published in this Journal and in the IEEE Journal of Quantum Electronics. Generally, the scope of manuscripts appropriate to this Journal is the same as that for the IEEE Journal of Quantum Electronics.

Manuscripts are published that report original theoretical and/or experimental research results that advance the scientific and technological base of quantum electronics devices, systems, or applications. The Journal is dedicated toward publishing research results that advance the state of the art or add to the understanding of the generation, amplification, modulation, detection, waveguiding, or propagation characteristics of coherent electromagnetic radiation having sub-millimeter and shorter wavelengths.

In order to be suitable for publication in this Journal, the content of manuscripts concerned with subject-related research must have a potential impact on advancing the technological base of quantum electronic devices, systems, and/or applications. Potential authors of subject-related research have the responsibility of pointing out this potential impact. System-oriented manuscripts must be concerned with systems that perform a function previously unavailable or that outperform previously established systems that did not use quantum electronic components or concepts. Tutorial and review papers are by invitation only.