{"title":"Leveraging Continuously Differentiable Activation for Learning in Analog and Quantized Noisy Environments","authors":"Vivswan Shah;Nathan Youngblood","doi":"10.1109/JSTQE.2025.3534636","DOIUrl":null,"url":null,"abstract":"Real-world analog systems, such as photonic neural networks, intrinsically suffer from noise that can impede model convergence and accuracy for a variety of deep learning models. In the presence of noise, some activation functions behave erratically or even amplify the noise. Specifically, ReLU, an activation function used ubiquitously in digital deep learning systems, not only poses a challenge to implement in analog hardware but has also been shown to perform worse than continuously differentiable activation functions. In this paper, we demonstrate that GELU and SiLU enable robust propagation of gradients in analog hardware because they are continuously differentiable functions. To analyze this cause of activation differences in the presence of noise, we used functional interpolation between ReLU and GELU/SiLU to perform analysis and training of convolutional, linear, and transformer networks on simulated analog hardware with different interpolated activation functions. We find that in ReLU, errors in the gradient due to noise are amplified during backpropagation, leading to a significant reduction in model performance. However, we observe that error amplification decreases as we move toward GELU/SiLU, until it is non-existent at GELU/SiLU demonstrating that continuously differentiable activation functions are <inline-formula><tex-math>$\\sim 100\\times$</tex-math></inline-formula> more noise-resistant than conventional rectified activations for inputs near zero. Our findings provide guidance in selecting the appropriate activations to realize reliable and performant photonic and other analog hardware accelerators in several domains of machine learning, such as computer vision, signal processing, and beyond.","PeriodicalId":13094,"journal":{"name":"IEEE Journal of Selected Topics in Quantum Electronics","volume":"31 3: AI/ML Integrated Opto-electronics","pages":"1-9"},"PeriodicalIF":4.3000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Selected Topics in Quantum Electronics","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10854609/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Real-world analog systems, such as photonic neural networks, intrinsically suffer from noise that can impede model convergence and accuracy for a variety of deep learning models. In the presence of noise, some activation functions behave erratically or even amplify the noise. Specifically, ReLU, an activation function used ubiquitously in digital deep learning systems, not only poses a challenge to implement in analog hardware but has also been shown to perform worse than continuously differentiable activation functions. In this paper, we demonstrate that GELU and SiLU enable robust propagation of gradients in analog hardware because they are continuously differentiable functions. To analyze the cause of these activation differences in the presence of noise, we used functional interpolation between ReLU and GELU/SiLU to analyze and train convolutional, linear, and transformer networks on simulated analog hardware with different interpolated activation functions. We find that with ReLU, errors in the gradient due to noise are amplified during backpropagation, leading to a significant reduction in model performance. However, we observe that error amplification decreases as we move toward GELU/SiLU, until it is non-existent at GELU/SiLU, demonstrating that continuously differentiable activation functions are $\sim 100\times$ more noise-resistant than conventional rectified activations for inputs near zero. Our findings provide guidance for selecting the appropriate activations to realize reliable and performant photonic and other analog hardware accelerators in several domains of machine learning, such as computer vision, signal processing, and beyond.
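
As a rough illustration of the interpolation idea described in the abstract, the PyTorch sketch below blends ReLU with GELU through a hypothetical parameter alpha and measures how small input noise perturbs the gradient near zero. The class name InterpolatedActivation, the linear blending scheme, and the noise scale are illustrative assumptions, not the authors' implementation.

```python
# A minimal, hypothetical sketch of interpolating between ReLU and GELU and of
# measuring how input noise perturbs gradients near zero. The blending scheme,
# parameter names, and noise model are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterpolatedActivation(nn.Module):
    """Blend between ReLU (alpha = 0) and GELU (alpha = 1)."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear blend of the two activations (an assumed interpolation scheme).
        return (1.0 - self.alpha) * F.relu(x) + self.alpha * F.gelu(x)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Inputs clustered near zero, where ReLU's gradient is discontinuous.
    x = torch.linspace(-0.1, 0.1, steps=201)
    noise = 0.01 * torch.randn_like(x)  # stand-in for analog hardware noise
    for alpha in (0.0, 0.5, 1.0):  # ReLU -> halfway -> GELU
        act = InterpolatedActivation(alpha)
        x_clean = x.clone().requires_grad_(True)
        x_noisy = x.clone().requires_grad_(True)
        (g_clean,) = torch.autograd.grad(act(x_clean).sum(), x_clean)
        (g_noisy,) = torch.autograd.grad(act(x_noisy + noise).sum(), x_noisy)
        err = (g_noisy - g_clean).abs().mean().item()
        print(f"alpha={alpha:.1f}  mean |gradient error| near zero: {err:.4f}")
```

In this sketch, ReLU (alpha = 0) has a step-function gradient, so small input noise near zero flips it between 0 and 1 and the mean gradient error is large, whereas the GELU end (alpha = 1) has a smoothly varying gradient and the same noise produces only a small perturbation, consistent with the trend the abstract reports.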
Journal Introduction:
Papers published in the IEEE Journal of Selected Topics in Quantum Electronics fall within the broad field of science and technology of quantum electronics of a device, subsystem, or system-oriented nature. Each issue is devoted to a specific topic within this broad spectrum. Announcements of the topical areas planned for future issues, along with deadlines for receipt of manuscripts, are published in this Journal and in the IEEE Journal of Quantum Electronics. Generally, the scope of manuscripts appropriate to this Journal is the same as that for the IEEE Journal of Quantum Electronics. Manuscripts are published that report original theoretical and/or experimental research results that advance the scientific and technological base of quantum electronics devices, systems, or applications. The Journal is dedicated to publishing research results that advance the state of the art or add to the understanding of the generation, amplification, modulation, detection, waveguiding, or propagation characteristics of coherent electromagnetic radiation having sub-millimeter and shorter wavelengths. In order to be suitable for publication in this Journal, the content of manuscripts concerned with subject-related research must have a potential impact on advancing the technological base of quantum electronic devices, systems, and/or applications. Potential authors of subject-related research have the responsibility of pointing out this potential impact. System-oriented manuscripts must be concerned with systems that perform a function previously unavailable or that outperform previously established systems that did not use quantum electronic components or concepts. Tutorial and review papers are by invitation only.