{"title":"Softmax-kernel reproduced gradient descent for stochastic optimization on streaming data","authors":"Yifu Lin , Wenling Li , Yang Liu , Jia Song","doi":"10.1016/j.sigpro.2025.109904","DOIUrl":null,"url":null,"abstract":"<div><div>Stochastic gradient descent (SGD) is commonly used for machine learning on streaming data. However, it suffers from slow convergence due to gradient variance. To address this issue, the Reproducing Kernel Hilbert Space (RKHS) theory is applied to build a kernel learning model and learn the gradient of the risk function. To avoid the inherent dimensional trap in kernel methods, a softmax kernel function is designed to reproduce the gradient iteratively, by which a novel algorithm called softmax-kernel reproduced gradient descent (SoKRGD) is further proposed. It is shown that SoKRGD achieves a faster convergence rate than SGD. Experimental results are provided to validate these findings by training ResNet50 and Vision Transformer (ViT). It is observed that using the reproduced gradient in place of the stochastic gradient can promote the performance of SGD-based optimizers.</div></div>","PeriodicalId":49523,"journal":{"name":"Signal Processing","volume":"231 ","pages":"Article 109904"},"PeriodicalIF":3.4000,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0165168425000192","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Stochastic gradient descent (SGD) is widely used for machine learning on streaming data, but it converges slowly because of the variance of the stochastic gradient. To address this issue, reproducing kernel Hilbert space (RKHS) theory is applied to build a kernel learning model that learns the gradient of the risk function. To avoid the dimensionality trap inherent in kernel methods, a softmax kernel function is designed to reproduce the gradient iteratively; building on this, a novel algorithm called softmax-kernel reproduced gradient descent (SoKRGD) is proposed. SoKRGD is shown to achieve a faster convergence rate than SGD. Experimental results on training ResNet50 and Vision Transformer (ViT) validate these findings, and it is observed that replacing the stochastic gradient with the reproduced gradient can improve the performance of SGD-based optimizers.
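The abstract does not give SoKRGD's update rule, so the following Python sketch only illustrates the general idea it describes: reducing gradient variance by replacing the raw stochastic gradient with a softmax-kernel-weighted combination of recently observed gradients. The class name, the memory buffer, and the specific distance-based softmax weighting are all assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class SoftmaxKernelGD:
    """Hypothetical sketch: an SGD-style step whose descent direction is a
    softmax-kernel-weighted average of recent stochastic gradients (an
    assumed reading of the abstract, not the paper's SoKRGD update)."""

    def __init__(self, lr=0.1, memory=10, tau=1.0):
        self.lr, self.tau, self.memory = lr, tau, memory
        self.params_hist = []    # parameter snapshots theta_i
        self.grad_hist = []      # stochastic gradients g_i observed at theta_i

    def step(self, theta, grad):
        self.params_hist.append(theta.copy())
        self.grad_hist.append(grad.copy())
        if len(self.grad_hist) > self.memory:
            self.params_hist.pop(0)
            self.grad_hist.pop(0)
        # Assumed softmax-kernel weights: snapshots closer to the current
        # iterate receive larger weight, smoothing out gradient noise.
        d2 = np.array([np.sum((theta - p) ** 2) for p in self.params_hist])
        w = softmax(-d2 / self.tau)
        g_rep = sum(wi * gi for wi, gi in zip(w, self.grad_hist))
        return theta - self.lr * g_rep

# Toy usage: noisy quadratic risk f(theta) = 0.5 * ||theta||^2,
# whose exact gradient is theta; noise mimics streaming-data variance.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
opt = SoftmaxKernelGD()
for t in range(200):
    grad = theta + 0.5 * rng.normal(size=5)   # stochastic gradient
    theta = opt.step(theta, grad)
print("final ||theta||:", np.linalg.norm(theta))
```

On this toy problem, the weighted average damps the injected noise relative to plain SGD with the same step size; the paper's claim is the analogous effect at the scale of ResNet50 and ViT training.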
Journal overview
Signal Processing covers all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments, and is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in the research, development, or practical application of signal processing.
Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.