{"title":"大规模学习系统有偏随机梯度估计的强力球方法","authors":"Zhuang Yang","doi":"10.1109/TCSS.2024.3411630","DOIUrl":null,"url":null,"abstract":"The Powerball method, via incorporating a power coefficient into conventional optimization algorithms, has been considered in accelerating stochastic optimization (SO) algorithms in recent years, giving rise to a series of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to the existing accelerated techniques (e.g., the learning rate adjustment strategy) for SO algorithms, the current PSO algorithms take a nearly similar algorithm framework to SO algorithms, where the direct negative result for PSO algorithms is making them inherit low-convergence rate and unstable performance from SO for practical problems. Inspired by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first explore the theoretical property and the empirical characteristic of vanilla-powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the positive impact of BSGE in enhancing the P-SGD type algorithm, we investigate the feature of theory and experiment of P-SGD with momentum under BSGE, where we particularly focus on the effect of negative momentum in P-SGD that is less studied in PSO. Particularly, we prove that the overall complexity of the resulting algorithms matches that of advanced SO algorithms. Finally, large numbers of numerical experiments on benchmark datasets confirm the successful reformation of BSGE in perfecting PSO. This work provides comprehension of the role of BSGE in PSO algorithms, extending the family of PSO algorithms.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"11 6","pages":"7435-7447"},"PeriodicalIF":4.5000,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems\",\"authors\":\"Zhuang Yang\",\"doi\":\"10.1109/TCSS.2024.3411630\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Powerball method, via incorporating a power coefficient into conventional optimization algorithms, has been considered in accelerating stochastic optimization (SO) algorithms in recent years, giving rise to a series of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to the existing accelerated techniques (e.g., the learning rate adjustment strategy) for SO algorithms, the current PSO algorithms take a nearly similar algorithm framework to SO algorithms, where the direct negative result for PSO algorithms is making them inherit low-convergence rate and unstable performance from SO for practical problems. Inspired by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first explore the theoretical property and the empirical characteristic of vanilla-powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the positive impact of BSGE in enhancing the P-SGD type algorithm, we investigate the feature of theory and experiment of P-SGD with momentum under BSGE, where we particularly focus on the effect of negative momentum in P-SGD that is less studied in PSO. 
Particularly, we prove that the overall complexity of the resulting algorithms matches that of advanced SO algorithms. Finally, large numbers of numerical experiments on benchmark datasets confirm the successful reformation of BSGE in perfecting PSO. This work provides comprehension of the role of BSGE in PSO algorithms, extending the family of PSO algorithms.\",\"PeriodicalId\":13044,\"journal\":{\"name\":\"IEEE Transactions on Computational Social Systems\",\"volume\":\"11 6\",\"pages\":\"7435-7447\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2024-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computational Social Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10581404/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Social Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10581404/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
The Powerball Method With Biased Stochastic Gradient Estimation for Large-Scale Learning Systems
The Powerball method, which incorporates a power coefficient into conventional optimization algorithms, has been used in recent years to accelerate stochastic optimization (SO) algorithms, giving rise to a family of powered stochastic optimization (PSO) algorithms. Although the Powerball technique is orthogonal to existing acceleration techniques for SO algorithms (e.g., learning-rate adjustment strategies), current PSO algorithms adopt nearly the same algorithmic framework as SO algorithms; as a direct consequence, they inherit the slow convergence and unstable performance of SO on practical problems. Motivated by this gap, this work develops a novel class of PSO algorithms from the perspective of biased stochastic gradient estimation (BSGE). Specifically, we first explore the theoretical properties and empirical characteristics of vanilla powered stochastic gradient descent (P-SGD) with BSGE. Second, to further demonstrate the positive impact of BSGE on P-SGD-type algorithms, we study P-SGD with momentum under BSGE, both theoretically and experimentally, focusing in particular on the effect of negative momentum, which has received little attention in PSO. Notably, we prove that the overall complexity of the resulting algorithms matches that of advanced SO algorithms. Finally, extensive numerical experiments on benchmark datasets confirm that BSGE successfully improves PSO. This work clarifies the role of BSGE in PSO algorithms and extends the family of PSO algorithms.
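To make the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of how a powered SGD step with a biased gradient estimate and (possibly negative) heavy-ball momentum typically fits together. The Powerball transform applies sign(g)·|g|^gamma elementwise to the gradient, with power coefficient gamma in (0, 1). Everything else here (the toy least-squares problem, gradient clipping as the source of estimator bias, and all hyperparameter values) is an illustrative assumption, not the paper's actual algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: f(x) = (1/2n) * ||A x - b||^2.
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def minibatch_grad(x, batch=20, clip=1.0):
    # A biased stochastic gradient estimate: a mini-batch gradient
    # with elementwise clipping. Clipping is one well-known way an
    # estimator becomes biased; it stands in here for BSGE generally.
    idx = rng.choice(n, size=batch, replace=False)
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
    return np.clip(g, -clip, clip)

def powerball(g, gamma=0.5):
    # Elementwise Powerball transform sign(g) * |g|**gamma;
    # gamma = 1 recovers the untransformed gradient.
    return np.sign(g) * np.abs(g) ** gamma

# P-SGD with heavy-ball momentum; beta < 0 gives negative momentum.
x, v = np.zeros(d), np.zeros(d)
lr, gamma, beta = 0.1, 0.5, -0.3
for _ in range(500):
    v = beta * v + powerball(minibatch_grad(x), gamma)
    x = x - lr * v

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))

With gamma = 1 and beta = 0 the update reduces to plain SGD with a clipped gradient, so this sketch makes it easy to ablate the power coefficient and the momentum sign separately.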
Journal description:
IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis, and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, computational behavior modeling, and their applications.