{"title":"Quadruped robot locomotion via soft actor-critic with muti-head critic and dynamic policy gradient","authors":"Yanan Fan, Zhongcai Pei, Hongbing Shi, Meng Li, Tianyuan Guo, Zhiyong Tang","doi":"10.1007/s10489-025-06584-1","DOIUrl":null,"url":null,"abstract":"<div><p>Quadruped robots’ nonlinear complexity makes traditional modeling challenging, while deep reinforcement learning (DRL) learns effectively through direct environmental interaction without explicit kinematic and dynamic models, becoming an efficient approach for quadruped locomotion across diverse terrains. Conventional reinforcement learning methods typically combine multiple reward criteria into a single scalar function, limiting information representation and complicating the balance between multiple control objectives. We propose a novel multi-head critic and dynamic policy gradient SAC (MHD-SAC) algorithm, innovatively combining a multi-head critic architecture that independently evaluates distinct reward components and a dynamic policy gradient method that adaptively adjusts weights based on current performance. Through simulations on both flat and uneven terrains comparing three approaches (Soft Actor-Critic (SAC), multi-head critic SAC (MH-SAC), and MHD-SAC), we demonstrate that the MHD-SAC algorithm achieves significantly faster learning convergence and higher cumulative rewards than conventional methods. Performance analysis across different reward components reveals MHD-SAC’s superior ability to balance multiple objectives. The results validate that our approach effectively addresses the challenges of multi-objective optimization in quadruped locomotion control, providing a promising foundation for developing more versatile and robust legged robots capable of traversing complex environments.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 10","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06584-1","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The nonlinear complexity of quadruped robots makes traditional modeling challenging, whereas deep reinforcement learning (DRL) learns effectively through direct environmental interaction without explicit kinematic and dynamic models, making it an efficient approach for quadruped locomotion across diverse terrains. Conventional reinforcement learning methods typically combine multiple reward criteria into a single scalar function, which limits information representation and complicates the balance between multiple control objectives. We propose a novel multi-head critic and dynamic policy gradient SAC (MHD-SAC) algorithm, which combines a multi-head critic architecture that independently evaluates distinct reward components with a dynamic policy gradient method that adaptively adjusts weights based on current performance. Through simulations on both flat and uneven terrains comparing three approaches (Soft Actor-Critic (SAC), multi-head critic SAC (MH-SAC), and MHD-SAC), we demonstrate that MHD-SAC achieves significantly faster learning convergence and higher cumulative rewards than the conventional methods. Performance analysis across the different reward components reveals MHD-SAC's superior ability to balance multiple objectives. The results validate that our approach effectively addresses the challenges of multi-objective optimization in quadruped locomotion control, providing a promising foundation for developing more versatile and robust legged robots capable of traversing complex environments.
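The abstract does not give the paper's exact formulation, but the two ideas it names can be illustrated concretely: a critic with one Q-head per reward component, and per-component weights on the policy gradient that adapt to recent performance. The following PyTorch sketch is a minimal reading of those ideas; the class and function names, network sizes, and the deficit-based weight-update rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multi-head critic and dynamic policy-gradient weighting,
# assuming a SAC-style setup. All names and the weighting rule are hypothetical.
import torch
import torch.nn as nn

class MultiHeadCritic(nn.Module):
    """Shared trunk with an independent Q-head for each reward component."""
    def __init__(self, obs_dim: int, act_dim: int, n_components: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One scalar Q-value per reward component (e.g. velocity tracking,
        # energy cost, stability), each evaluated independently.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_components)])

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        h = self.trunk(torch.cat([obs, act], dim=-1))
        return torch.cat([head(h) for head in self.heads], dim=-1)  # (batch, n_components)

def dynamic_weights(recent_returns: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Hypothetical adaptation rule: up-weight components whose recent return
    lags its target, so the policy gradient emphasizes the weakest objective."""
    deficit = (targets - recent_returns).clamp(min=0.0)
    return torch.softmax(deficit, dim=-1)

def actor_loss(critic: MultiHeadCritic, obs, act, log_prob, weights, alpha: float = 0.2):
    """SAC-style actor objective: weighted sum of per-head Q-values minus the
    entropy term. `act` and `log_prob` come from the policy's reparameterized
    sample, as in standard SAC."""
    q_per_head = critic(obs, act)           # (batch, n_components)
    q = (q_per_head * weights).sum(dim=-1)  # scalarized objective per sample
    return (alpha * log_prob - q).mean()
```

In a full training loop each head would also need its own Bellman target built from its reward component, so the heads stay independent estimates rather than copies of a scalarized value.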
Journal overview:
With a focus on research in artificial intelligence and neural networks, this journal addresses real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.