{"title":"机器人运动学习的多目标路径积分策略改进","authors":"Hayato Sago, Ryo Ariizumi, Toru Asai, Shun-ichi Azuma","doi":"10.1007/s10015-025-01027-z","DOIUrl":null,"url":null,"abstract":"<div><p>This paper proposes a new multi-objective reinforcement learning (MORL) algorithm for robotics by extending policy improvement with path integral (<span>\\(\\text {PI}^2\\)</span>) algorithm. For a robot motion acquisition problem, most existing MORL algorithms are hard to apply, because of the high-dimensional and continuous state and action spaces. However, policy-based algorithms such as <span>\\(\\text {PI}^2\\)</span> can be applied to solve this problem in single-objective cases. Based on the similarity of <span>\\(\\text {PI}^2\\)</span> and evolution strategies (ESs) and the fact that ESs are well-suited for multi-objective optimization, we propose an extension of <span>\\(\\text {PI}^2\\)</span> and some techniques to speed up the learning. The effectiveness is shown via numerical simulations.</p></div>","PeriodicalId":46050,"journal":{"name":"Artificial Life and Robotics","volume":"30 3","pages":"534 - 545"},"PeriodicalIF":0.8000,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10015-025-01027-z.pdf","citationCount":"0","resultStr":"{\"title\":\"Multi-objective path integral policy improvement for learning robotic motion\",\"authors\":\"Hayato Sago, Ryo Ariizumi, Toru Asai, Shun-ichi Azuma\",\"doi\":\"10.1007/s10015-025-01027-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper proposes a new multi-objective reinforcement learning (MORL) algorithm for robotics by extending policy improvement with path integral (<span>\\\\(\\\\text {PI}^2\\\\)</span>) algorithm. For a robot motion acquisition problem, most existing MORL algorithms are hard to apply, because of the high-dimensional and continuous state and action spaces. 
However, policy-based algorithms such as <span>\\\\(\\\\text {PI}^2\\\\)</span> can be applied to solve this problem in single-objective cases. Based on the similarity of <span>\\\\(\\\\text {PI}^2\\\\)</span> and evolution strategies (ESs) and the fact that ESs are well-suited for multi-objective optimization, we propose an extension of <span>\\\\(\\\\text {PI}^2\\\\)</span> and some techniques to speed up the learning. The effectiveness is shown via numerical simulations.</p></div>\",\"PeriodicalId\":46050,\"journal\":{\"name\":\"Artificial Life and Robotics\",\"volume\":\"30 3\",\"pages\":\"534 - 545\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-05-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10015-025-01027-z.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Life and Robotics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10015-025-01027-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Life and Robotics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s10015-025-01027-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ROBOTICS","Score":null,"Total":0}
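The abstract does not spell out the PI^2 update that the paper extends, but the standard single-objective PI^2 step it builds on can be sketched as follows: perturb the policy parameters, roll out each perturbation, and move the parameters toward a softmax-weighted (low-cost-favoring) average of the perturbations. This is a minimal illustrative sketch of that generic update, not the paper's multi-objective algorithm; all function names, hyperparameters, and the toy cost are assumptions for illustration.

```python
import numpy as np

def pi2_update(theta, rollout_cost, n_samples=32, sigma=0.1, lam=1.0, rng=None):
    """One generic PI^2-style update (illustrative, not the paper's method).

    theta        : current policy parameter vector
    rollout_cost : maps a parameter vector to a scalar trajectory cost
    sigma        : exploration-noise standard deviation (assumed value)
    lam          : softmax temperature on normalized costs (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample Gaussian exploration noise around the current parameters.
    eps = rng.normal(scale=sigma, size=(n_samples, theta.size))
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Normalize costs to [0, 1], then exponentiate so lower cost -> higher weight.
    spread = max(costs.max() - costs.min(), 1e-12)
    w = np.exp(-(costs - costs.min()) / (lam * spread))
    w /= w.sum()
    # Probability-weighted average of the perturbations updates theta.
    return theta + w @ eps

# Toy usage: drive a quadratic cost toward its minimum at the origin.
rng = np.random.default_rng(0)
theta = np.array([2.0, -1.0])
for _ in range(200):
    theta = pi2_update(theta, lambda th: float(th @ th), rng=rng)
```

The softmax weighting over sampled perturbations is also what makes PI^2 structurally resemble an evolution strategy, which is the similarity the abstract's multi-objective extension exploits.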