Multi-objective path integral policy improvement for learning robotic motion

Hayato Sago, Ryo Ariizumi, Toru Asai, Shun-ichi Azuma

*Artificial Life and Robotics*, Vol. 30, No. 3, pp. 534–545. Published 2025-05-02.
DOI: [10.1007/s10015-025-01027-z](https://link.springer.com/article/10.1007/s10015-025-01027-z) · [Open-access PDF](https://link.springer.com/content/pdf/10.1007/s10015-025-01027-z.pdf)
Citations: 0
Abstract
This paper proposes a new multi-objective reinforcement learning (MORL) algorithm for robotics by extending the policy improvement with path integrals (\(\text {PI}^2\)) algorithm. Most existing MORL algorithms are hard to apply to robot motion acquisition problems because of their high-dimensional, continuous state and action spaces. However, policy-based algorithms such as \(\text {PI}^2\) can solve such problems in single-objective settings. Based on the similarity between \(\text {PI}^2\) and evolution strategies (ESs), and the fact that ESs are well-suited to multi-objective optimization, we propose a multi-objective extension of \(\text {PI}^2\) together with techniques to speed up learning. Its effectiveness is demonstrated via numerical simulations.
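The abstract's core idea can be illustrated with a sketch. This is *not* the paper's algorithm: it is a minimal, hypothetical illustration of the general \(\text {PI}^2\)/ES connection the abstract points to, in which the standard \(\text {PI}^2\) exponential weighting of sampled parameter perturbations (weights \(w_k \propto \exp(-S_k/\lambda)\) for scalar rollout costs \(S_k\)) is replaced by weights derived from Pareto ranks of vector-valued costs, an ES-style device for handling multiple objectives. All function names and parameter choices below are assumptions for illustration.

```python
import numpy as np

def pareto_ranks(costs):
    """Non-dominated sorting: rank 0 = Pareto front of `costs` (K x M, minimization)."""
    K = len(costs)
    ranks = np.zeros(K, dtype=int)
    remaining = set(range(K))
    r = 0
    while remaining:
        front = []
        for i in remaining:
            # i is dominated if some j is no worse in every objective and better in one
            dominated = any(
                np.all(costs[j] <= costs[i]) and np.any(costs[j] < costs[i])
                for j in remaining if j != i
            )
            if not dominated:
                front.append(i)
        for i in front:
            ranks[i] = r
            remaining.discard(i)
        r += 1
    return ranks

def multiobjective_pi2_step(theta, sigma, cost_fn, num_rollouts=16, lam=1.0, rng=None):
    """One PI^2-style update: sample perturbations, weight them by a softmax
    over Pareto ranks of the vector costs (instead of scalar rollout costs)."""
    rng = np.random.default_rng(rng)
    eps = rng.normal(scale=sigma, size=(num_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])   # shape (K, M)
    s = pareto_ranks(costs).astype(float)
    w = np.exp(-s / lam)                                  # PI^2-style exponentiation
    w /= w.sum()
    return theta + w @ eps                                # reward-weighted averaging
```

As a usage sketch, two conflicting quadratic objectives pulling the parameter toward different targets make the update drift toward the Pareto set between them:

```python
targets = np.array([[0.0, 0.0], [1.0, 0.0]])
cost_fn = lambda th: np.array([np.sum((th - t) ** 2) for t in targets])
theta = np.array([4.0, 4.0])
for _ in range(30):
    theta = multiobjective_pi2_step(theta, sigma=0.3, cost_fn=cost_fn, rng=0)
```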