Title: MENTOR: Guiding Hierarchical Reinforcement Learning With Human Feedback and Dynamic Distance Constraint
Authors: Xinglin Zhou; Yifu Yuan; Shaofu Yang; Jianye Hao
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 2, pp. 1292-1306
DOI: 10.1109/TETCI.2025.3529902
Publication date: 2025-01-28 (Journal Article)
Impact factor: 5.3 | JCR: Q1 (Computer Science, Artificial Intelligence)
URL: https://ieeexplore.ieee.org/document/10856553/
Citation count: 0
Abstract
Hierarchical reinforcement learning (HRL) offers a promising approach for agents facing complex tasks with sparse rewards: a hierarchical framework divides a task into subgoals and completes them sequentially. However, current methods struggle to find subgoals that ensure a stable learning process. To address this issue, we propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints, termed MENTOR, which acts as a “mentor”. Specifically, human feedback is incorporated into high-level policy learning to find better subgoals. Furthermore, we propose the Dynamic Distance Constraint (DDC), a mechanism that dynamically adjusts the space of candidate subgoals so that MENTOR generates subgoals matched to the low-level policy's learning progress, from easy to hard, thereby improving learning efficiency. For the low-level policy, a dual-policy design decouples exploration from exploitation to stabilize training. Extensive experiments demonstrate that MENTOR achieves significant improvement in complex sparse-reward tasks using only a small amount of human feedback.
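The abstract's Dynamic Distance Constraint can be read as restricting how far a proposed subgoal may lie from the current state, with the allowed distance growing as the low-level policy improves. The sketch below is an illustrative interpretation only, not the paper's actual algorithm; the function names, the success-rate threshold, and the additive update schedule are all assumptions made for the example.

```python
import numpy as np

def constrain_subgoal(state, subgoal, max_dist):
    """Project a proposed subgoal onto the ball of radius max_dist
    around the current state (one simple reading of a distance
    constraint on the subgoal space)."""
    delta = subgoal - state
    dist = np.linalg.norm(delta)
    if dist <= max_dist:
        return subgoal  # already within the allowed range
    return state + delta * (max_dist / dist)  # scale back to the boundary

def update_max_dist(max_dist, success_rate, target=0.7, step=0.1,
                    lo=0.5, hi=10.0):
    """Hypothetical schedule: widen the subgoal range when the
    low-level policy succeeds often ("easy to hard"), narrow it
    when it struggles. All constants here are illustrative."""
    max_dist += step if success_rate > target else -step
    return float(np.clip(max_dist, lo, hi))
```

A high-level policy using such a constraint would call `constrain_subgoal` on each emitted subgoal and periodically call `update_max_dist` with the low-level policy's recent success rate, so that subgoal difficulty tracks low-level competence.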
Journal description:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts in any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few such illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, computational intelligence for the IoT and Smart-X technologies.