A multi-objective goal-oriented reinforcement learning algorithm for dynamic multi-objective sequential decision making

IF 2.6 | CAS Tier 3 (Computer Science) | JCR Q3 (Automation & Control Systems)
Haofang Yu, Hong-chuan Yang, Yanyan Huang
{"title":"A multi-objective goal-oriented reinforcement learning algorithm for dynamic multi-objective sequential decision making","authors":"Haofang Yu,&nbsp;Hong-chuan Yang,&nbsp;Yanyan Huang","doi":"10.1007/s10458-026-09735-x","DOIUrl":null,"url":null,"abstract":"<div><p>Multi-objective reinforcement learning (MORL) algorithms predominantly rely on scalarization functions parameterized with the preferences of the decision maker to derive trade-off solutions. However, this is not always feasible or desirable in the deterministic settings where scalarization functions are hard to specify, or where Pareto optimal solutions vary solely due to changes in the multi-objective reward function. Therefore, we consider a goal-augmented dynamic multi-objective Markov decision process (GA-DMOMDP), which enables the learning of Pareto optimal solutions through specifying and pursuing appropriate goals rather than relying on explicit scalarization functions. Restricted to the above GA-DMOMDPs, a multi-objective goal-oriented reinforcement learning (MOGORL) algorithm is further proposed so that the possibly changing Pareto optimal solutions can be tracked. In our algorithm, an on-line learning mode is proposed to continuously detect new goals, and to simultaneously pursue different goals by a hindsight relabeling strategy. Experimental results show that our algorithm can learn the Pareto optimal solutions in the deterministic environments with either static or dynamically changing rewards, regardless of the shape of Pareto optimal fronts, which outperforms generalized MORL algorithms with linear and Chebyshev scalarization functions.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2026-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Agents and Multi-Agent Systems","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10458-026-09735-x","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Multi-objective reinforcement learning (MORL) algorithms predominantly rely on scalarization functions parameterized with the preferences of the decision maker to derive trade-off solutions. However, this is not always feasible or desirable in deterministic settings where scalarization functions are hard to specify, or where Pareto optimal solutions vary solely because the multi-objective reward function changes. We therefore consider a goal-augmented dynamic multi-objective Markov decision process (GA-DMOMDP), which enables Pareto optimal solutions to be learned by specifying and pursuing appropriate goals rather than by relying on explicit scalarization functions. For such GA-DMOMDPs, a multi-objective goal-oriented reinforcement learning (MOGORL) algorithm is further proposed to track potentially changing Pareto optimal solutions. The algorithm uses an online learning mode to continuously detect new goals and simultaneously pursues different goals via a hindsight relabeling strategy. Experimental results show that our algorithm learns Pareto optimal solutions in deterministic environments with either static or dynamically changing rewards, regardless of the shape of the Pareto front, and that it outperforms generalized MORL algorithms based on linear and Chebyshev scalarization functions.
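For readers unfamiliar with the baselines, the following is a minimal sketch of the linear and Chebyshev scalarization functions that the compared generalized MORL algorithms rely on. It is not code from the paper; the preference weights, utopia point, and example reward vectors are illustrative assumptions, chosen to show why the shape of the Pareto front matters when preferences must be fixed in advance.

```python
import numpy as np

# Illustrative sketch only (not code from the paper): the two baseline
# scalarization functions that generalized MORL methods use and that the
# abstract contrasts MOGORL against.

def linear_scalarization(r, w):
    """Weighted sum of objective returns; recovers only solutions on the
    convex hull of the Pareto front (larger is better)."""
    return float(np.dot(w, r))

def chebyshev_scalarization(r, w, utopia):
    """Weighted Chebyshev distance to a utopia (ideal) point; smaller is
    better, and it can also reach solutions on concave front regions."""
    return float(np.max(np.asarray(w) * np.abs(np.asarray(utopia) - np.asarray(r))))

# Hypothetical two-objective returns of two candidate policies.
r_a = np.array([8.0, 2.0])       # strong on objective 1, weak on objective 2
r_b = np.array([5.0, 5.0])       # balanced trade-off
w = np.array([0.5, 0.5])         # assumed equal preference weights
utopia = np.array([10.0, 10.0])  # assumed per-objective ideal values

# Linear scalarization ties the two policies (5.0 vs 5.0), so it cannot
# express a preference for the balanced solution; the Chebyshev metric
# separates them (4.0 vs 2.5, lower is better).
print(linear_scalarization(r_a, w), linear_scalarization(r_b, w))
print(chebyshev_scalarization(r_a, w, utopia), chebyshev_scalarization(r_b, w, utopia))
```

Both functions require a preference vector (and, for Chebyshev, a reference point) to be specified up front, which is exactly the requirement the goal-oriented formulation in the abstract aims to avoid.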


Source journal
Autonomous Agents and Multi-Agent Systems (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 6.00
Self-citation rate: 5.30%
Articles per year: 48
Review time: >12 weeks
Journal description: This is the official journal of the International Foundation for Autonomous Agents and Multi-Agent Systems. It provides a leading forum for disseminating significant original research results in the foundations, theory, development, analysis, and applications of autonomous agents and multi-agent systems. Coverage in Autonomous Agents and Multi-Agent Systems includes, but is not limited to:
- Agent decision-making architectures and their evaluation, including: cognitive models; knowledge representation; logics for agency; ontological reasoning; planning (single and multi-agent); reasoning (single and multi-agent)
- Cooperation and teamwork, including: distributed problem solving; human-robot/agent interaction; multi-user/multi-virtual-agent interaction; coalition formation; coordination
- Agent communication languages, including: their semantics, pragmatics, and implementation; agent communication protocols and conversations; agent commitments; speech act theory
- Ontologies for agent systems, agents and the semantic web, agents and semantic web services, Grid-based systems, and service-oriented computing
- Agent societies and societal issues, including: artificial social systems; environments, organizations and institutions; ethical and legal issues; privacy, safety and security; trust, reliability and reputation
- Agent-based system development, including: agent development techniques, tools and environments; agent programming languages; agent specification or validation languages
- Agent-based simulation, including: emergent behavior; participatory simulation; simulation techniques, tools and environments; social simulation
- Agreement technologies, including: argumentation; collective decision making; judgment aggregation and belief merging; negotiation; norms
- Economic paradigms, including: auction and mechanism design; bargaining and negotiation; economically-motivated agents; game theory (cooperative and non-cooperative); social choice and voting
- Learning agents, including: computational architectures for learning agents; evolution, adaptation; multi-agent learning
- Robotic agents, including: integrated perception, cognition, and action; cognitive robotics; robot planning (including action and motion planning); multi-robot systems
- Virtual agents, including: agents in games and virtual environments; companion and coaching agents; modeling personality, emotions; multimodal interaction; verbal and non-verbal expressiveness
- Significant, novel applications of agent technology
- Comprehensive reviews and authoritative tutorials of research and practice in agent systems
- Comprehensive and authoritative reviews of books dealing with agents and multi-agent systems