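Dynamic preference inference network: Improving sample efficiency for multi-objective reinforcement learning by preference estimation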

IF 7.2 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems | Pub Date: 2024-09-11 | DOI: 10.1016/j.knosys.2024.112512

Full text: https://www.sciencedirect.com/science/article/pii/S0950705124011468

Citations: 0

Abstract


Dynamic preference inference network: Improving sample efficiency for multi-objective reinforcement learning by preference estimation

Multi-objective reinforcement learning (MORL) addresses the challenge of optimizing policies in environments with multiple conflicting objectives. Traditional approaches often rely on scalar utility functions, which require predefined preference weights, limiting their adaptability and efficiency. To overcome this, we propose the Dynamic Preference Inference Network (DPIN), a novel method designed to enhance sample efficiency by dynamically estimating the trajectory decision preference of the agent. DPIN leverages a neural network to predict the most favorable preference distribution for each trajectory, enabling more effective policy updates and improving overall performance in complex MORL tasks. Extensive experiments in various benchmark environments demonstrate that DPIN significantly outperforms existing state-of-the-art methods, achieving higher scalarized returns and hypervolume. Our findings highlight DPIN’s ability to adapt to varying preferences, reduce sample complexity, and provide robust solutions in multi-objective settings.
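The abstract evaluates methods by scalarized return and hypervolume. The following is a minimal sketch of those two quantities, assuming linear (weighted-sum) scalarization, two objectives, and maximization; the abstract does not state which scalarization or hypervolume routine the authors use. The helper `best_preference` is a hypothetical stand-in that picks, from a fixed candidate set, the weight vector most favorable to one trajectory. It is not DPIN, which the abstract describes only as a neural network predicting a preference distribution per trajectory.

```python
from typing import List, Sequence, Tuple


def scalarized_return(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-objective returns (linear scalarization, an assumption)."""
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    return sum(r * w for r, w in zip(returns, weights))


def best_preference(
    returns: Sequence[float], candidates: Sequence[Sequence[float]]
) -> Sequence[float]:
    """Hypothetical helper: the candidate weight vector under which this
    trajectory's vector return scores highest. Illustration only, not DPIN."""
    return max(candidates, key=lambda w: scalarized_return(returns, w))


def hypervolume_2d(
    points: Sequence[Tuple[float, float]], ref: Tuple[float, float]
) -> float:
    """Area dominated by `points` relative to reference point `ref`,
    for two objectives that are both maximized."""
    # Keep only points strictly better than the reference in both objectives,
    # then sweep from the largest first objective to the smallest.
    pts: List[Tuple[float, float]] = sorted(
        (p for p in points if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:  # the point adds a new strip above what is already covered
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


if __name__ == "__main__":
    trajectory_return = (3.0, 1.0)  # made-up vector return of one trajectory
    candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    print(scalarized_return(trajectory_return, (0.5, 0.5)))
    print(best_preference(trajectory_return, candidates))
    print(hypervolume_2d([(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)], (0.0, 0.0)))
```

In this toy example the dominated point (1.0, 1.0) adds nothing to the hypervolume, which is why the metric rewards a policy set that spreads along the Pareto front rather than one that merely contains many solutions.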


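Source journal

Knowledge-Based Systems (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months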

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.