{"title":"Leveraging Privileged Information for Partially Observable Reinforcement Learning","authors":"Jinqiu Li;Enmin Zhao;Tong Wei;Junliang Xing;Shiming Xiang","doi":"10.1109/TG.2025.3542158","DOIUrl":null,"url":null,"abstract":"Reinforcement learning has achieved remarkable success across diverse scenarios. However, learning optimal policies within partially observable games remains a formidable challenge. Crucial privileged information in states is often shrouded during gameplay, yet ideally, it should be accessible and exploitable during training. Previous studies have concentrated on formulating policies based wholly on partial observations or oracle states. Nevertheless, these approaches often face hindrances in attaining effective generalization. To surmount this challenge, we propose the actor–cross-critic (ACC) learning framework, integrating both partial observations and oracle states. ACC achieves this by coordinating two critics and invoking a maximization operation mechanism to switch between them dynamically. This approach encourages the selection of the higher values when computing advantages within the actor–critic framework, thereby accelerating learning and mitigating bias under partial observability. Some theoretical analyses show that ACC exhibits better learning ability toward optimal policies than actor–critic learning using the oracle states. We highlight its superior performance through comprehensive evaluations in decision-making tasks, such as <italic>QuestBall</i>, <italic>Minigrid</i>, and <italic>Atari</i>, and the challenging card game <italic>DouDizhu</i>.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 3","pages":"765-776"},"PeriodicalIF":2.8000,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10887124/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Reinforcement learning has achieved remarkable success across diverse scenarios. However, learning optimal policies in partially observable games remains a formidable challenge. Crucial privileged information in states is often hidden during gameplay, yet ideally it should be accessible and exploitable during training. Previous studies have concentrated on formulating policies based solely on partial observations or on oracle states. Nevertheless, these approaches often struggle to generalize effectively. To surmount this challenge, we propose the actor–cross-critic (ACC) learning framework, which integrates both partial observations and oracle states. ACC achieves this by coordinating two critics and applying a maximization operation to switch between them dynamically. This approach encourages selecting the higher value when computing advantages within the actor–critic framework, thereby accelerating learning and mitigating bias under partial observability. Theoretical analyses show that ACC learns optimal policies more effectively than actor–critic learning using only the oracle states. We highlight its superior performance through comprehensive evaluations on decision-making tasks such as QuestBall, Minigrid, and Atari, and on the challenging card game DouDizhu.
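To make the cross-critic idea in the abstract concrete, below is a minimal sketch in Python of how advantages could be computed by taking the higher of two critic estimates at each step. It assumes one-step temporal-difference advantages and an element-wise maximum over an observation-based critic and an oracle-state critic; the function name acc_advantages and all numbers are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def acc_advantages(rewards, obs_values, oracle_values, gamma=0.99):
    """Sketch of a cross-critic advantage computation.

    rewards:       shape (T,)   rewards collected along a trajectory
    obs_values:    shape (T+1,) value estimates from the partial-observation critic
    oracle_values: shape (T+1,) value estimates from the oracle-state critic
    """
    # Take the element-wise maximum of the two critics, mirroring the
    # maximization operation described in the abstract (an assumption here).
    values = np.maximum(obs_values, oracle_values)

    # One-step advantage estimates: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    return rewards + gamma * values[1:] - values[:-1]

# Toy usage with made-up numbers.
rewards = np.array([0.0, 0.0, 1.0])
obs_values = np.array([0.2, 0.3, 0.5, 0.0])
oracle_values = np.array([0.4, 0.2, 0.7, 0.0])
print(acc_advantages(rewards, obs_values, oracle_values))
```

In this sketch, whichever critic assigns the higher value serves as the baseline at each step, which is one way to realize the "select the higher value when computing advantages" behavior the abstract attributes to ACC.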