{"title":"用于离线强化学习的聚合掩码自动编码","authors":"Changqing Yuan , Yongfang Xie , Shiwen Xie , Zhaohui Tang , Zongze Wu","doi":"10.1016/j.patrec.2025.08.007","DOIUrl":null,"url":null,"abstract":"<div><div>Viewing offline reinforcement learning (RL) as a sequence modeling problem has emerged as a new research trend. Recent approaches leverage self-supervised learning to improve sequence representations, yet most rely on state sequences for pretraining, thereby disrupting the intrinsic state–action coupling, which complicates the distinction of trajectory bifurcations caused by action quality differences. Moreover, actions from stochastic policies in offline datasets may cause low-quality state transitions to be mistakenly identified as salient information, hindering representation learning and degrading policy performance. To mitigate these issues, we propose aggregated masked future prediction (AMFP), a self-supervised learning framework for offline RL. AMFP introduces a new pretext task that combines weighted aggregation and masked autoencoding through global fusion tokens to perform aggregated masked reconstruction. The weighted aggregation mechanism is to assign higher weights to samples that are semantically similar to the anchor in the representation space, enabling the model to emphasize reliable state transitions and suppress misleading transitions from stochastic or low-quality actions. Meanwhile, the global fusion tokens serve a dual purpose: they facilitate the integration of weighted aggregation and masked autoencoding, and, after encoding, function as compressed representations of the state trajectory and implicit action-state coupling. The encoded representations are then utilized as the latent contextual factor to guide policy learning and improve robustness. Experimental evaluation on D4RL benchmarks demonstrates the effectiveness of our method in improving policy learning.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 312-318"},"PeriodicalIF":3.3000,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Aggregated masked autoencoding for offline reinforcement learning\",\"authors\":\"Changqing Yuan , Yongfang Xie , Shiwen Xie , Zhaohui Tang , Zongze Wu\",\"doi\":\"10.1016/j.patrec.2025.08.007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Viewing offline reinforcement learning (RL) as a sequence modeling problem has emerged as a new research trend. Recent approaches leverage self-supervised learning to improve sequence representations, yet most rely on state sequences for pretraining, thereby disrupting the intrinsic state–action coupling, which complicates the distinction of trajectory bifurcations caused by action quality differences. Moreover, actions from stochastic policies in offline datasets may cause low-quality state transitions to be mistakenly identified as salient information, hindering representation learning and degrading policy performance. To mitigate these issues, we propose aggregated masked future prediction (AMFP), a self-supervised learning framework for offline RL. AMFP introduces a new pretext task that combines weighted aggregation and masked autoencoding through global fusion tokens to perform aggregated masked reconstruction. 
The weighted aggregation mechanism is to assign higher weights to samples that are semantically similar to the anchor in the representation space, enabling the model to emphasize reliable state transitions and suppress misleading transitions from stochastic or low-quality actions. Meanwhile, the global fusion tokens serve a dual purpose: they facilitate the integration of weighted aggregation and masked autoencoding, and, after encoding, function as compressed representations of the state trajectory and implicit action-state coupling. The encoded representations are then utilized as the latent contextual factor to guide policy learning and improve robustness. Experimental evaluation on D4RL benchmarks demonstrates the effectiveness of our method in improving policy learning.</div></div>\",\"PeriodicalId\":54638,\"journal\":{\"name\":\"Pattern Recognition Letters\",\"volume\":\"197 \",\"pages\":\"Pages 312-318\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167865525002867\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865525002867","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Aggregated masked autoencoding for offline reinforcement learning
Abstract:
Viewing offline reinforcement learning (RL) as a sequence modeling problem has emerged as a new research trend. Recent approaches leverage self-supervised learning to improve sequence representations, yet most rely on state sequences for pretraining, thereby disrupting the intrinsic state–action coupling and complicating the distinction of trajectory bifurcations caused by differences in action quality. Moreover, actions from stochastic policies in offline datasets may cause low-quality state transitions to be mistakenly identified as salient information, hindering representation learning and degrading policy performance. To mitigate these issues, we propose aggregated masked future prediction (AMFP), a self-supervised learning framework for offline RL. AMFP introduces a new pretext task that combines weighted aggregation and masked autoencoding through global fusion tokens to perform aggregated masked reconstruction. The weighted aggregation mechanism assigns higher weights to samples that are semantically similar to the anchor in the representation space, enabling the model to emphasize reliable state transitions and suppress misleading transitions from stochastic or low-quality actions. Meanwhile, the global fusion tokens serve a dual purpose: they facilitate the integration of weighted aggregation and masked autoencoding, and, after encoding, function as compressed representations of the state trajectory and the implicit state–action coupling. The encoded representations are then utilized as a latent contextual factor to guide policy learning and improve robustness. Experimental evaluation on D4RL benchmarks demonstrates the effectiveness of our method in improving policy learning.
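To make the mechanism more concrete, below is a minimal, hypothetical PyTorch sketch of the two ingredients the abstract names: masked autoencoding of a state sequence through a learnable global fusion token, and a similarity-weighted reconstruction loss that up-weights samples close to an anchor in representation space. The module `AggregatedMaskedEncoder`, the function `weighted_aggregation_loss`, the choice of anchor, and all shapes and hyperparameters are illustrative assumptions, not the authors' AMFP implementation.

```python
# Illustrative sketch only: a hypothetical rendition of similarity-weighted aggregation
# and masked reconstruction through a global fusion token, assuming a PyTorch setup.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AggregatedMaskedEncoder(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)
        # Learnable global fusion token prepended to every masked state sequence.
        self.fusion_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decoder = nn.Linear(embed_dim, state_dim)

    def forward(self, states: torch.Tensor, mask_ratio: float = 0.5):
        # states: (batch, seq_len, state_dim)
        B, T, _ = states.shape
        tokens = self.embed(states)
        # Randomly mask a fraction of the state tokens.
        keep = torch.rand(B, T, device=states.device) > mask_ratio
        tokens = torch.where(keep.unsqueeze(-1), tokens, self.mask_token.expand(B, T, -1))
        # Prepend the global fusion token, encode, then reconstruct the full sequence.
        x = torch.cat([self.fusion_token.expand(B, -1, -1), tokens], dim=1)
        h = self.encoder(x)
        fusion_repr = h[:, 0]            # compressed trajectory representation
        recon = self.decoder(h[:, 1:])   # reconstructed states
        return fusion_repr, recon, keep


def weighted_aggregation_loss(fusion_repr, recon, target, keep):
    """Masked-reconstruction loss reweighted by similarity to an anchor.

    Here the anchor is taken (as an assumption) to be the detached batch mean of the
    fusion representations; samples closer to it receive larger weights, one plausible
    reading of the weighting rule described in the abstract.
    """
    anchor = fusion_repr.mean(dim=0, keepdim=True).detach()
    sims = F.cosine_similarity(fusion_repr, anchor, dim=-1)   # (batch,)
    weights = torch.softmax(sims, dim=0) * sims.numel()       # rescale so the mean weight is ~1
    per_step = ((recon - target) ** 2).mean(dim=-1)           # (batch, seq_len)
    masked = (~keep).float()
    per_sample = (per_step * masked).sum(dim=1) / masked.sum(dim=1).clamp(min=1)
    return (weights * per_sample).mean()


if __name__ == "__main__":
    model = AggregatedMaskedEncoder(state_dim=17)   # e.g. a MuJoCo-sized state vector
    states = torch.randn(8, 20, 17)                 # toy batch of state trajectories
    fusion, recon, keep = model(states)
    loss = weighted_aggregation_loss(fusion, recon, states, keep)
    loss.backward()
    print(fusion.shape, loss.item())
```

In AMFP the encoded fusion representation would additionally serve as the latent contextual factor conditioning the downstream policy; that step is omitted here for brevity.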
Journal introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.