Title: Long-Term Feature Extraction via Frequency Prediction for Efficient Reinforcement Learning
Authors: Jie Wang; Mingxuan Ye; Yufei Kuang; Rui Yang; Wengang Zhou; Houqiang Li; Feng Wu
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 3094-3110
DOI: 10.1109/TPAMI.2025.3529264
Publication date: 2025-01-13
Citations: 0
Abstract
Sample efficiency remains a key challenge for the deployment of deep reinforcement learning (RL) in real-world scenarios. A common approach is to learn efficient representations through future prediction tasks, enabling the agent to make farsighted decisions that benefit its long-term performance. Existing methods extract predictive features by predicting multi-step future state signals. However, they do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we introduce a new perspective that leverages the frequency domain of state sequences to extract the underlying patterns in time series data. We theoretically show that state sequences contain structural information closely tied to policy performance and signal regularity, and we analyze the suitability of the frequency domain for extracting these two types of structural information. Motivated by this analysis, we propose a novel representation learning method, State Sequences Prediction via Fourier Transform (SPF), which extracts long-term features by predicting the Fourier transform of infinite-step future state sequences. The appealing features of our frequency prediction objective include: 1) it is simple to implement thanks to a recursive relationship; and 2) it provides an upper bound on the performance difference between the optimal policy and the latent policy in the representation space. Experiments on standard and goal-conditioned RL tasks demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
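The abstract does not spell out the recursive relationship, but one natural reading is a discounted Fourier transform of the infinite-step future state sequence, which obeys a one-step, TD-style recursion. The sketch below is a hypothetical numerical illustration under that assumption (the discount `gamma`, frequency `omega`, and the scalar toy trajectory are all illustrative choices, not taken from the paper): it checks that F(ω; s_t) = Σ_{k≥0} γ^k e^{-iωk} s_{t+k} satisfies F(ω; s_t) = s_t + γ e^{-iω} F(ω; s_{t+1}).

```python
import numpy as np

# Hypothetical illustration (not the paper's exact formulation):
# a discounted Fourier feature of the infinite-step future state sequence,
#   F(omega; s_t) = sum_{k>=0} gamma^k * exp(-1j*omega*k) * s_{t+k},
# satisfies the one-step recursion
#   F(omega; s_t) = s_t + gamma * exp(-1j*omega) * F(omega; s_{t+1}),
# which is what makes a bootstrapped prediction target easy to train.

gamma, omega = 0.9, 0.5  # illustrative discount and frequency

def fourier_feature(states, t, horizon=500):
    # Truncated evaluation of the infinite discounted sum starting at time t.
    ks = np.arange(horizon)
    return np.sum((gamma * np.exp(-1j * omega)) ** ks * states[t + ks])

# Toy deterministic scalar state trajectory.
states = np.cos(0.1 * np.arange(2000))

lhs = fourier_feature(states, t=0)
rhs = states[0] + gamma * np.exp(-1j * omega) * fourier_feature(states, t=1)
assert np.isclose(lhs, rhs)  # recursion holds up to truncation error
```

Because the recursion only relates consecutive time steps, a learned predictor of this feature can be trained with bootstrapped targets rather than by unrolling the full future sequence, which is presumably what the abstract's "simple to implement" claim refers to.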