Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering

IF 4.6; Zone 2 (Engineering & Technology); JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Signal Processing, Pub Date: 2024-11-27, DOI: 10.1109/TSP.2024.3505266

Yuki Akiyama, Minh Vu, Konstantinos Slavakis

Volume 72, pages 5644-5658

Citations: 0

Abstract

This paper designs novel nonparametric Bellman mappings in reproducing kernel Hilbert spaces (RKHSs) for reinforcement learning (RL). The proposed mappings benefit from the rich approximating properties of RKHSs, adopt no assumptions on the statistics of the data owing to their nonparametric nature, require no knowledge of the transition probabilities of Markov decision processes, and may operate without any training data. Moreover, they allow for sampling on-the-fly via the design of trajectory samples, re-use past test data via experience replay, effect dimensionality reduction by random Fourier features, and enable computationally lightweight operations that fit into efficient online or time-adaptive learning. The paper also offers a variational framework to design the free parameters of the proposed Bellman mappings, and shows that appropriate choices of those parameters yield several popular Bellman-mapping designs. As an application, the proposed mappings are employed to offer a novel solution to the problem of countering outliers in adaptive filtering. More specifically, with no prior information on the statistics of the outliers and no training data, a policy-iteration algorithm is introduced to select online, per time instance, the “optimal” coefficient $p$ in the least-mean-$p$-power-error method. Numerical tests on synthetic data showcase, in most cases, the superior performance of the proposed solution over several RL and non-RL schemes.
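For orientation, the classical Bellman mappings that such designs stand in for can be written as below. The notation is assumed here, not taken from the paper: state $s$, action $a$, one-step loss $g(s,a)$, discount factor $\alpha \in [0,1)$, next state $s'$ drawn from the transition kernel $P(\cdot \mid s,a)$, policy $\mu$, with losses minimized rather than rewards maximized. This is the textbook form, not the paper's nonparametric RKHS construction.

$$
(T_\mu Q)(s,a) = g(s,a) + \alpha\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ Q\big(s', \mu(s')\big) \big],
\qquad
(T Q)(s,a) = g(s,a) + \alpha\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[ \min_{a'} Q(s', a') \Big].
$$

Both mappings contain an expectation over $P(\cdot \mid s,a)$, which is the knowledge of transition probabilities that the proposed mappings are said not to require. Policy iteration alternates between evaluating the current policy as the fixed point of $T_\mu$ and improving it greedily:

$$
Q_{\mu_k} = T_{\mu_k} Q_{\mu_k},
\qquad
\mu_{k+1}(s) \in \arg\min_{a} Q_{\mu_k}(s,a).
$$

In the adaptive-filtering application, the action chosen per time instance is the coefficient $p$.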
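Random Fourier features, credited above with dimensionality reduction, replace a shift-invariant kernel by an explicit finite-dimensional feature map, so an RKHS function becomes an ordinary linear model in $D$ features. The following is a minimal sketch of the standard construction for the Gaussian kernel; the kernel choice, the names `make_rff`, `gaussian_kernel` and `dot`, and the values of $D$ and $\sigma$ are illustrative assumptions, since the abstract does not state the paper's settings.

```python
import math
import random


def make_rff(dim, num_features, sigma, seed=0):
    """Build a random Fourier feature map z: R^dim -> R^num_features such that
    z(x) . z(y) approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density, N(0, sigma^-2 I).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    # Random phases, uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        # z_i(x) = sqrt(2/D) * cos(omega_i . x + b_i)
        return [scale * math.cos(sum(o * xi for o, xi in zip(omega, x)) + b)
                for omega, b in zip(omegas, phases)]

    return z


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, for comparison with the feature approximation."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff(dim=2, num_features=2000, sigma=1.0)
    x, y = [0.3, -0.2], [0.1, 0.4]
    # Approximate and exact kernel values should be close for large D.
    print(dot(z(x), z(y)), gaussian_kernel(x, y, 1.0))
```

The approximation error shrinks roughly like $1/\sqrt{D}$; the model size stays at $D$ weights regardless of how many samples arrive, which is what makes such features attractive for online learning.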
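The least-mean-$p$-power-error (LMP) method updates a linear filter by stochastic-gradient descent on $|e_n|^p$, where $e_n = y_n - \mathbf{w}_n^\top \mathbf{x}_n$. The following is a minimal sketch assuming a real-valued FIR filter and a fixed $p \ge 1$; the names `lmp_step` and `identify`, the step size, and the Gaussian-plus-impulse noise model are hypothetical, and the paper's RL-based per-instance selection of $p$ is not implemented here.

```python
import math
import random


def lmp_step(w, x, y, mu, p):
    """One LMP update for a linear filter, assuming p >= 1.

    Descends |e|^p with e = y - w.x, using d|e|^p/dw = -p |e|^(p-1) sign(e) x.
    Returns the updated weights and the a-priori error e.
    """
    if p < 1:
        raise ValueError("this sketch assumes p >= 1")
    e = y - sum(wi * xi for wi, xi in zip(w, x))
    # p = 2 gives 2e (LMS); p = 1 gives sign(e) (sign-error LMS).
    g = p * abs(e) ** (p - 1) * math.copysign(1.0, e) if e != 0 else 0.0
    return [wi + mu * g * xi for wi, xi in zip(w, x)], e


def identify(w_true, p, mu=0.01, n_iter=5000, outlier_prob=0.0, seed=0):
    """Hypothetical system-identification run: Gaussian regressors, small
    Gaussian noise and, with probability outlier_prob, a large impulsive outlier."""
    rng = random.Random(seed)
    w = [0.0] * len(w_true)
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(a * b for a, b in zip(w_true, x)) + 0.01 * rng.gauss(0.0, 1.0)
        if rng.random() < outlier_prob:
            y += rng.gauss(0.0, 10.0)  # impulsive outlier
        w, _ = lmp_step(w, x, y, mu, p)
    return w
```

Running `identify(w_true, p=1, outlier_prob=0.1)` against `identify(w_true, p=2, outlier_prob=0.1)` illustrates why $p$ matters under outliers: the factor $|e|^{p-1}$ gives large errors less influence as $p$ decreases below 2.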




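Source journal: IEEE Transactions on Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.20
Self-citation rate: 9.30%
Articles published: 310
Review time: 3.0 months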

Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.