一种基于强化学习的图结构连续估计分布算法

2012 IEEE Congress on Evolutionary Computation Pub Date : 2012-06-10 DOI:10.1109/CEC.2012.6256481

Xianneng Li, Bing Li, S. Mabu, K. Hirasawa

{"title":"一种基于强化学习的图结构连续估计分布算法","authors":"Xianneng Li, Bing Li, S. Mabu, K. Hirasawa","doi":"10.1109/CEC.2012.6256481","DOIUrl":null,"url":null,"abstract":"A novel graph-based Estimation of Distribution Algorithm (EDA) named Probabilistic Model Building Genetic Network Programming (PMBGNP) has been proposed. Inspired by classical EDAs, PMBGNP memorizes the current best individuals and uses them to estimate a distribution for the generation of the new population. However, PMBGNP can evolve compact programs by representing its solutions as graph structures. Therefore, it can solve a range of problems different from conventional ones in EDA literature, such as data mining and Reinforcement Learning (RL) problems. This paper extends PMBGNP from discrete to continuous search space, which is named PMBGNP-AC. Besides evolving the node connections to determine the optimal graph structures using conventional PMBGNP, Gaussian distribution is used for the distribution of continuous variables of nodes. The mean value μ and standard deviation σ are constructed like those of classical continuous Population-based incremental learning (PBILc). However, a RL technique, i.e., Actor-Critic (AC), is designed to update the parameters (μ and σ). AC allows us to calculate the Temporal-Difference (TD) error to evaluate whether the selection of the continuous value is better or worse than expected. This scalar reinforcement signal can decide whether the tendency to select this continuous value should be strengthened or weakened, allowing us to determine the shape of the probability density functions of the Gaussian distribution. The proposed algorithm is applied to a RL problem, i.e., autonomous robot control, where the robot's wheel speeds and sensor values are continuous. The experimental results show the superiority of PMBGNP-AC comparing with the conventional algorithms.","PeriodicalId":376837,"journal":{"name":"2012 IEEE Congress on Evolutionary Computation","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A continuous estimation of distribution algorithm by evolving graph structures using reinforcement learning\",\"authors\":\"Xianneng Li, Bing Li, S. Mabu, K. Hirasawa\",\"doi\":\"10.1109/CEC.2012.6256481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A novel graph-based Estimation of Distribution Algorithm (EDA) named Probabilistic Model Building Genetic Network Programming (PMBGNP) has been proposed. Inspired by classical EDAs, PMBGNP memorizes the current best individuals and uses them to estimate a distribution for the generation of the new population. However, PMBGNP can evolve compact programs by representing its solutions as graph structures. Therefore, it can solve a range of problems different from conventional ones in EDA literature, such as data mining and Reinforcement Learning (RL) problems. This paper extends PMBGNP from discrete to continuous search space, which is named PMBGNP-AC. Besides evolving the node connections to determine the optimal graph structures using conventional PMBGNP, Gaussian distribution is used for the distribution of continuous variables of nodes. The mean value μ and standard deviation σ are constructed like those of classical continuous Population-based incremental learning (PBILc). However, a RL technique, i.e., Actor-Critic (AC), is designed to update the parameters (μ and σ). AC allows us to calculate the Temporal-Difference (TD) error to evaluate whether the selection of the continuous value is better or worse than expected. This scalar reinforcement signal can decide whether the tendency to select this continuous value should be strengthened or weakened, allowing us to determine the shape of the probability density functions of the Gaussian distribution. The proposed algorithm is applied to a RL problem, i.e., autonomous robot control, where the robot's wheel speeds and sensor values are continuous. The experimental results show the superiority of PMBGNP-AC comparing with the conventional algorithms.\",\"PeriodicalId\":376837,\"journal\":{\"name\":\"2012 IEEE Congress on Evolutionary Computation\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE Congress on Evolutionary Computation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CEC.2012.6256481\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Congress on Evolutionary Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CEC.2012.6256481","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

提出了一种新的基于图的分布估计算法——概率模型构建遗传网络规划(PMBGNP)。受经典eda的启发，PMBGNP记住当前最好的个体，并使用它们来估计新种群生成的分布。然而，PMBGNP可以通过将其解表示为图结构来发展紧凑的程序。因此，它可以解决一系列不同于EDA文献中的传统问题，如数据挖掘和强化学习(RL)问题。本文将PMBGNP从离散搜索空间扩展到连续搜索空间，命名为PMBGNP- ac。除了使用传统PMBGNP进化节点连接以确定最优图结构外，还使用高斯分布用于节点连续变量的分布。均值μ和标准差σ的构造与经典的基于连续种群的增量学习(PBILc)相似。然而，RL技术，即Actor-Critic (AC)，被设计用来更新参数(μ和σ)。AC允许我们计算时间差(TD)误差来评估连续值的选择是比预期的好还是差。这个标量强化信号可以决定选择这个连续值的趋势是应该加强还是减弱，从而使我们能够确定高斯分布的概率密度函数的形状。该算法应用于RL问题，即机器人的自主控制，其中机器人的轮速和传感器值是连续的。实验结果表明了PMBGNP-AC算法与传统算法相比的优越性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A continuous estimation of distribution algorithm by evolving graph structures using reinforcement learning

A novel graph-based Estimation of Distribution Algorithm (EDA) named Probabilistic Model Building Genetic Network Programming (PMBGNP) has been proposed. Inspired by classical EDAs, PMBGNP memorizes the current best individuals and uses them to estimate a distribution for the generation of the new population. However, PMBGNP can evolve compact programs by representing its solutions as graph structures. Therefore, it can solve a range of problems different from conventional ones in EDA literature, such as data mining and Reinforcement Learning (RL) problems. This paper extends PMBGNP from discrete to continuous search space, which is named PMBGNP-AC. Besides evolving the node connections to determine the optimal graph structures using conventional PMBGNP, Gaussian distribution is used for the distribution of continuous variables of nodes. The mean value μ and standard deviation σ are constructed like those of classical continuous Population-based incremental learning (PBILc). However, a RL technique, i.e., Actor-Critic (AC), is designed to update the parameters (μ and σ). AC allows us to calculate the Temporal-Difference (TD) error to evaluate whether the selection of the continuous value is better or worse than expected. This scalar reinforcement signal can decide whether the tendency to select this continuous value should be strengthened or weakened, allowing us to determine the shape of the probability density functions of the Gaussian distribution. The proposed algorithm is applied to a RL problem, i.e., autonomous robot control, where the robot's wheel speeds and sensor values are continuous. The experimental results show the superiority of PMBGNP-AC comparing with the conventional algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE Congress on Evolutionary Computation

自引率

0.00%

发文量