Fast MCVI Based on Improved NSGA2

Yin Liu, Yingping Zhou, Shuai Chen
Published in: 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics (2014-08-26)
DOI: 10.1109/IHMSC.2014.38 (https://doi.org/10.1109/IHMSC.2014.38)
Citations: 0

Abstract

Partially observable Markov decision processes (POMDPs) are widely used in many fields. Solving a POMDP is computationally prohibitive because of the curse of dimensionality, and MCVI is regarded as a promising approach to breaking that curse. Although MCVI is a major step toward solving this problem, it still has defects, such as a slow convergence rate and continuous growth in the number of policy-graph nodes. This paper therefore proposes a fast MCVI based on an improved NSGA2. Unlike the general NSGA2, the improved NSGA2 initializes the population using experiential knowledge and uses self-adjusting values as the crossover and mutation probabilities. Before executing MCVI, the algorithm sets a series of thresholds. When the temporary policy graph reaches one of the thresholds, the algorithm applies a discount operator to update that threshold and uses the improved NSGA2 to update the policy graph. It then executes MCVI again and repeats this process until termination. Numerical experiments on the classic corridor problem show that the fast MCVI achieves about an 8% increase in convergence rate over the original MCVI and about a 60% decrease in the number of policy-graph nodes.
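The control flow described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `mcvi_step`, `nsga2_update`, and `fast_mcvi` are hypothetical names, the policy graph is modeled as a plain list of node ids, the thresholds are assumed to be compared against the node count, and the discount operator is passed in as an arbitrary function because the abstract does not define it.

```python
from typing import Callable, List, Tuple

PolicyGraph = List[int]  # toy stand-in: one int per policy-graph node


def mcvi_step(graph: PolicyGraph) -> PolicyGraph:
    """Hypothetical placeholder for one MCVI iteration: it only adds a node."""
    return graph + [len(graph)]


def nsga2_update(graph: PolicyGraph) -> PolicyGraph:
    """Hypothetical placeholder for the improved-NSGA2 update: keeps every other node."""
    return graph[::2]


def fast_mcvi(
    thresholds: List[float],
    discount: Callable[[float], float],
    max_steps: int,
) -> Tuple[PolicyGraph, List[float], int]:
    """Alternate MCVI steps with threshold-triggered policy-graph updates."""
    active = list(thresholds)  # copy so the caller's list is left untouched
    graph: PolicyGraph = []
    n_updates = 0
    for _ in range(max_steps):
        graph = mcvi_step(graph)
        size = len(graph)  # assumption: thresholds are compared with the node count
        for i, t in enumerate(active):
            if size >= t:
                active[i] = discount(t)      # discount operator updates the threshold
                graph = nsga2_update(graph)  # improved NSGA2 updates the graph
                n_updates += 1
                break  # assumption: at most one update per MCVI step
    return graph, active, n_updates


# Toy run: one threshold at 8 nodes, halved on each hit but never below 4.
graph, active, n_updates = fast_mcvi([8.0], lambda t: max(4.0, 0.5 * t), max_steps=12)
print(len(graph), active, n_updates)
```

In the toy run the single threshold starts at 8 nodes and is halved on each hit with a floor of 4; both numbers are arbitrary and chosen only to keep the example small. A real implementation would replace the two placeholders with an actual MCVI backup and a multi-objective NSGA2 search over candidate policy graphs.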