Overcoming Deceptive Rewards with Quality-Diversity

Proceedings of the Companion Conference on Genetic and Evolutionary Computation. Pub Date: 2023-07-15. DOI: 10.1145/3583133.3590741

A. Feiden, J. Garcke


Abstract

Quality-Diversity (QD) offers powerful ideas for creating diverse, high-performing populations. Here, we investigate how well these ideas solve exploration-hard single-objective problems, in addition to creating such populations. We find that MAP-Elites is well suited to overcoming deceptive reward structures, while an Elites-type approach with an unstructured, distance-based container and extinction events can even outperform it. Furthermore, our analysis indicates that the QD score, the standard evaluation metric for MAP-Elites-type algorithms, is not well suited to predicting whether a configuration will succeed in solving a maze. This shows that exploration capacity is an entirely different dimension along which QD algorithms can be utilized, evaluated, and improved. It is a dimension that current advances in the field do not seem to cover, implicitly or explicitly.
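
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a grid-based MAP-Elites loop and of the QD score, here taken to be the sum of the fitness values of all elites in the archive. It is not the authors' implementation: the function names, the parameters, and the toy fitness function `toy_evaluate` are assumptions made for illustration, and the toy problem is neither a maze nor deceptive. The unstructured container and the extinction events mentioned in the abstract are not shown.

```python
import random


def map_elites(evaluate, dim=2, cells=10, iterations=2000,
               n_init=50, sigma=0.1, seed=0):
    """Minimal MAP-Elites on [0, 1]^dim with a regular grid archive.

    `evaluate` (hypothetical interface) maps a solution to a pair
    (fitness, descriptor), with every descriptor coordinate in [0, 1].
    """
    rng = random.Random(seed)
    archive = {}  # grid cell (tuple of ints) -> (fitness, solution)

    def try_insert(solution):
        fitness, descriptor = evaluate(solution)
        # Map each descriptor coordinate to a cell index in 0..cells-1.
        cell = tuple(min(int(d * cells), cells - 1) for d in descriptor)
        # Keep the candidate if the cell is empty or the candidate is fitter.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, solution)

    # Seed the archive with uniformly random solutions.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(dim)])

    # Main loop: pick a random elite, mutate it, try to insert the child.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))
                 for x in parent]
        try_insert(child)

    return archive


def qd_score(archive):
    """Sum of elite fitness values (assumes non-negative fitness)."""
    return sum(fitness for fitness, _ in archive.values())


def toy_evaluate(solution):
    """Illustrative fitness: closeness to an assumed goal at (0.9, 0.9).

    The descriptor is simply the solution's position.
    """
    goal = (0.9, 0.9)
    dist = sum((a - b) ** 2 for a, b in zip(solution, goal)) ** 0.5
    fitness = 1.0 - dist / (2 ** 0.5)  # stays positive on the unit square
    return fitness, tuple(solution)


if __name__ == "__main__":
    archive = map_elites(toy_evaluate)
    best = max(fitness for fitness, _ in archive.values())
    print("filled cells:", len(archive))
    print("QD score:", qd_score(archive))
    print("best fitness:", best)
```

In this sketch the QD score grows with both coverage and elite quality, whereas solving a single-objective task depends only on whether some elite reaches the goal. That difference is one way to read the abstract's point that the QD score need not predict success on a maze.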
