Value Iteration on Multicore Processors

2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) Pub Date : 2020-12-09 DOI:10.1109/ISSPIT51521.2020.9408773

Anuj K. Jain, S. Sahni

Citations: 0

Abstract

Value Iteration (VI) is a powerful, though time-consuming, approach to solving reinforcement learning problems modeled as Markov Decision Processes (MDPs). In this paper, we explore strategies to run the state-of-the-art cache-efficient algorithm for VI developed by us [1], [2] on a multicore processor. We demonstrate a speedup of up to 2.59 on a 10-core multiprocessor using 20 threads on popular benchmark data. The speedup for the parallelized portion of the computation is up to 5.89.
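For readers unfamiliar with VI, the following is a minimal sketch of textbook value iteration on a tiny MDP. It is not the cache-efficient algorithm of [1], [2], nor the parallelization strategy explored in the paper. The function name `value_iteration`, the `transitions` layout, the discount factor, and the three-state `toy_mdp` are all assumptions made for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8, max_sweeps=10_000):
    """Textbook value iteration (illustrative sketch only).

    transitions[s][a] is a list of (probability, next_state, reward) tuples;
    a state with no actions is treated as terminal with value 0.
    """
    values = [0.0] * len(transitions)
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in enumerate(transitions):
            if not actions:  # terminal state: value stays 0
                continue
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update within the sweep
        if delta < tol:  # stop once a full sweep changes no value by tol or more
            break
    return values


# Hypothetical 3-state chain: reaching terminal state 2 pays reward 1.
toy_mdp = [
    [[(1.0, 1, 0.0)], [(1.0, 0, 0.0)]],  # state 0: move right, or stay
    [[(1.0, 2, 1.0)], [(1.0, 0, 0.0)]],  # state 1: move right, or go back
    [],                                  # state 2: terminal
]
values = value_iteration(toy_mdp)
```

Each sweep touches every state and every transition, which is why VI becomes time-consuming on large MDPs.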
