Accelerating Monte-Carlo Tree Search on CPU-FPGA Heterogeneous Platform

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). Pub Date: 2022-08-01. DOI: 10.1109/FPL57034.2022.00037

Yuan Meng, R. Kannan, V. Prasanna

Citations: 3

Abstract

Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. However, in-tree operations become a critical performance bottleneck when parallel MCTS is run on CPUs. In this work, we develop a scalable CPU-FPGA system for Tree-Parallel MCTS. We propose a novel decomposition and mapping of the MCTS data structure and computation onto the CPU and FPGA to reduce communication and coordination. The system achieves high scalability by encapsulating in-tree operations in an SRAM-based FPGA accelerator. To reduce the high data access latency and inter-worker synchronization overheads, we develop several hardware optimizations. We show that our accelerator delivers up to 35× speedup for in-tree operations and 3× higher overall system throughput. Our CPU-FPGA system also scales better with the number of parallel workers than state-of-the-art parallel MCTS implementations on CPU.
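
The abstract does not spell out what the in-tree operations are. In common MCTS formulations they are selection, expansion, and backup on the shared search tree. The following is a minimal single-threaded sketch under that assumption, using the standard UCT selection rule. All names (Node, uct_score, select, expand, backup) and the exploration constant are hypothetical and are not taken from the paper or its accelerator design.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """One search-tree node; field names are illustrative assumptions."""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[int, "Node"] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Standard UCT score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.value_sum / child.visits
    bonus = c * math.sqrt(math.log(max(parent.visits, 1)) / child.visits)
    return mean + bonus


def select(root: Node) -> List[Node]:
    """Walk from the root to a leaf, taking the best-scoring child each step."""
    path = [root]
    node = root
    while node.children:
        parent = node
        node = max(parent.children.values(), key=lambda ch: uct_score(parent, ch))
        path.append(node)
    return path


def expand(leaf: Node, num_actions: int) -> None:
    """Attach one child per action to a leaf node."""
    for action in range(num_actions):
        leaf.children[action] = Node(parent=leaf)


def backup(path: List[Node], reward: float) -> None:
    """Propagate a simulation result to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.value_sum += reward
```

In a tree-parallel setting, many workers would run select and backup concurrently on the same tree, so every node visited is a shared read-modify-write location. This sketch omits any locking or synchronization; it only illustrates the kind of per-node traffic that the abstract identifies as the bottleneck.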

