E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning

Aidin Shiri, Utteja Kallakuri, Hasib-Al Rashid, Bharat Prakash, Nicholas R. Waytowich, T. Oates, T. Mohsenin
{"title":"E2HRL:一种高效节能的分层深度强化学习硬件加速器","authors":"Aidin Shiri, Utteja Kallakuri, Hasib-Al Rashid, Bharat Prakash, Nicholas R. Waytowich, T. Oates, T. Mohsenin","doi":"10.1145/3498327","DOIUrl":null,"url":null,"abstract":"Recently, Reinforcement Learning (RL) has shown great performance in solving sequential decision-making and control in dynamic environment problems. Despite its achievements, deploying Deep Neural Network (DNN)-based RL is expensive in terms of time and power due to the large number of episodes required to train agents with high dimensional image representations. Additionally, at the interference the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices as the main platform for deploying RL applications are intrinsically resource-constrained and deploying deep neural network-based RL on them is a challenging task. As a result, reducing the number of actions taken by the RL agent to learn desired policy, along with the energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), which is a scalable hardware architecture for RL applications. E2HRL utilizes a cross-layer design methodology for achieving better energy efficiency, smaller model size, higher accuracy, and system integration at the software and hardware layers. Our proposed model for RL agent is designed based on the learning hierarchical policies, which makes the network architecture more efficient for implementation on mobile devices. We evaluated our model in three different RL environments with different level of complexity. Simulation results with our analysis illustrate that hierarchical policy learning with several levels of control improves RL agents training efficiency and the agent learns the desired policy faster compared to a non-hierarchical model. This improvement is specifically more observable as the environment or the task becomes more complex with multiple objective subgoals. We tested our model with different hyperparameters to achieve the maximum reward by the RL agent while minimizing the model size, parameters, and required number of operations. E2HRL model enables efficient deployment of RL agent on resource-constraint-embedded devices with the proposed custom hardware architecture that is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PE) in the proposed hardware can vary between 1 to 8, which provides the flexibility of tradeoff of different factors such as latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient hardware architectures of E2HRL on Xilinx Artix-7 FPGA and NVIDIA Jetson TX2. Comparing the implementation results shows Jetson TX2 boards achieve 0.1 ∼ 1.3 GOP/S/W energy efficiency while Artix-7 FPGA achieves 1.1 ∼ 11.4 GOP/S/W, which denotes 8.8× ∼ 11× better energy efficiency of E2HRL when model is implemented on FPGA. 
Additionally, compared to similar works our design shows better performance and energy efficiency.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"77 1","pages":"1 - 19"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning\",\"authors\":\"Aidin Shiri, Utteja Kallakuri, Hasib-Al Rashid, Bharat Prakash, Nicholas R. Waytowich, T. Oates, T. Mohsenin\",\"doi\":\"10.1145/3498327\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Reinforcement Learning (RL) has shown great performance in solving sequential decision-making and control in dynamic environment problems. Despite its achievements, deploying Deep Neural Network (DNN)-based RL is expensive in terms of time and power due to the large number of episodes required to train agents with high dimensional image representations. Additionally, at the interference the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices as the main platform for deploying RL applications are intrinsically resource-constrained and deploying deep neural network-based RL on them is a challenging task. As a result, reducing the number of actions taken by the RL agent to learn desired policy, along with the energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), which is a scalable hardware architecture for RL applications. E2HRL utilizes a cross-layer design methodology for achieving better energy efficiency, smaller model size, higher accuracy, and system integration at the software and hardware layers. Our proposed model for RL agent is designed based on the learning hierarchical policies, which makes the network architecture more efficient for implementation on mobile devices. We evaluated our model in three different RL environments with different level of complexity. Simulation results with our analysis illustrate that hierarchical policy learning with several levels of control improves RL agents training efficiency and the agent learns the desired policy faster compared to a non-hierarchical model. This improvement is specifically more observable as the environment or the task becomes more complex with multiple objective subgoals. We tested our model with different hyperparameters to achieve the maximum reward by the RL agent while minimizing the model size, parameters, and required number of operations. E2HRL model enables efficient deployment of RL agent on resource-constraint-embedded devices with the proposed custom hardware architecture that is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PE) in the proposed hardware can vary between 1 to 8, which provides the flexibility of tradeoff of different factors such as latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient hardware architectures of E2HRL on Xilinx Artix-7 FPGA and NVIDIA Jetson TX2. 
Comparing the implementation results shows Jetson TX2 boards achieve 0.1 ∼ 1.3 GOP/S/W energy efficiency while Artix-7 FPGA achieves 1.1 ∼ 11.4 GOP/S/W, which denotes 8.8× ∼ 11× better energy efficiency of E2HRL when model is implemented on FPGA. Additionally, compared to similar works our design shows better performance and energy efficiency.\",\"PeriodicalId\":6933,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"volume\":\"77 1\",\"pages\":\"1 - 19\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3498327\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3498327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Recently, Reinforcement Learning (RL) has shown strong performance in solving sequential decision-making and control problems in dynamic environments. Despite these achievements, deploying Deep Neural Network (DNN)-based RL is expensive in terms of time and power, because of the large number of episodes required to train agents on high-dimensional image representations. In addition, at inference time the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices, the main platform for deploying RL applications, are intrinsically resource-constrained, and deploying DNN-based RL on them is a challenging task. As a result, reducing the number of actions the RL agent needs to learn the desired policy, along with energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), a scalable hardware architecture for RL applications. E2HRL uses a cross-layer design methodology to achieve better energy efficiency, a smaller model size, higher accuracy, and system integration across the software and hardware layers. Our proposed RL agent model is designed around learning hierarchical policies, which makes the network architecture more efficient to implement on mobile devices. We evaluated our model in three RL environments of different complexity. Simulation results and our analysis show that hierarchical policy learning with several levels of control improves the training efficiency of RL agents: the agent learns the desired policy faster than with a non-hierarchical model. This improvement is especially noticeable as the environment or task becomes more complex, with multiple objective subgoals. We tested our model with different hyperparameters to maximize the reward achieved by the RL agent while minimizing the model size, number of parameters, and required number of operations. The E2HRL model enables efficient deployment of the RL agent on resource-constrained embedded devices through the proposed custom hardware architecture, which is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PEs) in the proposed hardware can vary from 1 to 8, which provides flexibility to trade off latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient E2HRL hardware architectures on a Xilinx Artix-7 FPGA and an NVIDIA Jetson TX2. Comparing the implementation results shows that the Jetson TX2 boards achieve 0.1 ∼ 1.3 GOP/s/W energy efficiency while the Artix-7 FPGA achieves 1.1 ∼ 11.4 GOP/s/W, i.e., E2HRL is 8.8× ∼ 11× more energy efficient when the model is implemented on the FPGA. Additionally, compared to similar works, our design shows better performance and energy efficiency.
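To make the hierarchical part of the abstract concrete, the sketch below shows one common way a two-level hierarchical policy can be structured: a high-level network selects a discrete subgoal, and a low-level network selects primitive actions conditioned on that subgoal for a fixed number of steps. This is only an illustration of the general idea; the module names, layer sizes, subgoal space, and the subgoal horizon k are assumptions and are not taken from the E2HRL paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLevelPolicy(nn.Module):
    """Picks a discrete subgoal from the current observation (illustrative)."""
    def __init__(self, obs_dim, n_subgoals, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_subgoals))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class LowLevelPolicy(nn.Module):
    """Picks a primitive action conditioned on observation and subgoal (illustrative)."""
    def __init__(self, obs_dim, n_subgoals, n_actions, hidden=64):
        super().__init__()
        self.n_subgoals = n_subgoals
        self.net = nn.Sequential(nn.Linear(obs_dim + n_subgoals, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs, subgoal):
        goal = F.one_hot(subgoal, self.n_subgoals).float()
        return torch.distributions.Categorical(logits=self.net(torch.cat([obs, goal], dim=-1)))

# Example rollout step: the high-level policy fixes a subgoal every k steps,
# and the low-level policy acts under that subgoal in between.
obs_dim, n_subgoals, n_actions, k = 8, 4, 6, 10   # hypothetical sizes
high = HighLevelPolicy(obs_dim, n_subgoals)
low = LowLevelPolicy(obs_dim, n_subgoals, n_actions)
obs = torch.zeros(1, obs_dim)
subgoal = high(obs).sample()
for t in range(k):
    action = low(obs, subgoal).sample()
    # obs, reward, done = env.step(action)  # environment interaction omitted
```

A split like this keeps both networks small, which is in line with the abstract's emphasis on minimizing model size and operation count for mobile deployment.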
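The reported efficiency numbers also invite a quick sanity check on the 8.8× ∼ 11× claim, and the 1-to-8 PE parameterization suggests a simple design-space sweep. The sketch below scores hypothetical design points by GOP/s per watt; the per-PE throughput and power figures are placeholders, not measurements from the paper, while the last two lines reproduce the ratio arithmetic implied by the reported ranges (1.1/0.1 = 11 at the low end, 11.4/1.3 ≈ 8.8 at the high end).

```python
from dataclasses import dataclass

@dataclass
class DesignPoint:
    """One hardware configuration; all per-PE figures are illustrative placeholders."""
    n_pe: int                # number of processing engines (1..8 in E2HRL)
    gops_per_pe: float       # sustained throughput contributed by one PE (GOP/s)
    static_power_w: float    # power independent of PE count (W)
    power_per_pe_w: float    # incremental power per PE (W)

    @property
    def throughput_gops(self) -> float:
        return self.n_pe * self.gops_per_pe

    @property
    def power_w(self) -> float:
        return self.static_power_w + self.n_pe * self.power_per_pe_w

    @property
    def efficiency_gops_per_w(self) -> float:
        return self.throughput_gops / self.power_w

# Sweep the PE count and pick the most energy-efficient configuration.
points = [DesignPoint(n_pe=n, gops_per_pe=0.5, static_power_w=0.3, power_per_pe_w=0.1)
          for n in range(1, 9)]
best = max(points, key=lambda p: p.efficiency_gops_per_w)
print(f"best: {best.n_pe} PEs, {best.efficiency_gops_per_w:.2f} GOP/s/W")

# Reported ranges: Artix-7 FPGA 1.1-11.4 GOP/s/W vs. Jetson TX2 0.1-1.3 GOP/s/W.
print(1.1 / 0.1)    # 11.0   -> the "11x" end of the advantage
print(11.4 / 1.3)   # ~8.77  -> the "8.8x" end
```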