{"title":"Embedded GPU Cluster Computing Framework for Inference of Convolutional Neural Networks","authors":"Evan T. Kain, Diego Wildenstein, A. Pineda","doi":"10.1109/HPEC.2019.8916253","DOIUrl":null,"url":null,"abstract":"The growing need for on-board image processing for space vehicles requires computing solutions that are both low-power and high-performance. Parallel computation using low-power embedded Graphics Processing Units (GPUs) satisfy both requirements. Our experiment involves the use of OpenMPI domain decomposition of an image processing algorithm based upon a pre-trained convolutional neural network (CNN) developed by the U.S. Air Force Research Laboratory (AFRL). Our testbed consists of six NVIDIA Jetson TX2 development boards operating in parallel. This parallel framework results in a speedup of $4.3 \\times $ on six processing nodes. This approach also leads to a linear decay in parallel efficiency as more processing nodes are added to the network. By replicating the data across processors in addition to distributing, we also characterize the best-case impact of adding triple modular redundancy (TMR) to our application.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916253","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The growing need for on-board image processing on space vehicles requires computing solutions that are both low-power and high-performance. Parallel computation using low-power embedded Graphics Processing Units (GPUs) satisfies both requirements. Our experiment uses OpenMPI to perform domain decomposition of an image processing algorithm based on a pre-trained convolutional neural network (CNN) developed by the U.S. Air Force Research Laboratory (AFRL). Our testbed consists of six NVIDIA Jetson TX2 development boards operating in parallel. This parallel framework achieves a speedup of $4.3\times$ on six processing nodes, with parallel efficiency decaying approximately linearly as more processing nodes are added to the network. By replicating the data across processors in addition to distributing it, we also characterize the best-case impact of adding triple modular redundancy (TMR) to our application.
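To make the decomposition strategy concrete, the following is a minimal sketch (not the authors' code) of row-wise domain decomposition of an image across MPI ranks, with a placeholder standing in for per-tile CNN inference on each node's GPU. The image dimensions and the function run_cnn_inference are assumptions for illustration only.

```c
/* Sketch: scatter contiguous row blocks of an image to MPI ranks,
 * run per-tile inference locally, and gather results on the root. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for CNN inference on the local GPU. */
static void run_cnn_inference(const float *tile, float *out, int n) {
    for (int i = 0; i < n; ++i) out[i] = tile[i];  /* placeholder copy */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int width = 1024, height = 1024;    /* assumed image size */
    const int rows_per_rank = height / size;  /* assumes height % size == 0 */
    const int tile_elems = rows_per_rank * width;

    float *image = NULL, *result = NULL;
    if (rank == 0) {
        image  = malloc((size_t)width * height * sizeof *image);
        result = malloc((size_t)width * height * sizeof *result);
        for (int i = 0; i < width * height; ++i) image[i] = (float)i;
    }

    float *tile = malloc((size_t)tile_elems * sizeof *tile);
    float *out  = malloc((size_t)tile_elems * sizeof *out);

    /* Distribute one contiguous block of rows to each processing node. */
    MPI_Scatter(image, tile_elems, MPI_FLOAT,
                tile,  tile_elems, MPI_FLOAT, 0, MPI_COMM_WORLD);

    run_cnn_inference(tile, out, tile_elems);

    /* Collect the per-tile outputs back on the root node. */
    MPI_Gather(out,    tile_elems, MPI_FLOAT,
               result, tile_elems, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("processed %d rows across %d ranks\n", height, size);

    free(tile); free(out);
    if (rank == 0) { free(image); free(result); }
    MPI_Finalize();
    return 0;
}
```

A TMR variant of this scheme would broadcast the same tile to groups of ranks instead of scattering disjoint tiles (e.g., with MPI_Bcast within a sub-communicator) and vote on the redundant outputs at the root; this sketch shows only the distribution path described in the abstract.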