Embedded GPU Cluster Computing Framework for Inference of Convolutional Neural Networks

Evan T. Kain, Diego Wildenstein, A. Pineda
2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916253. Citations: 2.

Abstract

The growing need for on-board image processing on space vehicles requires computing solutions that are both low-power and high-performance. Parallel computation using low-power embedded Graphics Processing Units (GPUs) satisfies both requirements. Our experiment uses OpenMPI domain decomposition of an image processing algorithm based upon a pre-trained convolutional neural network (CNN) developed by the U.S. Air Force Research Laboratory (AFRL). Our testbed consists of six NVIDIA Jetson TX2 development boards operating in parallel. This parallel framework achieves a speedup of 4.3× on six processing nodes, with a linear decay in parallel efficiency as more processing nodes are added to the network. By replicating the data across processors in addition to distributing it, we also characterize the best-case impact of adding triple modular redundancy (TMR) to our application.
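As a rough illustration (not the authors' code), the row-strip domain decomposition implied by the abstract can be sketched as follows; the function name and the contiguous-strip assignment are assumptions, and the efficiency figure simply restates the reported 4.3× speedup on six nodes:

```python
def strip_bounds(height, num_workers, rank):
    """Rows [start, end) assigned to `rank` when an image of `height`
    rows is split into contiguous horizontal strips, distributing any
    remainder rows one each to the lowest-numbered ranks."""
    base, rem = divmod(height, num_workers)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end

# Parallel efficiency from the reported result: S = 4.3 on p = 6 nodes.
speedup = 4.3
nodes = 6
efficiency = speedup / nodes  # roughly 0.72
```

In an MPI setting, each rank would process its own strip and the root would gather the results; under TMR, the same strip would instead be replicated on three ranks and the outputs voted.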