Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC

W. Brewer, Daniel Martínez, Mathew Boyer, D. Jude, A. Wissink, Ben Parsons, Junqi Yin, Valentine Anantharaj
{"title":"Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC","authors":"W. Brewer, Daniel Martínez, Mathew Boyer, D. Jude, A. Wissink, Ben Parsons, Junqi Yin, Valentine Anantharaj","doi":"10.1109/mlhpc54614.2021.00008","DOIUrl":null,"url":null,"abstract":"We explore how to optimally deploy several different types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three different rotorcraft models at three different orders of magnitude (2M, 44M, and 212M trainable parameters) to use as test models. Then we developed a benchmark, which we call “smiBench”, that uses synthetic data to test a wide range of alternative configurations to study optimal deployment scenarios. We discovered several different types of optimal deployment scenarios depending on the model size and inference frequency. For most cases, it makes sense to use multiple inference servers, each bound to a GPU with a load balancer distributing the requests across multiple GPUs. We tested three different types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving with gRPC protocol, and (3) RedisAI server with SmartRedis clients using the RESP protocol. We also tested three different types of load balancing techniques for multi-GPU inferencing: (1) Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP’s SCOUT and DoE OLCF’s Summit POWER9 supercomputers, demonstrated the ability to inference a million samples per second using 192 GPUs, and studied multiple scenarios on both Nvidia T4 and V100 GPUs. Moreover, we studied a range of concurrency levels, both on the client-side and the server-side, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.","PeriodicalId":101642,"journal":{"name":"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlhpc54614.2021.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

We explore how to optimally deploy several different types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three rotorcraft models spanning three orders of magnitude in size (2M, 44M, and 212M trainable parameters) to use as test models. Then we developed a benchmark, which we call “smiBench”, that uses synthetic data to test a wide range of alternative configurations to study optimal deployment scenarios. We discovered several different types of optimal deployment scenarios depending on the model size and inference frequency. For most cases, it makes sense to use multiple inference servers, each bound to a GPU, with a load balancer distributing requests across the GPUs. We tested three different types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving with the gRPC protocol, and (3) a RedisAI server with SmartRedis clients using the RESP protocol. We also tested three different load-balancing techniques for multi-GPU inferencing: (1) a Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP’s SCOUT and DoE OLCF’s Summit POWER9 supercomputers, demonstrated the ability to perform inference on a million samples per second using 192 GPUs, and studied multiple scenarios on both Nvidia T4 and V100 GPUs. Moreover, we studied a range of concurrency levels, on both the client side and the server side, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.
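To make the deployment pattern in the abstract concrete, the following is a minimal sketch, not the paper’s smiBench code, of the simplest of the three load-balancing strategies: a client-side Python concurrent.futures thread pool that spreads inference requests over several GPU-bound HTTP inference servers (e.g., one Flask-based server per GPU). The endpoint URLs, the /predict route, the JSON payload layout, and the batch shapes are assumptions made for illustration.

```python
# Hypothetical sketch: thread-pool load balancing over per-GPU HTTP inference servers.
# Endpoints, routes, and payload format are illustrative, not the paper's actual API.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import requests

# Assume one inference server (e.g., a Flask app wrapping the surrogate model)
# is already running per GPU at these addresses.
ENDPOINTS = [f"http://localhost:{8000 + gpu}/predict" for gpu in range(4)]


def infer(indexed_batch):
    """Send one batch of synthetic samples to a server chosen round-robin by index."""
    i, batch = indexed_batch
    url = ENDPOINTS[i % len(ENDPOINTS)]
    resp = requests.post(url, json={"inputs": batch.tolist()})
    resp.raise_for_status()
    return resp.json()["outputs"]


if __name__ == "__main__":
    # Synthetic workload: many small batches dispatched concurrently,
    # mimicking the high inference frequencies studied in the paper.
    batches = [np.random.rand(128, 6).astype(np.float32) for _ in range(64)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(infer, enumerate(batches)))
    print(f"Completed {len(results)} batches")
```

The same client could target TensorFlow Serving (gRPC) or RedisAI (SmartRedis/RESP) endpoints by swapping the request layer; the index-based round-robin dispatch here only roughly mirrors the thread-pool strategy, with HAProxy and mpi4py being the alternative balancers compared in the paper.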