HPCFAIR: Enabling FAIR AI for HPC Applications
Gaurav Verma, M. Emani, C. Liao, Pei-Hung Lin, T. Vanderbruggen, Xipeng Shen, Barbara M. Chapman
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/mlhpc54614.2021.00011

Abstract: Artificial Intelligence (AI) is being adopted in different domains at an unprecedented scale. There is also significant interest in the scientific community in leveraging machine learning (ML) to run high-performance computing applications effectively at scale. Across the many efforts in this arena, work is often duplicated where existing rich datasets and ML models could be leveraged instead. The primary challenge is the lack of an ecosystem for reusing and reproducing models and datasets. In this work, we propose HPCFAIR, a modular, extensible framework that enables AI models to be Findable, Accessible, Interoperable, and Reproducible (FAIR). It gives users a structured approach to search, load, save, and reuse models in their codes. We present the design and implementation of our framework and highlight how it can be seamlessly integrated into ML-driven high-performance computing applications and scientific machine learning workloads.
Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC
W. Brewer, Daniel Martínez, Mathew Boyer, D. Jude, A. Wissink, Ben Parsons, Junqi Yin, Valentine Anantharaj
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/mlhpc54614.2021.00008

Abstract: We explore how to optimally deploy several different types of machine-learned surrogate models used in rotorcraft aerodynamics on HPC. We first developed three rotorcraft models at three different orders of magnitude (2M, 44M, and 212M trainable parameters) to use as test models. We then developed a benchmark, "smiBench", that uses synthetic data to test a wide range of alternative configurations and study optimal deployment scenarios. We discovered several types of optimal deployment scenarios depending on model size and inference frequency. For most cases, it makes sense to use multiple inference servers, each bound to a GPU, with a load balancer distributing requests across the GPUs. We tested three types of inference server deployments: (1) a custom Flask-based HTTP inference server, (2) TensorFlow Serving with the gRPC protocol, and (3) a RedisAI server with SmartRedis clients using the RESP protocol. We also tested three load-balancing techniques for multi-GPU inferencing: (1) a Python concurrent.futures thread pool, (2) HAProxy, and (3) mpi4py. We investigated deployments on both DoD HPCMP's SCOUT and DOE OLCF's Summit POWER9 supercomputers, demonstrated inference on a million samples per second using 192 GPUs, and studied multiple scenarios on both NVIDIA T4 and V100 GPUs. Moreover, we studied a range of concurrency levels on both the client and server sides, and provide optimal configuration advice based on the type of deployment. Finally, we provide a simple Python-based framework for benchmarking machine-learned surrogate models using the various inference servers.
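Of the three load-balancing techniques the paper tests, the Python concurrent.futures thread pool is the easiest to sketch. The toy below fans inference requests out round-robin over stand-in server functions, one per GPU; a real deployment would replace those functions with HTTP, gRPC, or RESP calls to the actual inference servers. Everything here is illustrative, not the paper's smiBench code.

```python
# Client-side load balancing with a thread pool: requests are assigned
# round-robin to per-GPU "servers" and submitted concurrently.
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def make_server(gpu_id):
    # Stand-in for an inference server bound to one GPU; a real client
    # would issue a network call here instead of computing locally.
    def infer(batch):
        return {"gpu": gpu_id, "preds": [x * 2.0 for x in batch]}
    return infer

servers = [make_server(i) for i in range(4)]   # e.g. 4 GPUs on one node
assignment = cycle(servers)                    # round-robin dispatcher

def run_requests(batches):
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # next(assignment) is called in submission order, so batches
        # are spread evenly across the servers.
        futures = [pool.submit(next(assignment), batch) for batch in batches]
        return [f.result() for f in futures]
```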
HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing
C. Liao, Pei-Hung Lin, Gaurav Verma, T. Vanderbruggen, M. Emani, Zifan Nan, Xipeng Shen
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/mlhpc54614.2021.00012

Abstract: Machine learning (ML) techniques have been widely studied to address various challenges of productively and efficiently running large-scale scientific applications on heterogeneous supercomputers. However, it is extremely difficult to generate, access, and maintain the training datasets and AI models needed to accelerate ML-based research. The Future of Research Communications and e-Scholarship has proposed the FAIR data principles, describing Findability, Accessibility, Interoperability, and Reusability. In this paper, we present our ongoing work on designing an ontology for high-performance computing (named the HPC ontology) to make training datasets and AI models FAIR. The ontology provides controlled vocabularies, explicit semantics, and formal knowledge representations. Our design uses an extensible two-level pattern, capturing both high-level meta information and low-level data content for software, hardware, experiments, workflows, training datasets, AI models, and so on. Preliminary evaluation shows that the HPC ontology is effective for annotating selected data and supporting a set of SPARQL queries.
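SPARQL queries over an ontology ultimately reduce to matching patterns against subject-predicate-object triples. As a stdlib-only stand-in, the sketch below annotates a model and its training data with triples and answers a basic graph-pattern query, with None acting as a wildcard. The vocabulary terms are invented examples, not the HPC ontology's actual IRIs, and real deployments would use an RDF store with genuine SPARQL.

```python
# Triple-based annotation plus pattern matching, the core idea behind
# SPARQL basic graph patterns. All terms below are made-up examples.
triples = {
    ("model:resnet-surrogate", "hpc:trainedOn", "dataset:hpl-runs"),
    ("model:resnet-surrogate", "hpc:targetHardware", "hw:v100"),
    ("dataset:hpl-runs", "hpc:generatedBy", "sw:hpl-2.3"),
}

def match(pattern, store=triples):
    # A pattern is a (subject, predicate, object) tuple; None matches anything,
    # like an unbound variable in a SPARQL query.
    s, p, o = pattern
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))
```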
Is Disaggregation possible for HPC Cognitive Simulation?
Michael R. Wyatt, Valen Yamamoto, Zoë Tosi, I. Karlin, B. V. Essen
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/mlhpc54614.2021.00014

Abstract: Cognitive simulation (CogSim) is an important and emerging workflow for HPC scientific exploration and scientific machine learning (SciML). One challenging CogSim workload is the replacement of one component of a complex physical simulation with a fast, learned surrogate model that sits "inside" the computational loop. Executing this in-the-loop inference is particularly challenging because it requires frequent inference across multiple possible target models, can be on the simulation's critical path (latency bound), is subject to requests from multiple MPI ranks, and typically involves a small number of samples per request. In this paper we explore the use of large, dedicated deep learning/AI accelerators that are disaggregated from compute nodes for this CogSim workload, and compare the trade-offs of using these accelerators versus node-local GPU accelerators on leadership-class HPC systems.
Semantic-Aware Lossless Data Compression for Deep Learning Recommendation Model (DLRM)
S. Pumma, Abhinav Vishnu
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/mlhpc54614.2021.00006

Abstract: As the architectures and capabilities of deep neural networks evolve, they become more demanding to train and use. The Deep Learning Recommendation Model (DLRM), a new neural network for recommendation systems, introduces challenging requirements for deep neural network training and inference. A DLRM model is typically too large to fit in a single GPU's memory. Unlike other deep neural networks, DLRM requires both model parallelism (for the bottom part of the model) and data parallelism (for the top part) when running on multiple GPUs. Because of this hybrid-parallel model, all-to-all communication is used to stitch the top and bottom parts together. We have observed that this all-to-all communication is costly and is a bottleneck in DLRM training and inference. In this paper, we propose a novel approach that reduces the communication volume by using DLRM's properties to compress the transferred data without information loss. We demonstrate the benefits of our method by training DLRM MLPerf on eight AMD Instinct MI100 accelerators. The experimental results show 59% and 38% improvements in the time-to-solution of DLRM MLPerf training for FP32 and mixed precision, respectively.
Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing
Logan T. Ward, G. Sivaraman, J. G. Pauloski, Y. Babuji, Ryan Chard, Naveen K. Dandu, P. Redfern, R. Assary, K. Chard, L. Curtiss, R. Thakur, Ian T. Foster
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/MLHPC54614.2021.00007

Abstract: Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65,536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.
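The steering pattern that Colmena implements at scale can be caricatured in a few lines: evaluate a handful of expensive simulations, use a cheap surrogate to score the unevaluated candidates, and always run the most promising one next. The nearest-neighbour surrogate and toy objective below are invented for illustration only; Colmena itself dispatches real simulation and learning tasks via Parsl and handles the dispatch/collation/retraining machinery the abstract lists.

```python
# ML-guided steering of a simulation ensemble, in miniature.
def steer(candidates, simulate, seeds, budget):
    observed = {x: simulate(x) for x in seeds}          # initial evaluations
    remaining = [c for c in candidates if c not in observed]
    while len(observed) < budget and remaining:
        def predict(x):
            # Cheap surrogate: score of the nearest evaluated candidate.
            nearest = min(observed, key=lambda o: abs(o - x))
            return observed[nearest]
        best = max(remaining, key=predict)              # most promising next run
        remaining.remove(best)
        observed[best] = simulate(best)                 # "expensive" simulation
    return max(observed, key=observed.get)              # best candidate found
```

Even this crude surrogate concentrates the evaluation budget near good regions instead of sweeping every candidate, which is the mechanism behind the discovery-rate speedups the paper reports.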
HYPPO: A Surrogate-Based Multi-Level Parallelism Tool for Hyperparameter Optimization
Vincent Dumont, Casey Garner, Anuradha Trivedi, Chelsea Jones, V. Ganapati, Juliane Mueller, T. Perciano, M. Kiran, Marcus Day
2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). DOI: 10.1109/MLHPC54614.2021.00013

Abstract: We present new software, HYPPO, that enables automatic tuning of the hyperparameters of various deep learning (DL) models. Unlike other hyperparameter optimization (HPO) methods, HYPPO uses adaptive surrogate models and directly accounts for uncertainty in model predictions to find accurate and reliable models that make robust predictions. Using asynchronous nested parallelism, we are able to significantly alleviate the computational burden of training complex architectures and quantifying the uncertainty. HYPPO is implemented in Python and can be used with both the TensorFlow and PyTorch libraries. We demonstrate various software features on time-series prediction and image classification problems, as well as a scientific application in computed tomography image reconstruction. Finally, we show that (1) the number of evaluations needed to find the optimal region in the hyperparameter space can be reduced by an order of magnitude, and (2) the time for such an HPO process to complete can be reduced by two orders of magnitude.
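One way to picture "directly accounting for uncertainty in model predictions" is to score each hyperparameter configuration by repeated noisy evaluations and penalise the variance, so that configurations which are good only occasionally lose to ones that are good reliably. The objective and noise model below are entirely invented, and HYPPO's actual surrogate modelling and asynchronous nested parallelism are not reproduced here.

```python
# Uncertainty-aware hyperparameter selection, in toy form.
import random
import statistics

def evaluate(config, seed):
    # Stand-in for training a DL model with these hyperparameters:
    # validation loss = config-dependent base + stochastic term.
    rng = random.Random(config["lr_exp"] * 1000 + seed)
    base = (config["lr_exp"] + 3) ** 2            # best at lr_exp = -3
    noise = rng.uniform(0.0, config["noise"])     # unstable configs vary more
    return base + noise

def robust_score(config, repeats=5):
    losses = [evaluate(config, s) for s in range(repeats)]
    # Penalise spread: a config must be both low-loss and consistent.
    return statistics.mean(losses) + statistics.stdev(losses)

def select(configs):
    return min(configs, key=robust_score)
```

In a real HPO run, the repeated trainings behind each `robust_score` are exactly the work that HYPPO parallelises asynchronously across HPC resources.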