{"title":"Towards an Objective Metric for the Performance of Exact Triangle Count","authors":"Mark P. Blanco, Scott McMillan, Tze Meng Low","doi":"10.1109/HPEC43674.2020.9286188","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286188","url":null,"abstract":"The performance of graph algorithms is often measured in terms of the number of traversed edges per second (TEPS). However, this performance metric is inadequate for a graph operation such as exact triangle counting. In triangle counting, execution times on graphs with a similar number of edges can be distinctly different as demonstrated by results from the past Graph Challenge entries. We discuss the need for an objective performance metric for graph operations and the desired characteristics of such a metric such that it more accurately captures the interactions between the amount of work performed and the capabilities of the hardware on which the code is executed. Using exact triangle counting as an example, we derive a metric that captures how certain techniques employed in many implementations improve performance. We demonstrate that our proposed metric can be used to evaluate and compare multiple approaches for triangle counting, using a SIMD approach as a case study against a scalar baseline.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114462600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Graphlet Spectrograms for Temporal Pattern Analysis of Virus-Research Collaboration Networks","authors":"D. Floros, Tiancheng Liu, N. Pitsianis, Xiaobai Sun","doi":"10.1109/HPEC43674.2020.9286161","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286161","url":null,"abstract":"We introduce a new method for temporal pattern analysis of scientific collaboration networks. We investigate in particular virus research activities through five epidemic or pandemic outbreaks in the recent two decades and in the ongoing pandemic with COVID-19. Our method embodies two innovative components. The first is a simple model of temporal collaboration networks with time segmented in publication time and convolved in citation history, to effectively capture and accommodate collaboration activities at mixed time scales. The second component is the novel use of graphlets to encode topological structures and to detect change and persistence in collaboration activities over time. We discover in particular two unique and universal roles of bi-fork graphlet in (1) identifying bridges among triangle clusters and (2) quantifying grassroots as the backbone of every collaboration network. We present a number of intriguing patterns and findings about the virus-research activities.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126390204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey of Machine Learning Accelerators","authors":"A. Reuther, P. Michaleas, Michael Jones, V. Gadepally, S. Samsi, J. Kepner","doi":"10.1109/HPEC43674.2020.9286149","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286149","url":null,"abstract":"New machine learning accelerators are being announced and released each month for a variety of applications from speech recognition, video object detection, assisted driving, and many data center applications. This paper updates the survey of of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. This year, there are many more announced accelerators that are implemented with many more architectures and technologies from vector engines, dataflow engines, neuromorphic designs, flash-based analog memory processing, and photonic-based processing.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128428367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Homomorphic Encryption for Quantum Annealing with Spin Reversal Transformations","authors":"D. O’Malley, John K. Golden","doi":"10.1109/HPEC43674.2020.9286176","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286176","url":null,"abstract":"Homomorphic encryption has been an area of study in classical computing for decades. The fundamental goal of homomorphic encryption is to enable (untrusted) Oscar to perform a computation for Alice without Oscar knowing the input to the computation or the output from the computation. Alice encrypts the input before sending it to Oscar, and Oscar performs the computation directly on the encrypted data, producing an encrypted result. Oscar then sends the encrypted result of the computation back to Alice, who can decrypt it. We describe an approach to homomorphic encryption for quantum annealing based on spin reversal transformations and show that it comes with little or no performance penalty. This is in contrast to approaches to homomorphic encryption for classical computing, which incur a significant additional computational cost. This implies that the performance gap between quantum annealing and classical computing is reduced when both paradigms use homomorphic encryption. Further, homomorphic encryption is critical for quantum annealing because quantum annealers are native to the cloud - a third party (such as untrusted Oscar) performs the computation. If sensitive information, such as health-related data subject to the Health Insurance Portability and Accountability Act, is to be processed with quantum annealers, such a technique could be useful.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125882551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimising AI Training Deployments using Graph Compilers and Containers","authors":"Nina Mujkanovic, K. Sivalingam, A. Lazzaro","doi":"10.1109/HPEC43674.2020.9286153","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286153","url":null,"abstract":"Artificial Intelligence (AI) applications based on Deep Neural Networks (DNN) or Deep Learning (DL) have become popular due to their success in solving problems like image analysis and speech recognition. Training a DNN is computationally intensive and High Performance Computing (HPC) has been a key driver in AI growth. Virtualisation and container technology have led to the convergence of cloud and HPC infrastructure. These infrastructures with diverse hardware increase the complexity of deploying and optimising AI training workloads. AI training deployments in HPC or cloud can be optimised with target-specific libraries, graph compilers, and by improving data movement or IO. Graph compilers aim to optimise the execution of a DNN graph by generating an optimised code for a target hardware/backend. As part of SODALITE (a Horizon 2020 project), MODAK tool is developed to optimise application deployment in software defined infrastructures. Using input from the data scientist and performance modelling, MODAK maps optimal application parameters to a target infrastructure and builds an optimised container. In this paper, we introduce MODAK and review container technologies and graph compilers for AI. We illustrate optimisation of AI training deployments using graph compilers and Singularity containers. Evaluation using MNIST-CNN and ResNet50 training workloads shows that custom built optimised containers outperform the official images from DockerHub. We also found that the performance of graph compilers depends on the target hardware and the complexity of the neural network.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"290 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115601837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy and Performance Comparison of Video Action Recognition Approaches","authors":"Matthew Hutchinson, S. Samsi, W. Arcand, David Bestor, Bill Bergeron, C. Byun, Micheal Houle, M. Hubbell, Michael J. Jones, J. Kepner, Andrew Kirby, P. Michaleas, Lauren Milechin, J. Mullen, Andrew Prout, Antonio Rosa, A. Reuther, Charles Yee, V. Gadepally","doi":"10.1109/HPEC43674.2020.9286249","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286249","url":null,"abstract":"Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remain clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. This article provides a direct comparison between fourteen “off-the-shelf” and state-of-the-art models by ensuring consistency in these training characteristics in order to provide readers with a meaningful comparison across different types of video action recognition algorithms. Accuracy of the models is evaluated using standard Top-1 and Top-5 accuracy metrics in addition to a proposed new accuracy metric. Additionally, we compare computational performance of distributed training from two to sixty-four GPUs on a state-of-the-art HPC system.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115729528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training","authors":"S. Samsi, Michael Jones, M. Veillette","doi":"10.1109/HPEC43674.2020.9286241","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286241","url":null,"abstract":"Deep neural networks have shown great success in many diverse fields. The training of these networks can take significant amounts of time, compute and energy. As datasets get larger and models become more complex, the exploration of model architectures becomes prohibitive. In this paper we examine the compute, energy and time costs of training a U-Net based deep neural network for the problem of predicting short term weather forecasts (called precipitation Nowcasting). By leveraging a combination of data distributed and mixed-precision training, we explore the design space for this problem. We also show that larger models with better performance come at a potentially incremental cost if appropriate optimizations are used. We show that it is possible to achieve a significant improvement in training time by leveraging mixed-precision training without sacrificing model performance. Additionally, we find that a 1549% increase in the number of trainable parameters for a network comes at a relatively smaller 63.22% increase in energy usage for a UNet with 4 encoding layers.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116247425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking network fabrics for data distributed training of deep neural networks","authors":"S. Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, C. Byun, V. Gadepally, Michael Houle, M. Hubbell, Anna Klein, P. Michaleas, Lauren Milechin, J. Mullen, Antonio Rosa, Charles Yee, A. Reuther, J. Kepner","doi":"10.1109/HPEC43674.2020.9286232","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286232","url":null,"abstract":"Artificial Intelligence/Machine Learning applications require the training of complex models on large amounts of labelled data. The large computational requirements for training deep models have necessitated the development of new methods for faster training. One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes. This approach is simple to implement and supported by most of the commonly used machine learning frameworks. The data parallel approach leverages MPI for communicating gradients across all nodes. In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning. We compare the effect of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show that using Ethernet-based networking in shared HPC systems does not have a significant effect on the training times for commonly used deep neural network architectures or traditional HPC applications such as Computational Fluid Dynamics.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114779405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best of Both Worlds: High Performance Interactive and Batch Launching","authors":"C. Byun, J. Kepner, W. Arcand, David Bestor, Bill Bergeron, V. Gadepally, Michael Houle, M. Hubbell, Michael Jones, Andrew Kirby, Anna Klein, P. Michaleas, Lauren Milechin, J. Mullen, Andrew Prout, Antonio Rosa, S. Samsi, Charles Yee, A. Reuther","doi":"10.1109/HPEC43674.2020.9286142","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286142","url":null,"abstract":"Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implement “spot” jobs on MIT SuperCloud systems allowing the resources to be fully utilized for both long running batch jobs while still providing fast launch for interactive jobs. The new approach separates the job preemption and scheduling operations and can achieve 100 times faster performance in the scheduling of a job with preemption when compared to using the standard scheduler-provided automatic preemption-based capability. The results demonstrate that the new approach can schedule interactive jobs preemptively at a performance comparable to when the required computing resources are idle and available. The spot job capability can be deployed without disrupting the interactive user experience while increasing the overall system utilization.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128201540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Non-Negative Tensor Train Decomposition","authors":"Manish Bhattarai, Gopinath Chennupati, E. Skau, Raviteja Vangara, Hirsto Djidjev, B. Alexandrov","doi":"10.1109/HPEC43674.2020.9286234","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286234","url":null,"abstract":"The era of exascale computing opens new venues for innovations and discoveries in many scientific, engineering, and commercial fields. However, with the exaflops also come the extra-large high-dimensional data generated by highperformance computing. High-dimensional data is presented as multidimensional arrays, aka tensors. The presence of latent (not directly observable) structures in the tensor allows a unique representation and compression of the data by classical tensor factorization techniques. However, the classical tensor methods are not always stable or they can be exponential in their memory requirements, which makes them not suitable for high-dimensional tensors. Tensor train (TT) is a state-of-the-art tensor network introduced for factorization of high-dimensional tensors. TT transforms the initial high-dimensional tensor in a network of three-dimensional tensors that requires only a linear storage. Many real-world data, such as, density, temperature, population, probability, etc., are non-negative and for an easy interpretation, the algorithms preserving non-negativity are preferred. Here, we introduce a distributed non-negative tensor-train and demonstrate its scalability and the compression on synthetic and realworld big datasets.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128888563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}