{"title":"Enabling Transformers to Understand Low-Level Programs","authors":"Z. Guo, William S. Moses","doi":"10.1109/HPEC55821.2022.9926313","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926313","url":null,"abstract":"Unlike prior approaches to machine learning, Transformer models can first be trained on a large corpus of unlabeled data with a generic objective and then fine-tuned on a smaller task-specific dataset. This versatility has led to both larger models and larger datasets and, consequently, to breakthroughs in the field of natural language processing. Generic program optimization presently operates on low-level programs such as LLVM IR. Unlike high-level languages (e.g., C, Python, Java), which have seen initial success in machine-learning analyses, lower-level languages tend to be more verbose and repetitive, since they must precisely specify program behavior, expose microarchitectural details, and make explicit the properties needed for optimization, all of which makes them difficult for machine learning models. In this work, we apply transfer learning to low-level (LLVM) programs and study how low-level programs can be made more amenable to Transformer models through various techniques, including preprocessing, infix/prefix operators, and information deduplication. We evaluate the effectiveness of these techniques through a series of ablation studies on the task of translating C to both unoptimized (-O0) and optimized (-O1) LLVM IR. 
On the AnghaBench dataset, our model achieves a 49.57% verbatim match and BLEU score of 87.68 against Clang -O0 and 38.73% verbatim match and BLEU score of 77.03 against Clang -O1.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127382338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
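The preprocessing and deduplication techniques named in the abstract can be illustrated with a toy example: canonically renaming SSA value identifiers so that semantically identical IR snippets share one surface form, shrinking the vocabulary a sequence model must learn. This is a hypothetical sketch; the function name and regex are ours, not the paper's pipeline:

```python
import re

def normalize_llvm_identifiers(ir_text):
    """Rename %-prefixed SSA value names (%tmp, %42, ...) to a canonical
    sequence (%v0, %v1, ...) in order of first appearance, so that
    semantically identical snippets map to one token sequence."""
    mapping = {}
    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = "%v{}".format(len(mapping))
        return mapping[name]
    # Local identifiers in LLVM IR are %-prefixed names or numbers.
    return re.sub(r"%[A-Za-z0-9._]+", rename, ir_text)

ir = "%tmp = add i32 %a, %b\n%tmp2 = mul i32 %tmp, %a"
print(normalize_llvm_identifiers(ir))
# %v0 = add i32 %v1, %v2
# %v3 = mul i32 %v0, %v1
```

Any IR text that differs only in its choice of local value names normalizes to the same string under this transform.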
{"title":"Site-Wide HPC Data Center Demand Response","authors":"D. Wilson, I. Paschalidis, A. Coskun","doi":"10.1109/HPEC55821.2022.9926322","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926322","url":null,"abstract":"As many electricity markets are trending towards greater renewable energy generation, there will be an increased need for electrical grids to cooperatively balance electricity supply and demand. Data centers are one large consumer of electricity on a global scale, and they are well-suited to act as a grid load stabilizer by performing “demand response.” Prior investigations in this space have demonstrated how data centers can continue to meet their users' quality of service (QoS) needs by modeling relationships between cluster job queues, server power properties, and application performance. While server power is a major factor in data center power consumption, other components such as cooling systems contribute a non-negligible amount of electricity demand. This work proposes using a simple site-wide (i.e., including all components of the data center) power model on top of QoS-aware demand response solutions to achieve the QoS benefits of those solutions while improving the cost-saving opportunities in demand response. 
We demonstrate 1.3x cost savings compared to QoS-aware demand response policies that do not utilize site-wide power models, and show similar savings in cases of severely under-predicted site-wide power consumption if 1.5x relaxed QoS constraints are allowed.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130751615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
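A minimal sketch of what layering a site-wide power model onto a demand-response policy might look like; the PUE-style factor, fixed overhead, and penalty structure are illustrative assumptions, not the paper's model:

```python
def site_wide_power(server_power_kw, pue=1.4, fixed_overhead_kw=50.0):
    """Toy site-wide power model: IT load scaled by a PUE-like factor for
    cooling and power distribution, plus a fixed facility overhead.
    All parameter values are illustrative, not from the paper."""
    return pue * server_power_kw + fixed_overhead_kw

def demand_response_cost(power_kw, target_kw, price_per_kwh, penalty_per_kwh):
    """Cost of one one-hour interval: energy cost plus a penalty for
    exceeding the demand-response target."""
    excess_kw = max(0.0, power_kw - target_kw)
    return power_kw * price_per_kwh + excess_kw * penalty_per_kwh

site_kw = site_wide_power(400.0)  # ~610 kW once cooling/overhead is included
cost = demand_response_cost(site_kw, 600.0, 0.10, 0.50)  # ~66 (energy + penalty)
```

The point of the sketch: a policy that budgets only server power (400 kW) against a 600 kW target would miss that the site actually exceeds the target once cooling is counted, which is exactly the gap a site-wide model closes.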
{"title":"RaiderSTREAM: Adapting the STREAM Benchmark to Modern HPC Systems","authors":"Michael Beebe, Brody Williams, Stephen Devaney, John D. Leidel, Yong Chen, Stephen Poole","doi":"10.1109/HPEC55821.2022.9926292","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926292","url":null,"abstract":"Sustained memory bandwidth is a common bottleneck in scientific applications: runtime is often dominated by the speed at which data can be loaded from memory into the CPU and results written back, particularly for increasingly critical data-intensive workloads. The prevalence of irregular memory access patterns within these applications, exemplified by kernels such as those found in sparse matrix and graph applications, significantly degrades the achievable performance of a system's memory hierarchy. As such, it is highly desirable to be able to accurately measure a given memory hierarchy's sustainable memory bandwidth when designing applications as well as future high-performance computing (HPC) systems. STREAM is a de facto standard benchmark for measuring sustained memory bandwidth and has garnered widespread adoption. In this work, we discuss current limitations of the STREAM benchmark in the context of high-performance and scientific computing. 
We then introduce a new version of STREAM, called RaiderSTREAM, built on the OpenSHMEM and MPI programming models in tandem with OpenMP, that includes additional kernels that better model irregular memory access patterns, addressing these shortcomings.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"73 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121132069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
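For context, the classic STREAM Triad kernel and an irregular, gather-style variant of the kind RaiderSTREAM targets can be sketched as follows (a pure-Python illustration of the access patterns only; RaiderSTREAM itself builds on OpenSHMEM/MPI with OpenMP):

```python
def stream_triad(a, b, c, scalar):
    """Classic STREAM Triad: a[i] = b[i] + scalar * c[i], unit stride."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]

def gather_triad(a, b, c, idx, scalar):
    """Irregular variant: reads of c are indirected through an index
    array, mimicking sparse-matrix and graph access patterns."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[idx[i]]

a = [0.0] * 3
stream_triad(a, [1.0, 1.0, 1.0], [10.0, 20.0, 30.0], 2.0)
print(a)  # [21.0, 41.0, 61.0]
gather_triad(a, [1.0, 1.0, 1.0], [10.0, 20.0, 30.0], [2, 0, 1], 2.0)
print(a)  # [61.0, 21.0, 41.0]
```

The indirection through `idx` is what defeats hardware prefetchers and degrades sustained bandwidth relative to the unit-stride original.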
{"title":"Towards Hardware Accelerated Garbage Collection with Near-Memory Processing","authors":"Samuel Thomas, Jiwon Choe, Ofir Gordon, E. Petrank, T. Moreshet, M. Herlihy, R. I. Bahar","doi":"10.1109/HPEC55821.2022.9926323","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926323","url":null,"abstract":"Garbage collection is widely available in popular programming languages, yet it may incur high performance overheads in applications. Prior works have proposed specialized hardware acceleration implementations to offload garbage collection overheads from the main processor, but these solutions have yet to be implemented in practice. In this paper, we propose using off-the-shelf hardware to accelerate off-the-shelf garbage collection algorithms. Furthermore, our work is latency oriented as opposed to other works that focus on bandwidth. We demonstrate that we can get a 2x performance improvement in some workloads and a 2.3x reduction in LLC traffic by integrating generic Near-Memory Processing (NMP) into the built-in Java garbage collector. We will discuss architectural implications of these results and consider directions for future work.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122923955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HTC: Hybrid vertex-parallel and edge-parallel Triangle Counting","authors":"Li Zeng, Kang Yang, Haoran Cai, Jinhua Zhou, Rongqian Zhao, Xin Chen","doi":"10.1109/HPEC55821.2022.9926383","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926383","url":null,"abstract":"Graph algorithms (e.g., triangle counting) are widely used to find the deep association of data in various real-world applications such as friend recommendation and junk mail detection. However, even when using the massive parallelism of GPUs, existing methods fail to run triangle counting queries efficiently on various large graphs. In this paper, we propose a fast hybrid algorithm, HTC, which can utilize both the vertex-parallel and edge-parallel paradigms and deliver much better performance on GPU. Different from current GPU implementations, HTC adaptively selects different parallel paradigms for different vertices. In addition, a bitwise intersection on a segmented bitmap is proposed in place of naive binary search. Furthermore, preprocessing techniques like graph reordering and recursive clipping are adopted to optimize the graph structure. Extensive experiments show that HTC outperforms all state-of-the-art triangle counting implementations on GPU by 1.2x~42x.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115318421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
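A serial toy model of the hybrid idea: orient edges low-to-high, then count triangles per vertex either with set intersections (vertex-parallel style) or with bitwise ANDs over neighbor bitmaps (in the spirit of the paper's segmented-bitmap intersection). The degree threshold here is an arbitrary stand-in for HTC's adaptive selection, not its actual heuristic:

```python
def count_triangles_hybrid(adj, degree_threshold=4):
    """Count triangles in an undirected graph given as {vertex: set of neighbors}.
    Edges are oriented low->high so each triangle is counted exactly once.
    Low-out-degree vertices use set intersection; high-out-degree vertices
    use bitwise AND on neighbor bitmaps."""
    out = {u: sorted(v for v in nbrs if v > u) for u, nbrs in adj.items()}
    bitmap = {u: sum(1 << v for v in nbrs) for u, nbrs in out.items()}
    total = 0
    for u, nbrs in out.items():
        if len(nbrs) <= degree_threshold:
            # vertex-parallel style: one intersection task per vertex
            s = set(nbrs)
            for v in nbrs:
                total += len(s.intersection(out[v]))
        else:
            # edge-parallel/bitmap style: one bitwise-AND task per edge
            for v in nbrs:
                total += bin(bitmap[u] & bitmap[v]).count("1")
    return total

k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(count_triangles_hybrid(k4))  # 4 triangles in the complete graph K4
```

Both paths compute the same quantity; on a GPU the choice matters because it controls how work is distributed across threads and how memory is accessed.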
{"title":"Towards Fast GPU-based Sparse DNN Inference: A Hybrid Compute Model","authors":"Shaoxian Xu, Minkang Wu, Long Zheng, Zhiyuan Shao, Xiangyu Ye, Xiaofei Liao, Hai Jin","doi":"10.1109/HPEC55821.2022.9926290","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926290","url":null,"abstract":"As the model scale of Deep Neural Networks (DNNs) increases, the memory and computational cost of DNNs become overwhelmingly large. Sparse Deep Neural Networks (SpDNNs) are promising to cope with this challenge by using fewer weights while preserving the accuracy. However, the sparsity of SpDNN models makes it difficult to run them efficiently on GPUs. To stimulate technical advances for improving the efficiency of SpDNN inference, the MIT/IEEE/Amazon GraphChallenge proposed the SpDNN Challenge in 2019. In this paper, we present a hybrid compute model to improve the efficiency of Sparse Matrix Multiplications (SpMMs), the core computation of SpDNN inference. First, the given sparse weight matrix is divided into many (sparse and dense) submatrices. For sparse submatrices, we leverage compile-time data embedding to compile the sparse data together with their corresponding computations into instructions, so that the number of random accesses can be reduced significantly. For dense submatrices, we follow the traditional computing mode where the data is obtained from memory to exploit the high memory bandwidth of the GPU. This hybrid compute model effectively balances the memory and instruction bottlenecks, and offers more scheduling opportunities to overlap computing operations and memory accesses on the GPU. To determine whether a submatrix is sparse, we present a cost model to estimate its time cost under the traditional computing mode and the data-embedded computing mode in an accurate and efficient manner. Once the computing mode for all submatrices is determined, customized code is generated for the SpDNN inference. 
Experimental results on the SpDNN Challenge benchmarks show that our approach achieves up to 197.86 tera-edges per second inference throughput on a single NVIDIA A100 GPU. Compared to the 2021 and 2020 champions, our approach offers up to 6.37x and 89.94x speedups on a single GPU, respectively. We also implement a 16-GPU version, showing up to 9.49x and 80.11x speedups over the former 16-GPU baselines of the 2021 and 2020 champions.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126281743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
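The tiling-and-classification step described above can be sketched as follows; the density threshold stands in for the paper's cost model, and the tile size and mode names are ours:

```python
def partition_and_classify(matrix, block, density_threshold=0.5):
    """Split a weight matrix (list of rows) into block x block tiles and pick
    a compute mode per tile: 'embed' (compile nonzeros into instructions) for
    sparse tiles, 'memory' (fetch from memory) for dense ones. The density
    cut-off is an illustrative stand-in for the paper's cost model."""
    rows, cols = len(matrix), len(matrix[0])
    tiles = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            nnz = sum(1 for r, c in cells if matrix[r][c] != 0)
            mode = "memory" if nnz / len(cells) >= density_threshold else "embed"
            tiles.append(((r0, c0), mode))
    return tiles

w = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]
print(partition_and_classify(w, 2))
# [((0, 0), 'memory'), ((0, 2), 'embed'), ((2, 0), 'embed'), ((2, 2), 'embed')]
```

After classification, each "embed" tile's nonzeros would be baked into generated code, while "memory" tiles keep the usual load-from-memory SpMM path.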
{"title":"Processing Particle Data Flows with SmartNICs","authors":"Jianshen Liu, C. Maltzahn, M. Curry, C. Ulmer","doi":"10.1109/HPEC55821.2022.9926325","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926325","url":null,"abstract":"Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2's (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121928033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Fast Crash-Consistent Cluster Checkpointing","authors":"Andrew Wood, Moshik Hershcovitch, Ilias Ennmouri, Weiyu Zong, Saurav Chennuri, S. Cohen, S. Sundararaman, Daniel Waddington, Peter Chin","doi":"10.1109/HPEC55821.2022.9926330","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926330","url":null,"abstract":"Machine learning models are expensive to train: they require costly high-compute hardware and have long training times. Therefore, models are extra sensitive to program faults or unexpected system crashes, which can erase hours if not days' worth of work. While there are plenty of strategies designed to mitigate the risk of unexpected system downtime, the most popular strategy in machine learning is checkpointing: periodically saving the state of the model to persistent storage. Checkpointing is an effective strategy; however, it requires carefully balancing two factors: how often a checkpoint is made (the checkpointing schedule) and the cost of creating a checkpoint itself. In this paper, we leverage the Python Memory Manager (PyMM), which provides Python support for persistent memory, together with emerging persistent memory technology (Optane DC), to accelerate the checkpointing operation while maintaining crash consistency. We first show that when checkpointing models, PyMM with persistent memory can save from minutes to days of checkpointing runtime. We then further optimize the checkpointing operation with PyMM and demonstrate our approach with the KMeans and Gaussian Mixture Model algorithms on two real-world datasets, MNIST and MusicNet. Through evaluation, we show that these two algorithms achieve checkpointing speedups of between 10x and 75x for KMeans and over 3x for GMM against the current state-of-the-art checkpointing approaches. 
We also verify that our solution recovers from crashes, while traditional approaches cannot.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121144400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
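One standard way to reason about the checkpointing-schedule trade-off the abstract describes is Young's approximation for the interval that minimizes expected lost work. This rule of thumb is not from the paper, but it shows why cutting per-checkpoint cost (e.g., with persistent memory) permits much more frequent checkpoints; the failure rate below is an illustrative assumption:

```python
import math

def young_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation: the checkpoint interval minimizing expected
    lost work is roughly sqrt(2 * C * MTBF), where C is the cost of taking
    one checkpoint and MTBF is the mean time between failures."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

mtbf = 24 * 3600.0  # assume one failure per day on average (illustrative)
slow = young_checkpoint_interval(60.0, mtbf)  # ~3220 s between checkpoints
fast = young_checkpoint_interval(2.0, mtbf)   # ~588 s: far less work at risk
```

Dropping the checkpoint cost from 60 s to 2 s shrinks the optimal interval by a factor of sqrt(30), so both the checkpoint overhead and the window of re-computable work fall together.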
{"title":"Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications","authors":"S. Shankar, A. Reuther","doi":"10.1109/HPEC55821.2022.9926296","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.9926296","url":null,"abstract":"We examine the computational energy requirements of different systems driven by the geometrical scaling law (known as Moore's law or Dennard scaling for geometry) and the increasing use of Artificial Intelligence/Machine Learning (AI/ML) over the last decade. With more scientific and technology applications based on data-driven discovery, machine learning methods, especially deep neural networks, have become widely used. In order to enable such applications, both hardware accelerators and advanced AI/ML methods have led to the introduction of new architectures, system designs, algorithms, and software. Our analysis of energy trends indicates three important observations: 1) Energy efficiency due to geometrical scaling is slowing down; 2) The energy efficiency at the bit-level does not translate into efficiency at the instruction-level, or at the system-level for a variety of systems, especially for large-scale AI/ML accelerators or supercomputers; 3) At the application level, general-purpose AI/ML methods can be computationally energy intensive, offsetting the gains in energy from geometrical scaling and special purpose accelerators. 
Further, our analysis provides specific pointers for integrating energy efficiency with performance analysis for enabling high-performance and sustainable computing in the future.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SuperCloud Lite in the Cloud - lightweight, secure, self-service, on-demand mechanisms for creating customizable research computing environments","authors":"Kelsie Edie, K. Keville, Lauren Milechin, Chris Hill","doi":"10.1109/HPEC55821.2022.10089529","DOIUrl":"https://doi.org/10.1109/HPEC55821.2022.10089529","url":null,"abstract":"We describe and examine an automation for deploying on-demand, OAuth2 secured virtual machine instances. Our approach does not require any expert security and web service knowledge to create a secure instance. The approach allows non-experts to launch web-accessible virtual machine services that are automatically secured through OAuth2 authentication, an authentication standard widely employed in academic and enterprise environments. We demonstrate the approach through an example of creating secure commercial cloud instances of the MIT SuperCloud modern research computing oriented software stack. A small example of a use case is examined and compared with native MIT SuperCloud experience as a preliminary evaluation. The example illustrates several useful features. It retains OAuth2 security guarantees and leverages a simple OAuth2 proxy architecture that in turn employs simple DNS based service limits to manage access to the proxy service. The system has the potential to provide a default secure environment in which access is, in theory, limited to a narrow trust circle. It leverages WebSockets to provide a pure browser enabled, zero install base service. For the user, it is entirely self-service so that a non-expert, non-privileged user can launch instances, while supporting access to a familiar environment on a broad selection of hardware, including high-end GPUs and isolated bare-metal resources. The environment includes pre-configured browser based desktop GUI and notebook configurations. It can provide the option of end-user privileged access to the VM for flexible customization. 
It integrates with a simplified cost-monitoring and machine management framework that provides visibility to commercial cloud charges and some budget guard rails, and supports instance stop, restart, and pausing features to allow intermittent use and cost reduction.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133886132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}