2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC): Latest Publications

[Copyright notice]

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/pehc54839.2021.00002

Citations: 0

A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems : (Invited Paper)

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00008

Sitao Huang, Kun Wu, S. R. Chalamalasetti, Izzat El Hajj, Cong Xu, P. Faraboschi, Deming Chen

Citations: 0

GenMAT: A General-Purpose Machine Learning-Driven Auto-Tuner for Heterogeneous Platforms

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00006

Naifeng Zhang, Ajitesh Srivastava, R. Kannan, V. Prasanna

Abstract: As computing platforms evolve with heterogeneous resources, developing optimized code that fully exploits the computing power becomes increasingly challenging. Domain experts need extensive knowledge of computer architecture, compiler optimizations, and parallel computing to understand which implementation will work best for their problem domain and data. Even with considerable time learning, writing, and debugging high-performance code, such optimizations may not generalize to different inputs, applications, or computing platforms. To assist the end-users in optimally deploying workloads on the heterogeneous environment with high productivity, a fundamental problem is to automatically find the best "variant" of an application: the implementation with the optimal configurations on the most suitable hardware resource resulting in the minimum runtime. We propose GenMAT, a portable tool for identifying the best variant of any application specified as a meta-program with exposed tunable parameters on any hardware. GenMAT automatically profiles the application by varying the exposed tunable parameters to generate a small set of profiling data. Then, GenMAT trains a compact machine learning model that is used to quickly predict the runtimes of a large number of possible parameter settings to identify the best variant. We show that the variant selected by GenMAT has a runtime deviation within 3.5% of the true best variant in determining the best linear algebra library for matrix operations. For identifying the best Halide schedule, GenMAT correctly ranks the runtimes of thousands of candidates with an average Spearman’s rank correlation coefficient of 0.95.
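The profile-then-predict loop described in the GenMAT abstract can be illustrated with a small sketch. This is not GenMAT's implementation: the cost function `measure`, the two tunable parameters (tile size and thread count), and the 1-nearest-neighbour surrogate are all assumptions chosen to keep the example self-contained. It profiles a subset of variants, predicts the runtime of every candidate, picks the predicted best, and scores the ranking with Spearman's rank correlation coefficient.

```python
import itertools
import math


def measure(config):
    """Hypothetical stand-in for timing one variant on real hardware."""
    tile, threads = config
    return 100.0 / threads + 0.5 * threads + 0.2 * abs(tile - 32)


def distance(a, b):
    # Compare configurations on a log2 scale (both parameters are powers of two).
    return sum((math.log2(x) - math.log2(y)) ** 2 for x, y in zip(a, b))


def predict(config, profiled):
    """1-nearest-neighbour surrogate: reuse the runtime of the closest profiled config."""
    _, runtime = min(profiled, key=lambda item: distance(config, item[0]))
    return runtime


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def select_variant(candidates, profiled):
    """Predict every candidate's runtime and return the predicted-fastest one."""
    predicted = [predict(c, profiled) for c in candidates]
    best_index = min(range(len(candidates)), key=predicted.__getitem__)
    return candidates[best_index], predicted


# Search space: (tile size, thread count); only every third variant is profiled.
candidates = list(itertools.product([8, 16, 32, 64, 128], [1, 2, 4, 8, 16]))
profiled = [(c, measure(c)) for c in candidates[::3]]

best, predicted = select_variant(candidates, profiled)
actual = [measure(c) for c in candidates]
rho = spearman(predicted, actual)

print("predicted best variant:", best)
print("Spearman rank correlation:", round(rho, 3))
```

In a real tuner the surrogate would be a trained model and `measure` would run the application; the structure of the loop (profile a few, predict many, rank) is the part this sketch is meant to show.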

Citations: 1

Designing Heterogeneous Systems: Large Scale Architectural Exploration Via Simulation : Invited Paper

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00011

Darel N. Emmot, Ryan Menhusen, D. Dauwe, Vipin Kumar Kukkala, Kirk M. Bresniker

Abstract: The end of Dennard’s scaling in 2005 and the emerging end of Moore’s Law has resulted in a number of heterogeneous design wins, applied to compute (vector processing (GPUs), vector-matrix multiplication, FPGAs, etc.), memory (High Bandwidth Memory (HBM), Fabric Attached Memory (FAM), memory-driven designs, etc.) and interconnects (CXL, Gen-Z, etc.). Designing these heterogeneous systems is becoming increasingly hard due to a plethora of architectural choices. Whole meta-level programming environments are required for designing and architecting heterogeneity of both systems and the applications running on those systems. Hewlett Packard Enterprise™ (HPE) has found Sandia’s Structural Simulation Toolkit (SST) to be a powerful aid to architectural exploration and validation of applications optimized for use with Fabric Attached Memory (FAM) with near memory compute abilities. Standard SST components have been augmented with plug-ins modeling Cray Slingshot™ Network Interface Controller (NIC) and router elements with drivers for OpenSHMEM and OpenFAM APIs. We anticipate future initiatives calling for dramatic improvement across broader HPC application areas to require refined processes in the collaborative invention of new heterogeneous designs. In this article, we present our process of using white-box characterization of applications at a node level to create abstract models and discuss the methodologies that are used to reliably extend simulations to scales of 10’s of thousands of nodes to estimate large scale throughput. Our application and API simulation methodology ensures high communication resource utilization with robust, straightforward interfaces conducive to collaborative heterogeneous accelerator integration. Application and system developers are thus enabled to exploit heterogeneity to support higher system throughput.

Citations: 0

OSCAR Parallelizing and Power Reducing Compiler and API for Heterogeneous Multicores : (Invited Paper)

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00007

H. Kasahara, K. Kimura, T. Kitamura, Hiroki Mikami, Kazutaka Morita, Kazuki Fujita, Kazuki Yamamoto, Tohma Kawasumi

Abstract: Heterogeneous computing systems, connecting general-purpose processor cores with accelerators and/or different kinds of general-purpose processor cores, have been widely used for HPC, cloud servers, self-driving vehicles, AI robots, and so on. They are used to obtain high performance and/or low power consumption. This paper introduces the OSCAR (Optimally Scheduled Advanced Multiprocessor) parallelizing compiler and OSCAR API. They allow users to automatically parallelize and power-reduce a C or Fortran program for various heterogeneous computing systems. OSCAR compiler has been developed since 1983, aiming at co-design of multiprocessor architecture and compiler. Currently, it can generate parallel machine codes for any shared memory homogeneous and heterogeneous multicores with or without hardware cache-coherent mechanism if a sequential C or Fortran compiler exists for the target multicore. OSCAR compiler translates a sequential user program written in C or Fortran into a parallelized C or Fortran program with OSCAR API compatible with frequency-voltage control, clock-gating, and power gating directives for each core, memory module, and interconnect defined in OSCAR API. The generated parallel program consists of threads specified by OpenMP "section" directives. The threads can be compiled into machine codes by an OpenMP compiler or a sequential C or Fortran compiler for a target general-purpose processor cores or accelerator cores. The compilation flow and execution and power-reduce performance for scientific and embedded applications and Deep Learning are shown on several heterogeneous systems, such as a heterogeneous multicore processor having eight general-purpose cores and 4 DRPs, or Dynamically Reconfigurable Processors, a heterogeneous multicore on FPGA using NIOS cores, and a new vector accelerator based on the past Japanese supercomputers and a personal vector supercomputer NEC Aurora Tsubasa.

Citations: 1

A Holistic Systems Approach to Leveraging Heterogeneity

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00009

R. Wisniewski, Xinmin Tian, P. Thierry, Samantika Sury, S. Pennycook

Abstract: Increasingly, HPC developers are turning to heterogeneity to continue to achieve the performance they desire. Leveraging heterogeneity however is challenging. While an increasing number of applications are starting to gain advantage from heterogeneity, there remains much work before it sees widespread productive use. We believe that a holistic systems approach encompassing both hardware and software is the best path towards productively leveraging heterogeneity. In this paper, we describe the key attributes of a successful hardware approach from a node and system perspective. At the node level, it is important to have components that are pluggable and easily combined in a tightly-coupled manner. From the systems perspective, it is important to ensure resources can be composed in a manner that minimizes unused ("stranded") resources. We describe the importance of a complementary software approach that provides a single development environment, highlighting the value of software being able to handle heterogeneity from node through system. We detail the vision of oneAPI that addresses these challenges, the resultant programming model, and the advantages for applications.

Citations: 1

Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence : Invited Paper

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC) Pub Date : 2021-11-01 DOI: 10.1109/PEHC54839.2021.00010

S. Sukumar, J. Balma, Cong Xu, S. Serebryakov

Abstract: The need for high performance computing in data-driven artificial intelligence (AI) workloads has led to the Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud-services, we need to understand different figures-of-merit for device-, server- and rack-scale systems. Towards that goal, we share early-access hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits based on different figures of merit such as cost, energy, and adoption efficiency reveals a "heterogenous" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow. Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance and conclude with recommendations to survive the Cambrian explosion.
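The roofline analysis mentioned in this abstract rests on a simple bound: attainable throughput is the smaller of the device's peak compute rate and its memory bandwidth multiplied by the workload's arithmetic intensity. The sketch below shows that bound only. The `Device` numbers and the two arithmetic intensities are hypothetical values picked for illustration, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float     # peak compute rate in FLOP/s
    mem_bandwidth: float  # peak memory bandwidth in bytes/s


def attainable_flops(device, arithmetic_intensity):
    """Roofline bound: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(device.peak_flops, device.mem_bandwidth * arithmetic_intensity)


def ridge_point(device):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return device.peak_flops / device.mem_bandwidth


def classify(device, arithmetic_intensity):
    """Label a workload by which roof limits it on this device."""
    if arithmetic_intensity >= ridge_point(device):
        return "compute-bound"
    return "memory-bound"


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
accelerator = Device("hypothetical-accelerator", peak_flops=100e12, mem_bandwidth=1e12)

# Hypothetical arithmetic intensities, e.g. the same model at two batch sizes.
for label, intensity in [("small batch", 25.0), ("large batch", 400.0)]:
    bound = attainable_flops(accelerator, intensity)
    print(label, classify(accelerator, intensity), f"{bound / 1e12:.1f} TFLOP/s")
```

Overlaying profiled arithmetic intensities on this bound for each candidate device indicates whether a workload would be limited by memory traffic or by compute on that device.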

Citations: 1