OSCAR Parallelizing and Power Reducing Compiler and API for Heterogeneous Multicores: (Invited Paper)

2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC). Pub Date: 2021-11-01. DOI: 10.1109/PEHC54839.2021.00007

H. Kasahara, K. Kimura, T. Kitamura, Hiroki Mikami, Kazutaka Morita, Kazuki Fujita, Kazuki Yamamoto, Tohma Kawasumi


Citations: 1

Abstract

Heterogeneous computing systems, which connect general-purpose processor cores with accelerators and/or with different kinds of general-purpose processor cores, are widely used in HPC, cloud servers, self-driving vehicles, AI robots, and other domains to obtain high performance and/or low power consumption. This paper introduces the OSCAR (Optimally Scheduled Advanced Multiprocessor) parallelizing compiler and the OSCAR API, which together allow users to automatically parallelize a C or Fortran program and reduce its power consumption on various heterogeneous computing systems. The OSCAR compiler has been under development since 1983, with the aim of co-designing multiprocessor architecture and compiler. Currently, it can generate parallel machine code for any shared-memory homogeneous or heterogeneous multicore, with or without a hardware cache-coherence mechanism, provided that a sequential C or Fortran compiler exists for the target multicore. The OSCAR compiler translates a sequential user program written in C or Fortran into a parallelized C or Fortran program annotated with OSCAR API directives, including the frequency-voltage control, clock-gating, and power-gating directives that the OSCAR API defines for each core, memory module, and interconnect. The generated parallel program consists of threads specified by OpenMP "section" directives. These threads can be compiled into machine code by an OpenMP compiler or by a sequential C or Fortran compiler for the target general-purpose processor cores or accelerator cores. The paper presents the compilation flow together with execution and power-reduction performance for scientific applications, embedded applications, and deep learning on several heterogeneous systems, such as a heterogeneous multicore processor with eight general-purpose cores and four DRPs (Dynamically Reconfigurable Processors); a heterogeneous multicore on an FPGA using NIOS cores; a new vector accelerator based on past Japanese supercomputers; and a personal vector supercomputer, the NEC Aurora Tsubasa.
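The abstract states that the generated parallel program consists of threads specified by OpenMP "section" directives. The following is a minimal hand-written sketch of that program shape in C, not actual OSCAR compiler output. The task functions, the `result_t` type, and `run_tasks` are hypothetical names chosen for illustration. The OSCAR API power-control directives are deliberately left out, because the abstract does not give their syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coarse-grain task 1: sum of squares of the input. */
static double task_sum_squares(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i] * x[i];
    }
    return s;
}

/* Hypothetical coarse-grain task 2: maximum element (0.0 if n == 0). */
static double task_max(const double *x, size_t n) {
    if (n == 0) {
        return 0.0;
    }
    double m = x[0];
    for (size_t i = 1; i < n; ++i) {
        if (x[i] > m) {
            m = x[i];
        }
    }
    return m;
}

/* Results of the two independent tasks. */
typedef struct {
    double sum_sq;
    double max;
} result_t;

/*
 * Run the two tasks as separate OpenMP sections. Each section writes a
 * different field of r, so the sections do not race with each other.
 * Without OpenMP support the pragmas are ignored and the tasks simply
 * run one after the other with the same result.
 */
result_t run_tasks(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    result_t r = {0.0, 0.0};

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            r.sum_sq = task_sum_squares(x, n);
        }
        #pragma omp section
        {
            r.max = task_max(x, n);
        }
    }
    return r;
}
```

Built with an OpenMP-capable compiler (for example `gcc -fopenmp`), each section may be executed by a different thread; built with a plain sequential C compiler, the same source still compiles and runs sequentially. In a compiler-generated program, the work placed in each section would be chosen by the compiler's scheduling rather than by hand as it is here.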
