H. Kasahara, K. Kimura, T. Kitamura, Hiroki Mikami, Kazutaka Morita, Kazuki Fujita, Kazuki Yamamoto, Tohma Kawasumi
2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC). Published 2021-11-01. DOI: 10.1109/PEHC54839.2021.00007 (https://doi.org/10.1109/PEHC54839.2021.00007)
OSCAR Parallelizing and Power Reducing Compiler and API for Heterogeneous Multicores : (Invited Paper)
Heterogeneous computing systems, which connect general-purpose processor cores with accelerators and/or different kinds of general-purpose processor cores, are widely used for HPC, cloud servers, self-driving vehicles, AI robots, and so on. They are used to obtain high performance and/or low power consumption. This paper introduces the OSCAR (Optimally Scheduled Advanced Multiprocessor) parallelizing compiler and the OSCAR API. Together they allow users to automatically parallelize a C or Fortran program and reduce its power consumption on various heterogeneous computing systems. The OSCAR compiler has been under development since 1983, aiming at the co-design of multiprocessor architecture and compiler. Currently, it can generate parallel machine code for any shared-memory homogeneous or heterogeneous multicore, with or without a hardware cache-coherence mechanism, provided that a sequential C or Fortran compiler exists for the target multicore. The OSCAR compiler translates a sequential user program written in C or Fortran into a parallelized C or Fortran program that uses the OSCAR API, together with the frequency-voltage control, clock-gating, and power-gating directives that the API defines for each core, memory module, and interconnect. The generated parallel program consists of threads specified by OpenMP "section" directives. The threads can be compiled into machine code by an OpenMP compiler or a sequential C or Fortran compiler for the target general-purpose processor cores or accelerator cores. The compilation flow, together with execution and power-reduction performance for scientific applications, embedded applications, and Deep Learning, is shown on several heterogeneous systems, such as a heterogeneous multicore processor with eight general-purpose cores and four Dynamically Reconfigurable Processors (DRPs), a heterogeneous multicore on an FPGA using NIOS cores, and a new vector accelerator based on past Japanese supercomputers and a personal vector supercomputer, NEC Aurora Tsubasa.
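To make the phrase "threads specified by OpenMP section directives" concrete, the following is a minimal hand-written sketch in C of that general shape: two independent coarse-grain tasks, each placed in its own section. It is not output of the OSCAR compiler and uses no OSCAR API directives. The names sum_range and parallel_sum are hypothetical and were chosen only for this illustration.

```c
/* Hypothetical coarse-grain task: sum of a[lo..hi). */
static long sum_range(const int *a, int lo, int hi) {
    long s = 0;
    for (int i = lo; i < hi; i++) {
        s += a[i];
    }
    return s;
}

/* Illustrative only: runs two independent tasks as OpenMP sections.
 * left and right are declared outside the parallel region, so they
 * are shared, and each one is written by exactly one section. */
long parallel_sum(const int *a, int n) {
    long left = 0;
    long right = 0;

    #pragma omp parallel sections
    {
        #pragma omp section
        { left = sum_range(a, 0, n / 2); }

        #pragma omp section
        { right = sum_range(a, n / 2, n); }
    }
    /* The implicit barrier at the end of the sections construct
     * guarantees both partial sums are complete here. */
    return left + right;
}
```

With GCC or Clang, compiling with -fopenmp lets the two sections run on different threads. Without that flag the pragmas are ignored and the function runs sequentially with the same result, which mirrors the abstract's point that the generated threads can also be handled by a sequential compiler.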