{"title":"[Copyright notice]","authors":"","doi":"10.1109/llvmhpc54804.2021.00002","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00002","url":null,"abstract":"","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116916750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating CoDesign with Automatic Code Similarity Learning","authors":"T. Nguyen, E. Strohmaier, J. Shalf","doi":"10.1109/llvmhpc54804.2021.00011","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00011","url":null,"abstract":"Automating the workload characterization process is increasingly important in hardware design. Although compiler tools can automatically collect profiling data and predict performance behaviors, the process has to be repeated for each potential design. Such challenge is exacerbated by the fast growing body of applications and input problems.We propose an alternative approach based on code similarity learning. The application is decomposed into small kernels that can be mapped to known patterns. The behaviors of a pattern on a hardware setup can be reused. To enable this technology, we propose a new code representation and similarity metric. We automate the detection process using compiler and ML methods. Specifically, we reformulate application’s dataflow graphs so that they can be compared based on both compute and data movement. We show this representation can distinguish kernels in the HPCG benchmark and help suggest optimal configurations for SpMV and GEMM hardware accelerators.","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122650788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flacc: Towards OpenACC support for Fortran in the LLVM Ecosystem","authors":"Valentin Clement, J. Vetter","doi":"10.1109/llvmhpc54804.2021.00007","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00007","url":null,"abstract":"OpenACC is a directive-based programming model for heterogeneous accelerators initially launched in 2010 to provide a portable solution at a level of abstraction above OpenCL, CUDA, and other lower-level programming models. Various implementations of OpenACC for C, C++, and Fortran exist; however, only one open-source, production implementation of OpenACC for Fortran does exist. Moreover, most contemporary compiler tool chains for heterogeneous computing are based on LLVM. This lack of support poses a serious risk for high-performance computing application developers targeting GPUs and other accelerators, and it limits the ability of the community to experiment with, extend, and contribute to the OpenACC specification and open-source implementation itself. To address this gap, we have designed and begun implementing Flacc: an effort funded by the US Exascale Computing Project to develop production OpenACC compiler support for Fortran based on Flang within the LLVM ecosystem. In this paper, we describe the Flacc goals, initial design and prototype, and challenges that we have encountered so far in our prototyping efforts. Flacc is implemented as a MLIR dialect in the Flang Fortran front end in LLVM. The Flacc front end currently supports OpenACC version 3.1, and the Flacc run time is currently under development and relies on contributions from the Clacc project. Current contributions to Flacc are available in the main ${color{Green}{mathbf{LLVM}};{mathbf{repository}}}$.1","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"13 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133774355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High Performance Sparse Tensor Algebra Compiler in MLIR","authors":"Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor","doi":"10.1109/llvmhpc54804.2021.00009","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00009","url":null,"abstract":"Sparse tensor algebra is widely used in many applications, including scientific computing, machine learning, and data analytics. The performance of sparse tensor algebra kernels strongly depends on the intrinsic characteristics of the input tensors, hence many storage formats are designed for tensors to achieve optimal performance for particular applications/architectures, which makes it challenging to implement and optimize every tensor operation of interest on a given architecture. We propose a tensor algebra domain-specific language (DSL) and compiler framework to automatically generate kernels for mixed sparse-dense tensor algebra operations. The proposed DSL provides high-level programming abstractions that resemble the familiar Einstein notation to represent tensor algebra operations. The compiler introduces a new Sparse Tensor Algebra dialect built on top of LLVM’s extensible MLIR compiler infrastructure for efficient code generation while covering a wide range of tensor storage formats. Our compiler also leverages input-dependent code optimization to enhance data locality for better performance. Our results show that the performance of automatically generated kernels outperforms the state-of-the-art sparse tensor algebra compiler, with up to 20.92x, 6.39x, and 13.9x performance improvement over state-of-the-art tensor algebra compilers, for parallel SpMV, SpMM, and TTM, respectively.","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133748294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenMP aware MHP Analysis for Improved Static Data-Race Detection","authors":"Utpal Bora, Shraiysh Vaishay, Saurabh Joshi, Ramakrishna Upadrasta","doi":"10.1109/LLVMHPC54804.2021.00006","DOIUrl":"https://doi.org/10.1109/LLVMHPC54804.2021.00006","url":null,"abstract":"Data races, a major source of bugs in concurrent programs, can result in loss of manpower and time as well as data loss due to system failures. OpenMP, the de facto shared memory parallelism framework used in the HPC community, also suffers from data races. To detect race conditions in OpenMP programs and improve turnaround time and/or developer productivity, we present a data flow analysis based, fast, static data race checker in the LLVM compiler framework. Our tool can detect races in the presence or absence of explicit barriers, with implicit or explicit synchronization. In addition, our tool effectively works for the OpenMP target offloading constructs and also supports the frequently used OpenMP constructs.We formalize and provide a data flow analysis framework to perform Phase Interval Analysis (PIA) of OpenMP programs. Phase intervals are then used to compute the MHP (and its complement NHP) sets for the programs, which, in turn, are used to detect data races statically.We evaluate our work using multiple OpenMP race detection benchmarks and real world applications. Our experiments show that the checker is comparable to the state-of-the-art in various performance metrics with around 90% accuracy, almost perfect recall, and significantly lower runtime and memory footprint.","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123264396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending LLVM IR for DPC++ Matrix Support: A Case Study with Intel® Advanced Matrix Extensions (Intel® AMX)","authors":"Dounia Khaldi, Yuanke Luo, Bing Yu, A. Sotkin, B. Morais, M. Girkar","doi":"10.1109/llvmhpc54804.2021.00008","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00008","url":null,"abstract":"In this paper, we introduce a DPC++ matrix extension to unify different tensor hardware: Intel® Advanced Matrix Extensions (Intel® AMX) to CPUs, NVIDIA® TPUs, IBM® POWER® MMA, etc. These tensor hardware units are usually accessed by low-level intrinsics or assembly to perform matrix operations. It is hard for scientists to program these domain- specific devices without the kind of high-level abstractions and efficient implementations we introduce here.We also extend the existing LLVM matrix intrinsics to represent this DPC++ extension and yield efficient Intel AMX code generation. Based on our case study of implementing this interface on Intel AMX hardware, we discuss some of the limitations of existing LLVM Intermediate Representation (IR) and how they can be overcome to exploit tensor hardware.","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127857224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward an Automated Hardware Pipelining LLVM Pass Infrastructure","authors":"John D. Leidel, Ryan Kabrick, D. Donofrio","doi":"10.1109/llvmhpc54804.2021.00010","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00010","url":null,"abstract":"The many nuances associated with hardware development have fostered a development environment exclusive to those possessing extensive knowledge on the low-level implementation details necessary for an effective design. Allowing users to focus on the design aspects specific to the domain they work in by abstracting the low-level implementation details could prove invaluable to their successThis work describes the StoneCutter infrastructure, along with its encompassing OpenSoC System Architect suite of tools, provide users with a high-level, C-like syntax for rapidly designing ISAs. The compiler is responsible for ingesting instruction definitions and generating optimized Chisel HDL output as well as target-specific LLVM-linked compiler capable of executing binaries on the prototype ISA. During the codegen phase, the necessary control signals are subsequently generated and then used to automatically pipeline the entire ISA based on the design’s I/O, arithmetic operations, and flow-control.","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128271971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Title page]","authors":"","doi":"10.1109/llvmhpc54804.2021.00001","DOIUrl":"https://doi.org/10.1109/llvmhpc54804.2021.00001","url":null,"abstract":"","PeriodicalId":140581,"journal":{"name":"2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)","volume":"931 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116423971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}