Latest ISC Workshops Publications

Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads
ISC Workshops Pub Date: 2023-06-27 DOI: 10.48550/arXiv.2306.15562
Gonzalo Gómez-Sánchez, A. Call, Xavier Teruel, Lorena Alonso, Ignasi Morán, Miguel Angel Perez, D. Torrents, J. L. Berral
Abstract: The use of large-scale supercomputing architectures is a hard requirement for Big-Data applications in scientific computing. An example is genomics analytics, where millions of data transformations and tests per patient are needed to find relevant clinical indicators. Therefore, to ensure open and broad access to high-performance technologies, governments and academia are pushing toward the introduction of novel computing architectures in large-scale scientific environments. This is the case of RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, here we present the Variant-Interaction Analytics use case benchmarking suite and datasets. Through this use case, we search for possible genetic interactions using computational and statistical methods, providing a representative case of heavy ETL (Extract, Transform, Load) data processing. The current implementation runs on x86-based supercomputers (e.g., MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and future steps propose RISC-V as part of the next MareNostrum generations. Here we describe the Variant Interaction use case, highlighting the characteristics that leverage high-performance computing, and indicating the caveats and challenges for the RISC-V developments and designs to come, based on a first comparison between x86 and RISC-V architectures on real Variant Interaction executions over real hardware implementations.
Citations: 0
Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study
ISC Workshops Pub Date: 2023-06-01 DOI: 10.48550/arXiv.2306.01797
F. Mantovani, Pablo Vizcaino, Fabio Banchelli, M. Garcia-Gasulla, R. Ferrer, Giorgos Ieronymakis, Nikos Dimou, Vassilis D. Papaefstathiou, Jesús Labarta
Abstract: Prototyping HPC systems with low-to-mid technology readiness level (TRL) components is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototyping often limits feedback or only allows it at a late stage. In this paper, we present a set of tools for co-designing HPC systems, called software development vehicles (SDV). We use an innovative RISC-V design as a demonstrator, which includes a scalar CPU and a vector processing unit capable of operating on vectors of up to 16 kbits. We provide an incremental methodology and early tangible evidence of the co-design process that provides feedback to improve both architecture and system software at a very early stage of system development.
Citations: 2
Backporting RISC-V Vector assembly
ISC Workshops Pub Date: 2023-04-20 DOI: 10.48550/arXiv.2304.10324
Joseph K. L. Lee, Maurice Jamieson, Nick Brown
Abstract: Leveraging vectorisation, the ability of a CPU to apply operations to multiple elements of data concurrently, is critical for high-performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V Vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to produce vectorised code is to use an older, vendor-provided compiler. In this paper we introduce the rvv-rollback tool, which translates assembly code generated by a compiler targeting vector extension v1.0 into v0.7.1 instructions. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (supporting v0.7.1) against LLVM 15.0 (supporting only v1.0), finding that the LLVM compiler is capable of auto-vectorising more computational kernels and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.
Citations: 3
Test-driving RISC-V Vector hardware for HPC
ISC Workshops Pub Date: 2023-04-20 DOI: 10.48550/arXiv.2304.10319
Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus
Abstract: Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open-source software support for vectorisation on RISC-V remain limited. This matters because vectorisation is crucial for good performance in High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the state of RISC-V vectorisation as of 2023, reporting on both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 with the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using the vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.
Citations: 5
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
ISC Workshops Pub Date: 2023-04-09 DOI: 10.48550/arXiv.2304.04276
Yehonatan Fridman, G. Tamir, Gal Oren
Abstract: Over the last decade, most of the increase in computing power has come from advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, using them requires code adaptations and transformations. Thus OpenMP, the most common standard for multi-threading in scientific computing applications, has offered offloading capabilities between the host (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to market, with the oneAPI and NVHPC compilers for offloading, respectively. In this work, we present early performance results for OpenMP offloading to these devices, specifically analysing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware on a representative scientific mini-app (the LULESH benchmark). Our results show that coverage of version 4.5 is nearly complete in both the latest NVHPC and oneAPI tools. However, we observed a lack of support for versions 5.0, 5.1, and 5.2, particularly noticeable with NVHPC. From the performance perspective, we found the PVC1100 and A100 to be roughly comparable on the LULESH benchmark: while the A100 is slightly faster thanks to higher memory bandwidth, the PVC1100 scales to the next problem size (400^3) thanks to its larger memory.
Citations: 1
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
ISC Workshops Pub Date: 2022-12-03 DOI: 10.48550/arXiv.2212.01698
R. Caspart, Sebastian Ziegler, Arvid Weyrauch, Holger Obermaier, Simon Raffeiner, Leonie Schuhmacher, J. Scholtyssek, D. Trofimova, M. Nolden, I. Reinartz, Fabian Isensee, Markus Goetz, C. Debus
Abstract: With the rise of artificial intelligence (AI) in recent years and the subsequent increase in the complexity of the applied models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerator hardware as well as large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ultimately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators alike. The energy consumption of AI workloads depends both on the model implementation and on the composition of the hardware used. Therefore, accurate measurements of the power draw of AI workflows on different types of compute nodes are key to algorithmic improvements and to the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical deep learning applications on different types of heterogeneous compute nodes. Our results indicate that: 1. contrary to common approaches, deriving energy consumption directly from runtime is not accurate; instead, the consumption of the compute node needs to be considered with regard to its composition; 2. neglecting accelerator hardware on mixed nodes results in disproportionate energy inefficiency; 3. the energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types in both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer, not just those with administrator rights, enabling an easy transfer to other workloads alongside raising user awareness of energy consumption.
Citations: 3
Workflows to driving high-performance interactive supercomputing for urgent decision making
ISC Workshops Pub Date: 2022-06-28 DOI: 10.48550/arXiv.2206.14103
Nick Brown, R. Nash, G. Gibb, E. Belikov, Artur Podobas, W. Chien, S. Markidis, M. Flatken, A. Gerndt
Abstract: Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads that could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and the external data sources, in a structured and accessible manner. In this paper we explore the role of workflows both for marshalling and controlling urgent workloads and at the individual HPC machine level, ultimately requiring two workflow systems. Using an urgent space-weather-prediction use case, we explore the benefits these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.
Citations: 0
Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms
ISC Workshops Pub Date: 2021-09-13 DOI: 10.1007/978-3-030-90539-2_17
Derssie Mebratu, N. Hasabnis, Pietro Mercati, Gaurit Sharma, S. Najnin
Citations: 0
Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review
ISC Workshops Pub Date: 2021-07-01 DOI: 10.1007/978-3-030-90539-2_16
Reed Milewicz, P. Pirkelbauer, Prema Soundararajan, H. Ahmed, A. Skjellum
Citations: 5
Lettuce: PyTorch-based Lattice Boltzmann Framework
ISC Workshops Pub Date: 2021-06-24 DOI: 10.1007/978-3-030-90539-2_3
Mario Bedrunka, D. Wilde, Martin L. Kliemank, D. Reith, H. Foysi, Andreas Krämer
Citations: 8