Quantitative Performance Analysis of BLAS Libraries on GPU Architectures

Deu Muhendislik Fakultesi Fen ve Muhendislik | Pub Date: 2024-01-23 | DOI: 10.21205/deufmd.2024267606

Isil Öz

Citations: 0

Abstract

BLAS Kütüphanelerinin GPU Mimarilerindeki Nicel Performans Analizi

Basic Linear Algebra Subprograms (BLAS) are a set of linear algebra routines commonly used in machine learning applications and scientific computing. BLAS libraries, which provide optimized implementations of these routines, achieve high performance by exploiting the parallel execution units of the target computing system. With their very large number of cores, graphics processing units (GPUs) deliver high performance for computationally heavy workloads. Recent BLAS libraries use the parallel cores of GPU architectures efficiently by exploiting the data parallelism inherent in the routines. In this study, we analyze GPU-targeted functions from two BLAS libraries, cuBLAS and MAGMA, and evaluate their performance on a single-GPU NVIDIA architecture, taking architectural features and limitations into account. We collect architectural performance metrics and explore resource utilization characteristics. Our work aims to help researchers and programmers understand the performance behavior and GPU resource utilization of the BLAS routines implemented by these libraries.
