Deep Learning-Driven Compiler Enhancements for Efficient Matrix Multiplication

Raunak Kumar, Karma Chhering Negi, Nitish Kumar Sharma, Priya Gupta
{"title":"针对高效矩阵乘法的深度学习驱动编译器增强功能","authors":"Raunak Kumar, Karma Chhering Negi, Nitish Kumar Sharma, Priya Gupta","doi":"10.57159/gadl.jcmm.3.2.240122","DOIUrl":null,"url":null,"abstract":"Matrix multiplication is a fundamental operation in many computational fields, requiring optimization to handle increasing data sizes efficiently. In this paper, the implementation of Deep Learning in Matrix multiplication is reviewed, which is considered important nowadays due to the growing complexity of matrix multiplication for gaming and complex programs. The current standard matrix multiplication and the time taken by it on different matrix sizes are described. The Tiled Matrix multiplication, which trims the matrix into various pieces and calculates the product for each piece, and thereafter combines the result, is also described. The times taken by both methods for different matrix sizes were compared. The main idea was to use Deep Neural Networks (DNN) to compare and rank code variants that are obtained in pieces and determine their relative performance. A tournament-based ranking system is used for assigning ranks to the code versions. The effectiveness of these techniques was evaluated on various matrix multiplication operations commonly found in deep learning workloads. Up to 8.844x speedup over the naive implementation for a matrix size of 1024 is achieved by this approach. 
The results demonstrate the effectiveness of combining compiler optimization techniques and deep learning models in optimizing matrix multiplication.","PeriodicalId":372188,"journal":{"name":"Journal of Computers, Mechanical and Management","volume":"21 19","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Learning-Driven Compiler Enhancements for Efficient Matrix Multiplication\",\"authors\":\"Raunak Kumar, Karma Chhering Negi, Nitish Kumar Sharma, Priya Gupta\",\"doi\":\"10.57159/gadl.jcmm.3.2.240122\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Matrix multiplication is a fundamental operation in many computational fields, requiring optimization to handle increasing data sizes efficiently. In this paper, the implementation of Deep Learning in Matrix multiplication is reviewed, which is considered important nowadays due to the growing complexity of matrix multiplication for gaming and complex programs. The current standard matrix multiplication and the time taken by it on different matrix sizes are described. The Tiled Matrix multiplication, which trims the matrix into various pieces and calculates the product for each piece, and thereafter combines the result, is also described. The times taken by both methods for different matrix sizes were compared. The main idea was to use Deep Neural Networks (DNN) to compare and rank code variants that are obtained in pieces and determine their relative performance. A tournament-based ranking system is used for assigning ranks to the code versions. The effectiveness of these techniques was evaluated on various matrix multiplication operations commonly found in deep learning workloads. Up to 8.844x speedup over the naive implementation for a matrix size of 1024 is achieved by this approach. 
The results demonstrate the effectiveness of combining compiler optimization techniques and deep learning models in optimizing matrix multiplication.\",\"PeriodicalId\":372188,\"journal\":{\"name\":\"Journal of Computers, Mechanical and Management\",\"volume\":\"21 19\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computers, Mechanical and Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.57159/gadl.jcmm.3.2.240122\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computers, Mechanical and Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.57159/gadl.jcmm.3.2.240122","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Matrix multiplication is a fundamental operation in many computational fields, requiring optimization to handle increasing data sizes efficiently. This paper reviews the application of deep learning to matrix multiplication, which has become important due to the growing complexity of matrix multiplication in gaming and other demanding programs. The standard matrix multiplication algorithm and the time it takes on different matrix sizes are described, along with tiled matrix multiplication, which partitions the matrices into blocks, computes the product for each block, and then combines the results. The times taken by both methods were compared across matrix sizes. The main idea was to use Deep Neural Networks (DNNs) to compare and rank code variants and determine their relative performance; a tournament-based ranking system assigns ranks to the code versions. The effectiveness of these techniques was evaluated on matrix multiplication operations commonly found in deep learning workloads. This approach achieves up to an 8.844x speedup over the naive implementation for a matrix size of 1024. The results demonstrate the effectiveness of combining compiler optimization techniques with deep learning models to optimize matrix multiplication.
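The tiled scheme the abstract describes (partition the matrices into blocks, compute the product of each block pair, and accumulate the partial results) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the default tile size of 32 and the pure-Python loop structure are assumptions for clarity:

```python
def naive_matmul(A, B):
    """Textbook triple-loop multiplication: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def tiled_matmul(A, B, tile=32):
    """Blocked (tiled) multiplication: iterate over tile x tile sub-blocks so each
    block can stay cache-resident while its partial products are accumulated."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Multiply the (ii, kk) block of A by the (kk, jj) block of B
                # and accumulate into the (ii, jj) block of C.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a = A[i][k]
                        row_b, row_c = B[k], C[i]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += a * row_b[j]
    return C
```

Both functions compute the same product; the tiled version reorders the work to improve cache reuse, which is the effect the paper's timing comparison across matrix sizes measures.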
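The tournament-based ranking of code variants can be sketched as below. In the paper a trained DNN supplies the pairwise comparison; here `predict_faster(a, b)` is a hypothetical stand-in for that model (an assumption, not the authors' API), and variants are ranked by their number of match wins:

```python
from itertools import combinations


def tournament_rank(variants, predict_faster):
    """Rank code variants by a round-robin tournament.

    `predict_faster(a, b)` should return True when variant `a` is predicted
    to outperform `b`; in the paper this role is played by a DNN trained to
    compare code versions. The winner of each pairing earns one win, and
    variants are sorted by total wins, fastest-predicted first.
    """
    wins = {v: 0 for v in variants}
    for a, b in combinations(variants, 2):
        winner = a if predict_faster(a, b) else b
        wins[winner] += 1
    return sorted(variants, key=lambda v: wins[v], reverse=True)
```

For example, using measured runtimes (hypothetical numbers) in place of the learned comparator, `tournament_rank(["naive", "tiled"], lambda a, b: times[a] < times[b])` would place the faster variant first; the learned model lets the compiler make the same decision without running every variant.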