{"title":"基于GPU的卷积层推理过程中与硬件无关的时间估计方法","authors":"Chengzhen Meng, Hongjun Dai","doi":"10.1016/j.peva.2023.102368","DOIUrl":null,"url":null,"abstract":"<div><p><span><span>Nowadays, various AI applications based on </span>Convolutional Neural Networks<span> (CNNs) are widely deployed on GPU-accelerated devices. However, due to the lack of visibility into GPU internal scheduling, accurately modeling the performance of CNN inference tasks or estimating the latency of CNN tasks that are executing or waiting on the GPU is challenging. This hurts the multi-model scheduling on multi-device and CNN real-time inference. Therefore, in this paper, we propose a time estimation<span> method to estimate the forward execution time of a convolutional layer with an arbitrary shape on a GPU. The proposed method divides an explicit </span></span></span><em>General Matrix Multiplication</em><span> (GEMM) convolution<span> operation into a series of estimatable GPU operations and constructs performance models at the level of sub-operations rather than hardware instructions or entire models. Also, the proposed method can be easily adapted to different hardware devices or underlying algorithm implementations, since it focuses on the variation of execution time relative to the input data scale, without focusing on specific instructions or hardware actions. According to the experiments on four typical CUDA compatible platforms, the proposed method has an average error rate of less than 5% for convolutional layers in some practical CNN models, and has about 8% error rate in estimating GEMM convolution implementations provided by cuDNN library. Experiments show that the proposed method can predict the forward execution time of convolutional layers of arbitrary size in CNN inference tasks on different GPU models.</span></span></p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"162 ","pages":"Article 102368"},"PeriodicalIF":1.0000,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A hardware-independent time estimation method for inference process of convolutional layers on GPU\",\"authors\":\"Chengzhen Meng, Hongjun Dai\",\"doi\":\"10.1016/j.peva.2023.102368\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span><span>Nowadays, various AI applications based on </span>Convolutional Neural Networks<span> (CNNs) are widely deployed on GPU-accelerated devices. However, due to the lack of visibility into GPU internal scheduling, accurately modeling the performance of CNN inference tasks or estimating the latency of CNN tasks that are executing or waiting on the GPU is challenging. This hurts the multi-model scheduling on multi-device and CNN real-time inference. Therefore, in this paper, we propose a time estimation<span> method to estimate the forward execution time of a convolutional layer with an arbitrary shape on a GPU. The proposed method divides an explicit </span></span></span><em>General Matrix Multiplication</em><span> (GEMM) convolution<span> operation into a series of estimatable GPU operations and constructs performance models at the level of sub-operations rather than hardware instructions or entire models. 
Also, the proposed method can be easily adapted to different hardware devices or underlying algorithm implementations, since it focuses on the variation of execution time relative to the input data scale, without focusing on specific instructions or hardware actions. According to the experiments on four typical CUDA compatible platforms, the proposed method has an average error rate of less than 5% for convolutional layers in some practical CNN models, and has about 8% error rate in estimating GEMM convolution implementations provided by cuDNN library. Experiments show that the proposed method can predict the forward execution time of convolutional layers of arbitrary size in CNN inference tasks on different GPU models.</span></span></p></div>\",\"PeriodicalId\":19964,\"journal\":{\"name\":\"Performance Evaluation\",\"volume\":\"162 \",\"pages\":\"Article 102368\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2023-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Performance Evaluation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S016653162300038X\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Performance Evaluation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S016653162300038X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A hardware-independent time estimation method for inference process of convolutional layers on GPU
Nowadays, various AI applications based on Convolutional Neural Networks (CNNs) are widely deployed on GPU-accelerated devices. However, due to the lack of visibility into GPU internal scheduling, it is challenging to accurately model the performance of CNN inference tasks or to estimate the latency of CNN tasks that are executing or waiting on the GPU. This hinders multi-model scheduling across multiple devices and real-time CNN inference. Therefore, in this paper, we propose a method to estimate the forward execution time of a convolutional layer of arbitrary shape on a GPU. The proposed method divides an explicit General Matrix Multiplication (GEMM) convolution operation into a series of GPU sub-operations whose execution times can be estimated individually, and constructs performance models at the level of these sub-operations rather than of hardware instructions or entire models. The method also adapts easily to different hardware devices or underlying algorithm implementations, since it models how execution time varies with the input data scale rather than tracking specific instructions or hardware actions. In experiments on four typical CUDA-compatible platforms, the proposed method achieves an average error rate below 5% for convolutional layers in several practical CNN models, and an error rate of about 8% when estimating the GEMM convolution implementations provided by the cuDNN library. These results show that the method can predict the forward execution time of convolutional layers of arbitrary size in CNN inference tasks across different GPU models.
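To make the decomposition concrete, the following is a minimal sketch (in Python) of the core idea: estimate a convolutional layer's forward time by summing per-sub-operation cost models calibrated on the target device. It assumes the explicit GEMM convolution decomposes into an im2col transform followed by a GEMM, and that each sub-operation's cost is linear in its data scale; the function names, the linear form, and the coefficient values are illustrative assumptions, not the authors' published model.

from dataclasses import dataclass

@dataclass
class ConvShape:
    # Layer shape: batch n, input channels c_in, spatial h x w,
    # output channels c_out, square kernel k, stride and padding.
    n: int
    c_in: int
    h: int
    w: int
    c_out: int
    k: int
    stride: int = 1
    pad: int = 0

def output_hw(s: ConvShape):
    # Standard convolution output spatial size.
    h_out = (s.h + 2 * s.pad - s.k) // s.stride + 1
    w_out = (s.w + 2 * s.pad - s.k) // s.stride + 1
    return h_out, w_out

def estimate_conv_time_us(s: ConvShape, im2col_coef, gemm_coef):
    # Sum the estimates of the two sub-operations. Each (a, b) coefficient
    # pair is fitted once per device from calibration runs, which is what
    # keeps the model itself hardware-independent (assumed linear form).
    h_out, w_out = output_hw(s)
    # im2col cost scales with the element count of the unrolled input matrix.
    im2col_elems = s.n * s.c_in * s.k * s.k * h_out * w_out
    # GEMM cost scales with the multiply-accumulate count of
    # (c_out x c_in*k*k) times (c_in*k*k x n*h_out*w_out).
    gemm_macs = s.c_out * s.c_in * s.k * s.k * s.n * h_out * w_out
    a1, b1 = im2col_coef
    a2, b2 = gemm_coef
    return (a1 * im2col_elems + b1) + (a2 * gemm_macs + b2)

# Example: a VGG-style 3x3 layer; the coefficients are placeholders that
# would come from per-device calibration in practice.
shape = ConvShape(n=1, c_in=64, h=56, w=56, c_out=128, k=3, stride=1, pad=1)
print(f"estimated forward time: "
      f"{estimate_conv_time_us(shape, (1e-5, 2.0), (1e-6, 5.0)):.1f} us")

The point of the sketch is only the structure of the estimate: each sub-operation's cost is modeled as a function of its input scale and calibrated per device, so swapping GPUs means refitting coefficients rather than rebuilding the model.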
Journal introduction:
Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
-Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
-Provide new insights into the performance of computing and communication systems
-Introduce new application areas where performance evaluation tools can play an important role, and create new uses for performance evaluation tools.
More specifically, common application areas of interest include the performance of:
-Resource allocation and control methods and algorithms (e.g. routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
-System architecture, design and implementation
-Cognitive radio
-VANETs
-Social networks and media
-Energy efficient ICT
-Energy harvesting
-Data centers
-Data centric networks
-System reliability
-System tuning and capacity planning
-Wireless and sensor networks
-Autonomic and self-organizing systems
-Embedded systems
-Network science