I2CU: A Dedicated Im2col Hardware Unit
Authors: Tao Zhongyu, Wang Yuanfeng, Zhang Huaisheng
Published in: 2022 19th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
Publication date: 2022-12-16
DOI: https://doi.org/10.1109/iccwamtip56608.2022.10016515
Citations: 0
Abstract
In convolutional neural networks (CNNs), the convolution of a feature map with a weight map is usually implemented with the im2col + GEMM method. The conventional approach, however, must first expand the feature map into a large feature matrix in one kernel function, based on the convolution parameters (i.e., filter size, padding, and stride), and then perform the matrix multiplication in a separate function. It therefore generates a large amount of data transfer, and the large feature matrix requires enormous storage space, which makes it hardware-unfriendly. We design I2CU (Im2Col Unit), a dedicated hardware unit that implements im2col in a hardware-friendly way. I2CU dynamically expands the loaded 4D-Block returned from the texture unit and writes the destination matrix back to shared memory. I2CU reduces the storage space required for the feature matrix and allows im2col + GEMM to be implemented in one kernel function.
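
To make the conventional two-step path concrete, the following is a minimal NumPy sketch of software im2col followed by GEMM. It illustrates only the baseline the abstract describes, not the I2CU hardware design. The function names `im2col` and `conv2d_im2col`, the (C, H, W) input layout, and the (K, C, kh, kw) weight layout are assumptions made for this illustration.

```python
import numpy as np

def im2col(x, kh, kw, stride=1, pad=0):
    """Expand a (C, H, W) feature map into a (C*kh*kw, out_h*out_w) matrix."""
    c, h, w = x.shape
    # Zero-pad only the two spatial axes.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # All input values that meet filter tap (ci, i, j),
                # one per output position, in row-major output order.
                patch = xp[ci,
                           i:i + stride * out_h:stride,
                           j:j + stride * out_w:stride]
                cols[row] = patch.reshape(-1)
                row += 1
    return cols, out_h, out_w

def conv2d_im2col(x, weights, stride=1, pad=0):
    """Convolution as im2col followed by a single matrix multiplication."""
    k, c, kh, kw = weights.shape
    # Step 1: materialize the expanded feature matrix (the costly part).
    cols, out_h, out_w = im2col(x, kh, kw, stride, pad)
    # Step 2: GEMM between the flattened filters and the feature matrix.
    out = weights.reshape(k, -1) @ cols
    return out.reshape(k, out_h, out_w)
```

With a 3x3 filter, stride 1, and padding 1, the matrix returned by `im2col` holds nine times as many elements as the input feature map, which is the storage and data-transfer overhead the abstract refers to.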