Efficient SIMD implementation for accelerating convolutional neural network
Sung-Jin Lee, Sang-Soo Park, Ki-Seok Chung
International Conference on Critical Infrastructure Protection, 2018-11-02. DOI: 10.1145/3290420.3290444
Convolutional neural networks (CNNs) have been used in a variety of fields such as computer vision, speech recognition, and natural language processing. Because their computational cost has grown tremendously, CNNs have lately been accelerated with dedicated hardware such as graphics processing units (GPUs). However, resource-constrained embedded platforms such as Internet of Things (IoT) devices cannot afford such accelerators, so it is important to run CNNs efficiently on the CPU alone. In this paper, we propose a method to accelerate CNNs using the Single Instruction Multiple Data (SIMD) unit that many modern CPUs integrate for vector operations. The proposed method, implemented on ARM NEON, can maximize the utilization of the vector registers in the SIMD unit. Compared with the conventional implementation, our implementation achieves up to a 2.66× speed-up in execution time and up to a 3.55× reduction in energy consumption.
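The abstract does not describe the data layout or register-allocation scheme the authors use, so the following is only a rough sketch of the general idea of SIMD convolution, not their method. It vectorizes a single-channel 3×3 "valid" convolution (computed as cross-correlation, as CNN frameworks do) across four adjacent output pixels: each 128-bit register holds four input values, and each kernel weight is broadcast and multiply-accumulated into all four lanes. On ARM it uses the NEON intrinsics `vld1q_f32`, `vmlaq_n_f32` and `vst1q_f32`; elsewhere it falls back to the GCC/Clang vector extension so it can be built and tested on x86. The function name `conv3x3_simd`, the row-major layout and the single-channel restriction are assumptions made for illustration.

```c
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef float32x4_t vec4;
static inline vec4 vec4_zero(void) { return vdupq_n_f32(0.0f); }
static inline vec4 vec4_load(const float *p) { return vld1q_f32(p); }
static inline void vec4_store(float *p, vec4 v) { vst1q_f32(p, v); }
/* acc + v * s: one kernel weight broadcast to all four lanes. */
static inline vec4 vec4_mla(vec4 acc, vec4 v, float s) { return vmlaq_n_f32(acc, v, s); }
#else
/* Portable fallback (assumption): GCC/Clang vector extension, 4 x float. */
typedef float vec4 __attribute__((vector_size(16)));
static inline vec4 vec4_zero(void) { vec4 z = {0.0f, 0.0f, 0.0f, 0.0f}; return z; }
static inline vec4 vec4_load(const float *p) { vec4 v; memcpy(&v, p, sizeof v); return v; }
static inline void vec4_store(float *p, vec4 v) { memcpy(p, &v, sizeof v); }
static inline vec4 vec4_mla(vec4 acc, vec4 v, float s) {
    vec4 sv = {s, s, s, s};
    return acc + v * sv;
}
#endif

/* Hypothetical helper: single-channel 3x3 "valid" convolution.
 * in:  h x w row-major input; out: (h-2) x (w-2) row-major output. */
void conv3x3_simd(const float *in, int h, int w, const float k[9], float *out) {
    int oh = h - 2, ow = w - 2;
    for (int y = 0; y < oh; y++) {
        int x = 0;
        /* Vector body: four adjacent output pixels per iteration. */
        for (; x + 4 <= ow; x += 4) {
            vec4 acc = vec4_zero();
            for (int ky = 0; ky < 3; ky++)
                for (int kx = 0; kx < 3; kx++)
                    acc = vec4_mla(acc, vec4_load(&in[(y + ky) * w + x + kx]),
                                   k[ky * 3 + kx]);
            vec4_store(&out[y * ow + x], acc);
        }
        /* Scalar tail for the remaining 0 to 3 output columns. */
        for (; x < ow; x++) {
            float s = 0.0f;
            for (int ky = 0; ky < 3; ky++)
                for (int kx = 0; kx < 3; kx++)
                    s += in[(y + ky) * w + x + kx] * k[ky * 3 + kx];
            out[y * ow + x] = s;
        }
    }
}
```

For a 6×7 input, `conv3x3_simd(in, 6, 7, k, out)` writes a 4×5 output: columns 0 to 3 of each row come from the vector body and column 4 from the scalar tail. A tuned kernel would also block over several output rows or channels so that more of the register file (16 quad-word registers on ARMv7 NEON, 32 on AArch64) holds accumulators at once; that refinement is left out of this sketch.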