Modified Convolution Neural Network for Highly Effective Parallel Processing

2017 IEEE International Conference on Information Reuse and Integration (IRI) | Pub Date: 2017-08-01 | DOI: 10.1109/IRI.2017.37

Sang-Soo Park, Jung-Hyun Hong, Ki-Seok Chung


Citations: 1

Abstract



Convolutional neural networks (CNNs) are now widely used in areas such as computer vision and natural language processing. Hardware accelerators such as graphics processing units (GPUs) can speed up CNNs considerably, and many studies have proposed such acceleration methods. However, parallelizing a CNN on a hardware accelerator is not straightforward, because its output feature maps are generated in an irregular way. In this paper, we propose a modified CNN for efficient parallel processing. LeNet-5, a well-known CNN architecture, has an inefficient convolution combination. The proposed method improves efficiency by using a special operation called a dummy operation. By modifying LeNet-5's convolution combination, the method can maximize GPU utilization. The improvement is validated on a platform that integrates a CPU and a GPU on the same die. Our OpenCL implementation of the proposed method achieves an average peak performance of 115.66 GFLOPS, a 37.26-fold improvement in execution time. It also reduces energy consumption by a factor of 26.40.
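The abstract does not spell out what the dummy operation does. One plausible reading, sketched below as an assumption and not as the authors' implementation, is that the irregular connection pattern of LeNet-5's C3 layer (each of the 16 output maps reads only 3, 4, or 6 of the 6 input maps) is padded with zero-weight kernels so that every output map performs the same amount of work. The connection table is the one published with the original LeNet-5; the function names (`sparse_c3`, `pad_with_dummies`, `dense_c3`) and the random integer test data are hypothetical, and the code runs on the CPU in plain Python purely to show that the padding leaves the results unchanged.

```python
import random

# LeNet-5 C3 connection table: input maps (0..5) read by each of the 16 output maps.
C3_TABLE = [
    [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [0, 4, 5], [0, 1, 5],
    [0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [0, 3, 4, 5], [0, 1, 4, 5], [0, 1, 2, 5],
    [0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5],
    [0, 1, 2, 3, 4, 5],
]
NUM_INPUTS = 6   # S2 feature maps feeding C3
K = 5            # C3 kernel size
SIZE = 14        # S2 feature maps are 14x14 in LeNet-5


def conv2d_valid(img, kernel):
    """'Valid' 2-D correlation of one feature map with one square kernel."""
    n = len(img) - len(kernel) + 1
    k = len(kernel)
    return [[sum(img[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(k) for dx in range(k))
             for x in range(n)] for y in range(n)]


def accumulate(acc, m):
    """Element-wise sum of two equally sized maps."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, m)]


def zero_map(n):
    return [[0] * n for _ in range(n)]


def sparse_c3(inputs, weights):
    """Irregular version: output o only visits the inputs listed in C3_TABLE[o]."""
    outs = []
    for o, conn in enumerate(C3_TABLE):
        acc = zero_map(SIZE - K + 1)
        for i in conn:
            acc = accumulate(acc, conv2d_valid(inputs[i], weights[o][i]))
        outs.append(acc)
    return outs


def pad_with_dummies(weights):
    """Assumed 'dummy operation': fill every missing connection with an all-zero kernel."""
    zero_kernel = [[0] * K for _ in range(K)]
    return [[weights[o].get(i, zero_kernel) for i in range(NUM_INPUTS)]
            for o in range(len(C3_TABLE))]


def dense_c3(inputs, dense_weights):
    """Regular version: every output visits all inputs, so each does identical work."""
    outs = []
    for o in range(len(dense_weights)):
        acc = zero_map(SIZE - K + 1)
        for i in range(NUM_INPUTS):
            acc = accumulate(acc, conv2d_valid(inputs[i], dense_weights[o][i]))
        outs.append(acc)
    return outs


def make_example(seed=0):
    """Random integer inputs and sparse weights (hypothetical test data)."""
    rng = random.Random(seed)
    inputs = [[[rng.randint(-3, 3) for _ in range(SIZE)] for _ in range(SIZE)]
              for _ in range(NUM_INPUTS)]
    weights = [{i: [[rng.randint(-2, 2) for _ in range(K)] for _ in range(K)]
                for i in conn} for conn in C3_TABLE]
    return inputs, weights


real_connections = sum(len(conn) for conn in C3_TABLE)
dummy_connections = len(C3_TABLE) * NUM_INPUTS - real_connections
```

Under this reading, the padded layer performs more arithmetic in total, but every output map executes the same loop, which is the kind of regularity that maps cleanly onto GPU work-items. Whether the paper's dummy operation matches this sketch cannot be confirmed from the abstract alone.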

