Time Sharing Based Parallel Implementation of CNN on Low Cost FPGA
Shefa A. Dawwd, Basil Sh. Mahmood
2018 International Conference on Advanced Science and Engineering (ICOASE), published 2018-10-01
https://doi.org/10.1109/ICOASE.2018.8548825
A convolutional neural network (CNN) is a multilayer architecture that is considered a robust model for image recognition. Learning in this network proceeds progressively through its successive layers, so that the layers produce increasingly higher-level features and the last layer produces the categories. Using a CNN in real-time applications requires a high-performance implementation. To reduce the resources required for implementation, this paper proposes a time-sharing-based parallel implementation of a CNN. The upper convolution nodes are computed sequentially, while the parallelism increases in the direction of the preceding layers, resulting in maximum parallelism in the bottom layer. The relatively complex CNN design is then implemented on an FPGA model with no more than 200,000 gates and can speed up computation by up to 166 times.
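The resource/time trade-off described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' design: the layer sizes, the number of physical compute units per layer, and the function names `schedule_slots` and `time_share` are all hypothetical, chosen only to show how nodes in the upper layers might share a few units over several time slots while the bottom layer runs fully in parallel.

```python
from math import ceil


def schedule_slots(nodes_per_layer, units_per_layer):
    """Time slots each layer needs when its nodes share a fixed pool of units."""
    return [ceil(n / u) for n, u in zip(nodes_per_layer, units_per_layer)]


def time_share(nodes, units):
    """Map each node index to a (time_slot, unit) pair, filling slots in order."""
    return [(i // units, i % units) for i in range(nodes)]


# Hypothetical layer sizes, bottom layer first (not taken from the paper).
nodes_per_layer = [64, 16, 4]
# Assumed unit counts: full parallelism at the bottom, sequential at the top.
units_per_layer = [64, 4, 1]

slots = schedule_slots(nodes_per_layer, units_per_layer)
total_units = sum(units_per_layer)      # hardware actually instantiated
fully_parallel = sum(nodes_per_layer)   # hardware if every node had its own unit

print("time slots per layer:", slots)
print("units used:", total_units, "of", fully_parallel)
print("top-layer schedule:", time_share(nodes_per_layer[-1], units_per_layer[-1]))
```

Under these assumed numbers the upper layers take more time slots but need far fewer units, which is the kind of saving a time-shared design aims for on a small FPGA.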