Designing A Compact Convolutional Neural Network Processor on Embedded FPGAs
Yingjian Ling, Hsu-Hsun Chin, Hsin-I Wu, R. Tsay
2020 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), published 2020-12-12
DOI: 10.1109/GCAIoT51063.2020.9345903
FPGA-based Convolutional Neural Network (CNN) processors have been widely adopted for highly parallel computation and fast deployment. However, designing for embedded FPGAs requires considering multiple aspects, such as the limited configurable resources on the FPGA, external memory latency, and the scheduling between memory and computation units. These considerations hinder the adoption of FPGAs. To address these issues, we elaborate a systematic design approach that enables fast deployment: it comprises parameterized computation and memory units, which can be configured for the target platform, and an evaluation approach for searching for the optimal configuration settings. To evaluate the proposed approach, we ran an object detection task, YOLOv2, on the PYNQ-Z1 and achieved 48.23 GOPS throughput with a 0.611-second execution time. This is 42.38 and 12.8 times faster than the same inference on a CPU and a GPU, respectively, and 2.36 times faster than other FPGA implementations. Additionally, our evaluation model deviates from the implementation results by only 5-22%, which is 60% less error than previous work.
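The evaluation approach mentioned in the abstract, searching candidate hardware configurations against an analytical performance estimate, can be sketched as follows. This is a minimal illustrative model, not the paper's actual formulation: all function names, parameters, and the roofline-style latency formula (latency = max of compute time and external-memory transfer time) are assumptions made for illustration.

```python
def estimate_latency(ops, data_bytes, pe_count, freq_hz, mem_bw):
    """Estimate layer latency in seconds as the max of compute and memory time.

    Illustrative roofline-style model; the paper's evaluation model may differ.
    ops        : number of operations in the layer
    data_bytes : bytes moved to/from external memory
    pe_count   : number of parallel processing elements
    freq_hz    : clock frequency of the accelerator
    mem_bw     : external memory bandwidth in bytes/s
    """
    compute_time = ops / (pe_count * freq_hz)   # ops spread across PEs, 1 op/cycle/PE
    memory_time = data_bytes / mem_bw           # external-memory transfer time
    return max(compute_time, memory_time)       # units overlap; slower side dominates


def search_best_setting(ops, data_bytes, freq_hz, mem_bw, pe_candidates, max_pes):
    """Return the (pe_count, latency) pair minimizing estimated latency
    among candidates that fit the FPGA's resource budget (max_pes)."""
    best = None
    for pe in pe_candidates:
        if pe > max_pes:
            continue  # setting exceeds the configurable-resource budget
        t = estimate_latency(ops, data_bytes, pe, freq_hz, mem_bw)
        if best is None or t < best[1]:
            best = (pe, t)
    return best
```

For example, a hypothetical 1 GOP compute-bound layer (1e9 ops, 4 MB of traffic, 100 MHz clock, 1 GB/s memory bandwidth, at most 64 PEs available) would select the largest PE count the budget allows, since compute time dominates memory time at every candidate setting.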