Haiyue Song, Xiang Song, Tianjian Li, Hao Dong, Naifeng Jing, Xiaoyao Liang, Li Jiang
DOI: 10.1145/3174243.3174965
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 15, 2018
A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only)
Neural approximate computing promises energy efficiency at the cost of a tolerable loss in output quality. The architecture contains two neural networks: an approximate accelerator that generates approximate results, and a classifier that determines whether an input can be safely approximated. However, this scheme maps poorly onto a heterogeneous computing platform, due to the large communication overhead between the approximate accelerator and the accurate cores, and the large speed gap between them. This paper proposes a software-hardware co-design strategy. First, through a deep exploration of data distributions in the feature space, we propose a novel approximate computing architecture containing a multi-class classifier and multiple approximate accelerators. This architecture, derived from existing iterative co-training methods, shifts more data from accurate computation (on the CPU) to the approximate accelerators (on the FPGA); the increased invocation of the approximate accelerators yields higher utilization of the FPGA-based hardware and thus enhanced performance. Moreover, much less input data is redistributed by the classifier (also on the FPGA) back to the CPU, which minimizes CPU-FPGA communication. Second, we design a pipelined datapath with batched input/output for the proposed hybrid architecture to efficiently hide the communication latency, and we propose a mask technique that decouples CPU-FPGA synchronization, minimizing the frequency of communication.
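The routing idea at the heart of the architecture can be illustrated with a minimal software sketch. Note this is a hypothetical NumPy model, not the paper's FPGA implementation: the region boundaries, the surrogate functions, and the `exact_fn` kernel are all illustrative stand-ins. A multi-class classifier assigns each input either to one of several per-region approximators (the FPGA accelerators in the paper) or to the accurate path (the CPU), so only hard-to-approximate inputs fall back to exact computation.

```python
import numpy as np

def exact_fn(x):
    # Stand-in for the accurate CPU kernel being approximated.
    return np.sin(x) + 0.5 * x

class Router:
    """Hypothetical multi-class routing: class 0..k-1 selects one of k
    approximate accelerators; the last class falls back to the exact path."""

    def __init__(self, approximators, boundaries):
        self.approximators = approximators  # one surrogate per input region
        self.boundaries = boundaries        # region edges in the input space

    def classify(self, x):
        # Bucket the input by region; inputs past the last boundary are
        # deemed unsafe to approximate and routed to the exact path.
        return int(np.digitize(x, self.boundaries))

    def __call__(self, x):
        c = self.classify(x)
        if c < len(self.approximators):
            return self.approximators[c](x)  # approximate (FPGA in the paper)
        return exact_fn(x)                   # accurate fallback (CPU)

# Two illustrative linear surrogates for the regions x < 1.0 and 1.0 <= x < 2.0;
# anything at or beyond 2.0 is computed exactly.
router = Router([lambda x: 1.5 * x, lambda x: 0.45 + x], [1.0, 2.0])
```

With more accelerator classes, more of the input space is served approximately, which is exactly the utilization argument the abstract makes.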
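The batched-I/O and mask ideas can likewise be sketched in software, again as an assumed model rather than the paper's hardware design: the CPU ships a whole batch to the accelerator in one transfer, the accelerator returns approximate results together with a mask marking inputs its classifier rejected, and the CPU patches only the masked entries. One round trip per batch replaces a per-input handshake, which is what decoupling the CPU-FPGA synchronization buys.

```python
import numpy as np

def fpga_batch(xs, approx_fn, can_approximate):
    """Hypothetical batched accelerator call: approximate the whole batch in
    one pass and return a mask of entries the classifier rejected."""
    xs = np.asarray(xs, dtype=float)
    mask = ~can_approximate(xs)   # True where the CPU must recompute exactly
    ys = approx_fn(xs)            # approximate results for every entry
    return ys, mask

def hybrid_batch(xs, approx_fn, exact_fn, can_approximate):
    # Single CPU->FPGA->CPU round trip for the whole batch.
    ys, mask = fpga_batch(xs, approx_fn, can_approximate)
    # Patch only the rejected entries with the accurate CPU kernel.
    ys[mask] = exact_fn(np.asarray(xs, dtype=float)[mask])
    return ys

# Illustrative instance: approximate sin(x) by x, accepted only near zero.
ys = hybrid_batch([0.1, 1.0, -0.2, 2.0],
                  approx_fn=lambda x: x,
                  exact_fn=np.sin,
                  can_approximate=lambda x: np.abs(x) < 0.3)
```

The mask lets the two sides run asynchronously: the accelerator never waits on a per-input decision from the CPU, and the CPU learns which entries to redo from a single bit-vector per batch.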