{"title":"基于sat的编译到非诺伊曼处理器","authors":"S. Chaudhuri, A. Hetzel","doi":"10.1109/ICCAD.2017.8203842","DOIUrl":null,"url":null,"abstract":"This paper describes a compilation technique used to accelerate dataflow computations, common in deep neural network computing, onto Coarse Grained Reconfigurable Array (CGRA) architectures. This technique has been demonstrated to automatically compile dataflow programs onto a commercial massively parallel CGRA-based dataflow processor (DPU) containing 16000 processing elements. The DPU architecture overcomes the von Neumann bottleneck by spatially flowing and reusing data from local memories, and provides higher computation efficiency compared to temporal parallel architectures such as GPUs and multi-core CPUs. However, existing software development tools for CGRAs are limited to compiling domain specific programs to processing elements with uniform structures, and are not effective on complex micro architectures where latencies of memory access vary in a nontrivial fashion depending on data locality. A primary contribution of this paper is to provide a general algorithm that can compile general dataflow graphs, and can efficiently utilize processing elements with rich micro-architectural features such as complex instructions, multi-precision data paths, local memories, register files, switches etc. Another contribution is a uniquely innovative application of Boolean Satisfiability to formally solve this complex, and irregular optimization problem and produce high-quality results comparable to hand-written assembly code produced by human experts. A third contribution is an adaptive windowing algorithm that harnesses the complexity of the SAT-based approach and delivers a scalable and robust solution.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"322 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"SAT-based compilation to a non-vonNeumann processor\",\"authors\":\"S. Chaudhuri, A. Hetzel\",\"doi\":\"10.1109/ICCAD.2017.8203842\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a compilation technique used to accelerate dataflow computations, common in deep neural network computing, onto Coarse Grained Reconfigurable Array (CGRA) architectures. This technique has been demonstrated to automatically compile dataflow programs onto a commercial massively parallel CGRA-based dataflow processor (DPU) containing 16000 processing elements. The DPU architecture overcomes the von Neumann bottleneck by spatially flowing and reusing data from local memories, and provides higher computation efficiency compared to temporal parallel architectures such as GPUs and multi-core CPUs. However, existing software development tools for CGRAs are limited to compiling domain specific programs to processing elements with uniform structures, and are not effective on complex micro architectures where latencies of memory access vary in a nontrivial fashion depending on data locality. A primary contribution of this paper is to provide a general algorithm that can compile general dataflow graphs, and can efficiently utilize processing elements with rich micro-architectural features such as complex instructions, multi-precision data paths, local memories, register files, switches etc. Another contribution is a uniquely innovative application of Boolean Satisfiability to formally solve this complex, and irregular optimization problem and produce high-quality results comparable to hand-written assembly code produced by human experts. A third contribution is an adaptive windowing algorithm that harnesses the complexity of the SAT-based approach and delivers a scalable and robust solution.\",\"PeriodicalId\":126686,\"journal\":{\"name\":\"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)\",\"volume\":\"322 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCAD.2017.8203842\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCAD.2017.8203842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SAT-based compilation to a non-vonNeumann processor
This paper describes a compilation technique used to accelerate dataflow computations, common in deep neural network computing, onto Coarse Grained Reconfigurable Array (CGRA) architectures. This technique has been demonstrated to automatically compile dataflow programs onto a commercial massively parallel CGRA-based dataflow processor (DPU) containing 16000 processing elements. The DPU architecture overcomes the von Neumann bottleneck by spatially flowing and reusing data from local memories, and provides higher computation efficiency compared to temporal parallel architectures such as GPUs and multi-core CPUs. However, existing software development tools for CGRAs are limited to compiling domain specific programs to processing elements with uniform structures, and are not effective on complex micro architectures where latencies of memory access vary in a nontrivial fashion depending on data locality. A primary contribution of this paper is to provide a general algorithm that can compile general dataflow graphs, and can efficiently utilize processing elements with rich micro-architectural features such as complex instructions, multi-precision data paths, local memories, register files, switches etc. Another contribution is a uniquely innovative application of Boolean Satisfiability to formally solve this complex, and irregular optimization problem and produce high-quality results comparable to hand-written assembly code produced by human experts. A third contribution is an adaptive windowing algorithm that harnesses the complexity of the SAT-based approach and delivers a scalable and robust solution.