Rick Bahr, Clark W. Barrett, Nikhil Bhagdikar, Alex Carsello, Ross G. Daly, Caleb Donovick, David Durst, K. Fatahalian, Kathleen Feng, P. Hanrahan, Teguh Hofstee, M. Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, J. Melchert, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Priyanka Raina, S. Richardson, Rajsekhar Setaluri, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James J. Thomas, Christopher Torng, Lenny Truong, Nestan Tsiskaridze, Keyi Zhang
{"title":"Creating an Agile Hardware Design Flow","authors":"Rick Bahr, Clark W. Barrett, Nikhil Bhagdikar, Alex Carsello, Ross G. Daly, Caleb Donovick, David Durst, K. Fatahalian, Kathleen Feng, P. Hanrahan, Teguh Hofstee, M. Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, J. Melchert, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Priyanka Raina, S. Richardson, Rajsekhar Setaluri, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James J. Thomas, Christopher Torng, Lenny Truong, Nestan Tsiskaridze, Keyi Zhang","doi":"10.1109/DAC18072.2020.9218553","DOIUrl":null,"url":null,"abstract":"Although an agile approach is standard for software design, how to properly adapt this method to hardware is still an open question. This work addresses this question while building a system on chip (SoC) with specialized accelerators. Rather than using a traditional waterfall design flow, which starts by studying the application to be accelerated, we begin by constructing a complete flow from an application expressed in a high-level domain-specific language (DSL), in our case Halide, to a generic coarse-grained reconfigurable array (CGRA). As our under-standing of the application grows, the CGRA design evolves, and we have developed a suite of tools that tune application code, the compiler, and the CGRA to increase the efficiency of the resulting implementation. To meet our continued need to update parts of the system while maintaining the end-to-end flow, we have created DSL-based hardware generators that not only provide the Verilog needed for the implementation of the CGRA, but also create the collateral that the compiler/mapper/place and route system needs to configure its operation. This work provides a systematic approach for desiging and evolving high-performance and energy-efficient hardware-software systems for any application domain.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 57th ACM/IEEE Design Automation Conference (DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAC18072.2020.9218553","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23
Abstract
Although an agile approach is standard for software design, how to properly adapt this method to hardware is still an open question. This work addresses this question while building a system on chip (SoC) with specialized accelerators. Rather than using a traditional waterfall design flow, which starts by studying the application to be accelerated, we begin by constructing a complete flow from an application expressed in a high-level domain-specific language (DSL), in our case Halide, to a generic coarse-grained reconfigurable array (CGRA). As our under-standing of the application grows, the CGRA design evolves, and we have developed a suite of tools that tune application code, the compiler, and the CGRA to increase the efficiency of the resulting implementation. To meet our continued need to update parts of the system while maintaining the end-to-end flow, we have created DSL-based hardware generators that not only provide the Verilog needed for the implementation of the CGRA, but also create the collateral that the compiler/mapper/place and route system needs to configure its operation. This work provides a systematic approach for desiging and evolving high-performance and energy-efficient hardware-software systems for any application domain.