利用以数据为中心的子图映射的窄加速器

International Symposium on Code Generation and Optimization (CGO'07) Pub Date : 2007-03-11 DOI:10.1109/CGO.2007.11

Amir Hormati, Nathan Clark, S. Mahlke

{"title":"利用以数据为中心的子图映射的窄加速器","authors":"Amir Hormati, Nathan Clark, S. Mahlke","doi":"10.1109/CGO.2007.11","DOIUrl":null,"url":null,"abstract":"The demand for high performance has driven acyclic computation accelerators into extensive use in modern embedded and desktop architectures. Accelerators that are ideal from a software perspective, are difficult or impossible to integrate in many modern architectures, though, due to area and timing requirements. This reality is coupled with the observation that many application domains under-utilize accelerator hardware, because of the narrow data they operate on and the nature of their computation. In this work, we take advantage of these facts to design accelerators capable of executing in modern architectures by narrowing datapath width and reducing interconnect. Novel compiler techniques are developed in order to generate high-quality code for the reduced-cost accelerators and prevent performance loss to the extent possible. First, data width profiling is used to statistically determine how wide program data will be at run time. This information is used by the subgraph mapping algorithm to optimally select subgraphs for execution on targeted narrow accelerators. Overall, our data-centric compilation techniques achieve on average 6.5%, and up to 12%, speed up over previous subgraph mapping algorithms for 8-bit accelerators. We also show that, with appropriate compiler support, the increase in the total number of execution cycles in reduced-interconnect accelerators is less than 1% of the fully-connected accelerator","PeriodicalId":244171,"journal":{"name":"International Symposium on Code Generation and Optimization (CGO'07)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping\",\"authors\":\"Amir Hormati, Nathan Clark, S. Mahlke\",\"doi\":\"10.1109/CGO.2007.11\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The demand for high performance has driven acyclic computation accelerators into extensive use in modern embedded and desktop architectures. Accelerators that are ideal from a software perspective, are difficult or impossible to integrate in many modern architectures, though, due to area and timing requirements. This reality is coupled with the observation that many application domains under-utilize accelerator hardware, because of the narrow data they operate on and the nature of their computation. In this work, we take advantage of these facts to design accelerators capable of executing in modern architectures by narrowing datapath width and reducing interconnect. Novel compiler techniques are developed in order to generate high-quality code for the reduced-cost accelerators and prevent performance loss to the extent possible. First, data width profiling is used to statistically determine how wide program data will be at run time. This information is used by the subgraph mapping algorithm to optimally select subgraphs for execution on targeted narrow accelerators. Overall, our data-centric compilation techniques achieve on average 6.5%, and up to 12%, speed up over previous subgraph mapping algorithms for 8-bit accelerators. We also show that, with appropriate compiler support, the increase in the total number of execution cycles in reduced-interconnect accelerators is less than 1% of the fully-connected accelerator\",\"PeriodicalId\":244171,\"journal\":{\"name\":\"International Symposium on Code Generation and Optimization (CGO'07)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-03-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium on Code Generation and Optimization (CGO'07)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CGO.2007.11\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Code Generation and Optimization (CGO'07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CGO.2007.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 14

摘要

对高性能的需求推动了非循环计算加速器在现代嵌入式和桌面架构中的广泛应用。从软件的角度来看，加速器是理想的，但是由于面积和时间的要求，很难或不可能集成到许多现代架构中。与此同时，我们还观察到许多应用程序领域没有充分利用加速器硬件，因为它们操作的数据很窄，计算的性质也很复杂。在这项工作中，我们利用这些事实来设计能够在现代架构中执行的加速器，通过缩小数据路径宽度和减少互连。为了为低成本的加速器生成高质量的代码，并尽可能地防止性能损失，开发了新的编译器技术。首先，数据宽度分析用于统计地确定程序数据在运行时的宽度。子图映射算法使用该信息来最佳地选择子图，以便在目标窄加速器上执行。总的来说，我们的以数据为中心的编译技术比以前的8位加速器子图映射算法平均提高6.5%，最高提高12%。我们还表明，在适当的编译器支持下，减少互连加速器中执行周期总数的增加不到完全连接加速器的1%

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping

The demand for high performance has driven acyclic computation accelerators into extensive use in modern embedded and desktop architectures. Accelerators that are ideal from a software perspective, are difficult or impossible to integrate in many modern architectures, though, due to area and timing requirements. This reality is coupled with the observation that many application domains under-utilize accelerator hardware, because of the narrow data they operate on and the nature of their computation. In this work, we take advantage of these facts to design accelerators capable of executing in modern architectures by narrowing datapath width and reducing interconnect. Novel compiler techniques are developed in order to generate high-quality code for the reduced-cost accelerators and prevent performance loss to the extent possible. First, data width profiling is used to statistically determine how wide program data will be at run time. This information is used by the subgraph mapping algorithm to optimally select subgraphs for execution on targeted narrow accelerators. Overall, our data-centric compilation techniques achieve on average 6.5%, and up to 12%, speed up over previous subgraph mapping algorithms for 8-bit accelerators. We also show that, with appropriate compiler support, the increase in the total number of execution cycles in reduced-interconnect accelerators is less than 1% of the fully-connected accelerator

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Symposium on Code Generation and Optimization (CGO'07)

自引率

0.00%

发文量