Optimization of interconnects between accelerators and shared memories in dark silicon

J. Cong, Bingjun Xiao
2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), published 2013-11-18
DOI: 10.1109/ICCAD.2013.6691182 (https://doi.org/10.1109/ICCAD.2013.6691182)
Citations: 18

Abstract

Application-specific accelerators provide orders-of-magnitude improvements in energy efficiency over CPUs, and accelerator-rich computing platforms are showing promise in the dark silicon age. Sharing memory among accelerators saves a large number of transistors, but it requires new designs for the interconnects between accelerators and shared memories. Accelerators run 100x faster than CPUs and place a high demand on data. If we follow the same design rules used for interconnects between CPUs and shared memories, and simply duplicate the interconnect hardware to meet the accelerators' data demand, the resulting interconnects consume excessive resources. In this work we develop a new interconnect design between accelerators and shared memories that exploits three optimization opportunities arising in accelerator-rich computing platforms:
1) The multiple data ports of the same accelerator are powered on and off together, so competition for shared resources among these ports can be eliminated, reducing the interconnect transistor cost.
2) In dark silicon, the number of accelerators that can be active at once in an accelerator-rich platform is usually limited, so the interconnect can be partially populated to fit just the data-access demand permitted by the power budget.
3) Because accelerators are heterogeneous, they exhibit characteristic execution patterns; by identifying these patterns through probability analysis, the interconnect can be optimized for its expected utilization.
Experiments show that our interconnect design outperforms prior work that was optimized for CPU cores or for signal routing.
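Opportunities 1) and 2) can be illustrated with a small routing check. The Python sketch below is a toy model assumed purely for illustration, not the interconnect proposed in the paper: each accelerator's data ports are treated as one bundle because they power on and off together, the power budget is assumed to allow at most K active accelerators, and the interconnect therefore provides only K memory port groups. The names `full_crossbar`, `partial_crossbar`, `can_route` and `supports_all`, the port-group abstraction and the particular sparse wiring are all hypothetical. Feasibility is checked with standard augmenting-path bipartite matching over every possible set of K active accelerators.

```python
from itertools import combinations

# Toy model (an assumption for illustration, not the paper's exact design):
# - n accelerators; each accelerator's data ports power on/off together,
#   so its port bundle is routed as a single unit (opportunity 1).
# - The power budget allows at most k accelerators to be active at once,
#   so only k memory port groups are provided (opportunity 2).
# links[i] is the set of memory port groups accelerator i is wired to.


def full_crossbar(n, k):
    """Every accelerator is wired to every one of the k port groups."""
    return [set(range(k)) for _ in range(n)]


def partial_crossbar(n, k):
    """Hypothetical sparse wiring: accelerators 0..k-1 each get one
    dedicated group; the remaining n - k are wired to all k groups.
    Any active set works: dedicated accelerators take their own group,
    fully wired ones take whatever groups are left."""
    links = [{i} for i in range(k)]
    links += [set(range(k)) for _ in range(n - k)]
    return links


def can_route(links, active, k):
    """Augmenting-path bipartite matching: can every active accelerator
    be given its own memory port group?"""
    owner = [None] * k  # owner[g] = accelerator currently holding group g

    def try_assign(acc, seen):
        for g in links[acc]:
            if g in seen:
                continue
            seen.add(g)
            # Take a free group, or evict the holder if it can move elsewhere.
            if owner[g] is None or try_assign(owner[g], seen):
                owner[g] = acc
                return True
        return False

    return all(try_assign(acc, set()) for acc in active)


def supports_all(links, k):
    """Check every set of exactly k active accelerators; any smaller
    active set is a subset of one of these, so it is routable too."""
    return all(can_route(links, active, k)
               for active in combinations(range(len(links)), k))


def link_count(links):
    """Number of accelerator-to-group connections (each carries a port bundle)."""
    return sum(len(groups) for groups in links)


if __name__ == "__main__":
    n, k = 8, 3  # hypothetical: 8 accelerators, at most 3 powered on at once
    for name, build in [("full", full_crossbar), ("partial", partial_crossbar)]:
        links = build(n, k)
        print(name, link_count(links), supports_all(links, k))
```

For 8 accelerators with at most 3 active, the script prints `full 24 True` and `partial 18 True`: the sparse wiring uses (N - K + 1) * K links instead of N * K and still routes every allowed set of active accelerators. Exhaustive enumeration is only practical for small N; a real design flow would need a more scalable feasibility argument.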