在保持可编程性的同时推动加速器效率的极限

Tony Nowatzki, Vinay Gangadhar, K. Sankaralingam, G. Wright
{"title":"在保持可编程性的同时推动加速器效率的极限","authors":"Tony Nowatzki, Vinay Gangadhar, K. Sankaralingam, G. Wright","doi":"10.1109/HPCA.2016.7446051","DOIUrl":null,"url":null,"abstract":"The waning benefits of device scaling have caused a push towards domain specific accelerators (DSAs), which sacrifice programmability for efficiency. While providing huge benefits, DSAs are prone to obsoletion due to domain volatility, have recurring design and verification costs, and have large area footprints when multiple DSAs are required in a single device. Because of the benefits of generality, this work explores how far a programmable architecture can be pushed, and whether it can come close to the performance, energy, and area efficiency of a DSA-based approach. Our insight is that DSAs employ common specialization principles for concurrency, computation, communication, data-reuse and coordination, and that these same principles can be exploited in a programmable architecture using a composition of known microarchitectural mechanisms. Specifically, we propose and study an architecture called LSSD, which is composed of many low-power and tiny cores, each having a configurable spatial architecture, scratchpads, and DMA. Our results show that a programmable, specialized architecture can indeed be competitive with a domain-specific approach. Compared to four prominent and diverse DSAs, LSSD can match the DSAs' 10× to 150× speedup over an OOO core, with only up to 4× more area and power than a single DSA, while retaining programmability.","PeriodicalId":417994,"journal":{"name":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Pushing the limits of accelerator efficiency while retaining programmability\",\"authors\":\"Tony Nowatzki, Vinay Gangadhar, K. Sankaralingam, G. Wright\",\"doi\":\"10.1109/HPCA.2016.7446051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The waning benefits of device scaling have caused a push towards domain specific accelerators (DSAs), which sacrifice programmability for efficiency. While providing huge benefits, DSAs are prone to obsoletion due to domain volatility, have recurring design and verification costs, and have large area footprints when multiple DSAs are required in a single device. Because of the benefits of generality, this work explores how far a programmable architecture can be pushed, and whether it can come close to the performance, energy, and area efficiency of a DSA-based approach. Our insight is that DSAs employ common specialization principles for concurrency, computation, communication, data-reuse and coordination, and that these same principles can be exploited in a programmable architecture using a composition of known microarchitectural mechanisms. Specifically, we propose and study an architecture called LSSD, which is composed of many low-power and tiny cores, each having a configurable spatial architecture, scratchpads, and DMA. Our results show that a programmable, specialized architecture can indeed be competitive with a domain-specific approach. Compared to four prominent and diverse DSAs, LSSD can match the DSAs' 10× to 150× speedup over an OOO core, with only up to 4× more area and power than a single DSA, while retaining programmability.\",\"PeriodicalId\":417994,\"journal\":{\"name\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2016.7446051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2016.7446051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

摘要

设备可扩展性的优势逐渐减弱,导致了对特定领域加速器(dsa)的推动,这种加速器牺牲了可编程性以提高效率。虽然dsa提供了巨大的好处,但由于域的波动性,dsa容易过时,具有重复的设计和验证成本,并且在单个设备中需要多个dsa时占地面积很大。由于通用性的好处,这项工作探讨了可编程架构可以推进到什么程度,以及它是否可以接近基于dsa的方法的性能、能量和面积效率。我们的见解是,dsa在并发性、计算、通信、数据重用和协调方面采用通用的专门化原则,并且这些相同的原则可以在使用已知微体系结构机制组合的可编程体系结构中得到利用。具体来说,我们提出并研究了一种称为LSSD的架构,它由许多低功耗和微小的内核组成,每个内核都具有可配置的空间架构、刮擦板和DMA。我们的结果表明,可编程的、专门的体系结构确实可以与特定于领域的方法竞争。与四个突出的多样化DSA相比,LSSD可以在OOO内核上匹配DSA的10倍到150倍的加速,而面积和功率仅比单个DSA多4倍,同时保留可编程性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Pushing the limits of accelerator efficiency while retaining programmability
The waning benefits of device scaling have caused a push towards domain specific accelerators (DSAs), which sacrifice programmability for efficiency. While providing huge benefits, DSAs are prone to obsoletion due to domain volatility, have recurring design and verification costs, and have large area footprints when multiple DSAs are required in a single device. Because of the benefits of generality, this work explores how far a programmable architecture can be pushed, and whether it can come close to the performance, energy, and area efficiency of a DSA-based approach. Our insight is that DSAs employ common specialization principles for concurrency, computation, communication, data-reuse and coordination, and that these same principles can be exploited in a programmable architecture using a composition of known microarchitectural mechanisms. Specifically, we propose and study an architecture called LSSD, which is composed of many low-power and tiny cores, each having a configurable spatial architecture, scratchpads, and DMA. Our results show that a programmable, specialized architecture can indeed be competitive with a domain-specific approach. Compared to four prominent and diverse DSAs, LSSD can match the DSAs' 10× to 150× speedup over an OOO core, with only up to 4× more area and power than a single DSA, while retaining programmability.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信