Synchroscalar: a multiple clock domain, power-aware, tile-based embedded processor

J. Oliver, Ravishankar Rao, P. Sultana, Jedidiah R. Crandall, E. Czernikowski, Leslie W. Jones IV, D. Franklin, V. Akella, F. Chong
Published in: Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
Publication date: 2004-06-19
DOI: https://doi.org/10.1109/isca.2004.1310771
Citations: 63

Abstract

We present Synchroscalar, a tile-based architecture for embedded processing that is designed to provide the flexibility of DSPs while approaching the power efficiency of ASICs. We achieve this goal by providing high parallelism and voltage scaling while minimizing control and communication costs. Specifically, Synchroscalar uses columns of processor tiles organized into statically-assigned frequency-voltage domains to minimize power consumption. Furthermore, while columns use SIMD control to minimize overhead, data-dependent computations can be supported by extremely flexible statically-scheduled communication between columns. We provide a detailed evaluation of Synchroscalar including SPICE simulation, wire and device models, synthesis of key components, cycle-level simulation, and compiler- and hand-optimized signal processing applications. We find that the goal of meeting, not exceeding, performance targets with data-parallel applications leads to designs that depart significantly from our intuitions derived from general-purpose microprocessor design. In particular, synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect supports parallelization and reduces processor idle time, both of which are critical to energy-efficient implementations of high-bandwidth signal processing. Overall, Synchroscalar provides programmability while achieving power efficiencies within 8-30× of known ASIC implementations, which is 10-60× better than conventional DSPs. In addition, frequency-voltage scaling in Synchroscalar provides power savings of between 3% and 32% in our application suite.
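The link the abstract draws between parallelism, voltage scaling, and power can be illustrated with the textbook CMOS dynamic power relation P = C·V²·f. The sketch below is not the paper's evaluation model. It assumes a perfectly parallel workload, a linear voltage-frequency relation clamped at a minimum voltage, and no leakage or communication cost; every function name and number in it is hypothetical.

```python
def dynamic_power(c_eff, vdd, freq_hz):
    """Textbook CMOS dynamic power estimate: P = C * V^2 * f."""
    return c_eff * vdd ** 2 * freq_hz


def scaled_voltage(v_nominal, f_nominal, freq_hz, v_min):
    """Assumed (simplified) linear voltage-frequency relation, clamped at v_min."""
    return max(v_min, v_nominal * freq_hz / f_nominal)


def parallel_power(n_tiles, c_eff, v_nominal, f_nominal, v_min):
    """Total power when a fixed workload is split evenly over n_tiles,
    each tile clocked at f_nominal / n_tiles with its supply scaled to match."""
    freq = f_nominal / n_tiles
    vdd = scaled_voltage(v_nominal, f_nominal, freq, v_min)
    return n_tiles * dynamic_power(c_eff, vdd, freq)


if __name__ == "__main__":
    # Hypothetical parameters: 1 nF effective capacitance, 1.0 V at 400 MHz, 0.5 V floor.
    for n in (1, 2, 4, 8):
        watts = parallel_power(n, 1e-9, 1.0, 400e6, 0.5)
        print(f"{n} tile(s): {watts:.3f} W")
```

Under these assumptions total throughput stays constant while each tile runs slower at a lower supply voltage, so power falls roughly quadratically until the voltage floor is reached. Past that point adding tiles no longer helps, which is one reason a design might aim to meet a performance target rather than exceed it.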