Imposing coarse-grained reconfiguration to general purpose processors

M. Duric, Milan Stanic, Ivan Ratković, Oscar Palomar, O. Unsal, A. Cristal, M. Valero, Aaron Smith
{"title":"Imposing coarse-grained reconfiguration to general purpose processors","authors":"M. Duric, Milan Stanic, Ivan Ratković, Oscar Palomar, O. Unsal, A. Cristal, M. Valero, Aaron Smith","doi":"10.1109/SAMOS.2015.7363658","DOIUrl":null,"url":null,"abstract":"Mobile devices execute applications with diverse compute and performance demands. This paper proposes a general purpose processor that adapts the underlying hardware to a given workload. Existing mobile processors need to utilize more complex heterogeneous substrates to deliver the demanded performance. They incorporate different cores and specialized accelerators. On the contrary, our processor utilizes only modest homogeneous cores and dynamically provides an execution substrate suitable to accelerate a particular workload. Instead of incorporating accelerators, the processor reconfigures one or more cores into accelerators on-the-fly. It improves performance with minimal hardware additions. The accelerators are made of general purpose ALUs reconfigured into a compute fabric and the general purpose pipeline that streams data through the fabric. To enable reconfiguration of ALUs into the fabric, the floorplan of a 4-core processor is changed to place the ALUs in close proximity on the chip. A configurable switched network is added to couple and dynamically reconfigure the ALUs to perform computation of frequently repeated regions, instead of executing general purpose instructions. Through this reconfiguration, the mobile processor specializes its substrate for a given workload and maximizes performance of the existing resources. Our results show that reconfiguration accelerates a set of selected compute intensive workloads by 1.56×, 2,39×, 3,51×, when configuring the accelerator of 1-, 2-, or 4- cores respectively.","PeriodicalId":346802,"journal":{"name":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SAMOS.2015.7363658","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Mobile devices execute applications with diverse compute and performance demands. This paper proposes a general purpose processor that adapts the underlying hardware to a given workload. Existing mobile processors need to utilize more complex heterogeneous substrates to deliver the demanded performance. They incorporate different cores and specialized accelerators. On the contrary, our processor utilizes only modest homogeneous cores and dynamically provides an execution substrate suitable to accelerate a particular workload. Instead of incorporating accelerators, the processor reconfigures one or more cores into accelerators on-the-fly. It improves performance with minimal hardware additions. The accelerators are made of general purpose ALUs reconfigured into a compute fabric and the general purpose pipeline that streams data through the fabric. To enable reconfiguration of ALUs into the fabric, the floorplan of a 4-core processor is changed to place the ALUs in close proximity on the chip. A configurable switched network is added to couple and dynamically reconfigure the ALUs to perform computation of frequently repeated regions, instead of executing general purpose instructions. Through this reconfiguration, the mobile processor specializes its substrate for a given workload and maximizes performance of the existing resources. Our results show that reconfiguration accelerates a set of selected compute intensive workloads by 1.56×, 2,39×, 3,51×, when configuring the accelerator of 1-, 2-, or 4- cores respectively.
对通用处理器实施粗粒度的重新配置
移动设备执行具有不同计算和性能需求的应用程序。本文提出了一种通用处理器,它可以使底层硬件适应给定的工作负载。现有的移动处理器需要利用更复杂的异构基板来提供所需的性能。它们包含不同的内核和专门的加速器。相反,我们的处理器只使用适度的同构内核,并动态地提供适合加速特定工作负载的执行基板。处理器没有集成加速器,而是动态地将一个或多个核心重新配置为加速器。它以最少的硬件添加提高了性能。加速器由重新配置到计算结构中的通用alu和通过该结构传输数据的通用管道组成。为了将alu重新配置到fabric中,改变了4核处理器的平面布局,将alu放置在芯片上的距离很近的位置。增加了一个可配置的交换网络来耦合和动态地重新配置alu,以执行频繁重复区域的计算,而不是执行通用指令。通过这种重新配置,移动处理器将其基板专门用于给定的工作负载,并最大化现有资源的性能。我们的研究结果表明,当配置1核、2核或4核加速器时,重新配置可分别将一组选定的计算密集型工作负载加速1.56倍、2倍、39倍、3倍和51倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信