Power and performance aware memory-controller voting mechanism

M. Vratonjic, H. Singh, G. Kumar, R. Mohamed, Ashish Bajaj, Ken Gainey
2018 19th International Symposium on Quality Electronic Design (ISQED), published 2018-03-13
DOI: 10.1109/ISQED.2018.8357276 (https://doi.org/10.1109/ISQED.2018.8357276)
Citations: 1

Abstract

Modern Systems-on-Chip (SoCs) integrate a graphics unit (GPU) with many application processor cores (CPUs), communication cores (modem, WiFi), and device interfaces (USB, HDMI) on a single die. The primary memory system is fast becoming a major performance bottleneck as more of these units share this critical resource. An Integrated Memory Controller (IMC) buffers and services memory requests from the CPU cores, the GPU, and other processing blocks that require DDR memory access. Previous work [2] focused on prioritizing memory requests appropriately and increasing the IMC/DDR memory frequency to improve system performance, which came at the expense of higher power consumption. Recent work has addressed this problem with a demand-based approach: the IMC is made aware of the application characteristics, and its frequency is then scaled according to memory access demand [1]. This leads to lower IMC and DDR frequencies and thus lower power. The work presented here shows that, instead of lowering the frequency, greater total system power savings can be achieved by increasing the IMC frequency at the beginning of a use-case that has moderate GPU utilization. The primary motivation is that this allows the GPU, with its inherent ability to execute a larger number of parallel threads, to access memory faster and therefore complete its portion of the execution pipeline faster. This, in turn, relaxes the timing requirements imposed on the CPU portion of the pipeline and on consecutive cycles, thus saving total system power. This paper presents an algorithm for this technique, along with silicon results on an SoC implemented in an industrial 28nm process.
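
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how such a voting scheme might look, not the authors' implementation. It assumes that each client casts a vote for a minimum IMC frequency, that the controller runs at the highest vote, and that a separate GPU-aware vote requests the top frequency at the start of a use-case with moderate GPU utilization. The frequency levels, the bytes-per-cycle figure, the utilization thresholds, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical IMC frequency levels in MHz (not taken from the paper).
IMC_LEVELS_MHZ = (200, 400, 800, 1066)


@dataclass
class Vote:
    client: str    # e.g. "cpu", "gpu", "modem"
    freq_mhz: int  # minimum IMC frequency this client requests


def demand_vote(client: str, bandwidth_mb_s: float,
                bytes_per_cycle: int = 8) -> Vote:
    """Demand-based vote: the lowest level whose peak bandwidth
    (MHz * bytes per cycle = MB/s) covers the requested bandwidth."""
    for level in IMC_LEVELS_MHZ:
        if level * bytes_per_cycle >= bandwidth_mb_s:
            return Vote(client, level)
    return Vote(client, IMC_LEVELS_MHZ[-1])


def gpu_boost_vote(gpu_util: float, at_use_case_start: bool,
                   lo: float = 0.3, hi: float = 0.7) -> Optional[Vote]:
    """Assumed policy: request the top level when GPU utilization is
    moderate (between lo and hi) at the start of a use-case."""
    if at_use_case_start and lo <= gpu_util <= hi:
        return Vote("gpu-boost", IMC_LEVELS_MHZ[-1])
    return None


def resolve(votes: Iterable[Vote]) -> int:
    """Aggregate the votes: the IMC runs at the highest requested
    frequency, or at the lowest level if nobody voted."""
    return max((v.freq_mhz for v in votes), default=IMC_LEVELS_MHZ[0])


# Usage: demand-only voting versus voting with the GPU-aware boost.
votes = [demand_vote("cpu", 1500), demand_vote("gpu", 2500)]
demand_only_mhz = resolve(votes)

boost = gpu_boost_vote(gpu_util=0.5, at_use_case_start=True)
if boost is not None:
    votes.append(boost)
boosted_mhz = resolve(votes)
```

With these assumed numbers, the demand-only votes resolve to 400 MHz, while the boost vote raises the result to 1066 MHz. Whether the higher IMC frequency lowers total system power depends on how much sooner the GPU finishes and how far the CPU requirements can then be relaxed, which is what the paper's silicon results address.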