Power and performance aware memory-controller voting mechanism

2018 19th International Symposium on Quality Electronic Design (ISQED) Pub Date : 2018-03-13 DOI:10.1109/ISQED.2018.8357276

M. Vratonjic, H. Singh, G. Kumar, R. Mohamed, Ashish Bajaj, Ken Gainey


Citations: 1

Abstract

Modern Systems-on-Chip (SoCs) integrate a graphics unit (GPU) with many application processor cores (CPUs), communication cores (modem, WiFi), and device interfaces (USB, HDMI) on a single die. The primary memory system is fast becoming a major performance bottleneck as more of these units share this critical resource. An Integrated Memory Controller (IMC) buffers and services memory requests from the CPU cores, the GPU, and other processing blocks that require DDR memory access. Previous work [2] focused on prioritizing memory requests appropriately and on increasing the IMC/DDR memory frequency to improve system performance, which came at the expense of higher power consumption. Recent work has addressed this problem with a demand-based approach: the IMC is made aware of the application characteristics and then scales its frequency according to the memory access demand [1]. This leads to lower IMC and DDR frequencies and thus lower power. The work presented here shows that, instead of lowering the frequency, greater total system power savings can be achieved by increasing the IMC frequency at the beginning of a use case that has moderate GPU utilization. The primary motivation is that the higher frequency lets the GPU, with its inherent ability to execute a larger number of parallel threads, access memory faster and therefore complete its portion of the execution pipeline sooner. This, in turn, relaxes the timing requirements imposed on the CPU portion of the pipeline and on consecutive cycles, thus saving total system power. This paper presents an algorithm for this technique, along with silicon results from an SoC implemented in an industrial 28nm process.
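
The abstract does not spell out the voting algorithm, so the following is only a minimal sketch of the idea it describes: a demand-based choice of IMC frequency, raised by one step at the start of a use case with moderate GPU utilization. Everything here is an assumption made for illustration: the frequency levels, the `mbps_per_mhz` conversion factor, the 30% to 70% "moderate" window, and the names `Vote`, `demand_based_level`, and `gpu_aware_level`. None of these come from the paper.

```python
from dataclasses import dataclass

# Hypothetical IMC operating points in MHz (not taken from the paper).
IMC_LEVELS_MHZ = (200, 400, 800, 1600)


@dataclass
class Vote:
    client: str            # requesting block, e.g. "cpu", "gpu", "modem"
    bandwidth_mbps: float  # memory bandwidth the client asks for


def demand_based_level(votes, mbps_per_mhz=4.0):
    """Pick the lowest level whose assumed bandwidth covers the summed votes."""
    total = sum(v.bandwidth_mbps for v in votes)
    for level in IMC_LEVELS_MHZ:
        if level * mbps_per_mhz >= total:
            return level
    return IMC_LEVELS_MHZ[-1]  # demand exceeds every level: saturate at the top


def gpu_aware_level(votes, gpu_util, use_case_start,
                    moderate=(0.3, 0.7), mbps_per_mhz=4.0):
    """Demand-based level, bumped one step at use-case start under moderate GPU load."""
    base = demand_based_level(votes, mbps_per_mhz)
    low, high = moderate
    if use_case_start and low <= gpu_util <= high:
        idx = IMC_LEVELS_MHZ.index(base)
        return IMC_LEVELS_MHZ[min(idx + 1, len(IMC_LEVELS_MHZ) - 1)]
    return base


if __name__ == "__main__":
    votes = [Vote("cpu", 800.0), Vote("gpu", 600.0)]
    print(demand_based_level(votes))
    print(gpu_aware_level(votes, gpu_util=0.5, use_case_start=True))
```

In this sketch the boost applies only at the start of a use case; afterwards the controller falls back to the demand-based level. Whether the real mechanism steps up by one level, or holds the boost for some duration, is not stated in the abstract.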

