Efficient Instruction Scheduling Using Real-time Load Delay Tracking

IF: 2 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Computer Systems | Pub Date: 2021-09-07 | DOI: 10.1145/3548681

Andreas Diavastos, Trevor E. Carlson

Volume 40, Issue 1, pages 1-21

Citations: 2

Abstract



Issue time prediction processors use dataflow dependencies and predefined instruction latencies to predict the issue times of repeated instructions. In this work, we make two key observations: (1) memory accesses often take longer to arrive than the static, predefined access latency used to describe these systems, due to contention in the memory hierarchy and variability in DRAM access times; and (2) these memory access delays often repeat across iterations of the same code. We propose a new processor microarchitecture that replaces a complex reservation-station-based scheduler with an efficient, scalable alternative. Our scheduling technique tracks the real-time delays of loads to accurately predict instruction issue times and uses a reordering mechanism to prioritize instructions based on that prediction. To accomplish this energy-efficiently, we introduce (1) an instruction delay learning mechanism that monitors repeated load instructions and learns their latest delay, (2) an issue time predictor that uses the learned delays and dataflow dependencies to predict instruction issue times, and (3) priority queues that reorder instructions based on their predicted issue times. Our processor achieves 86.2% of the performance of a traditional out-of-order processor, higher than previous efficient scheduler proposals, while consuming 30% less power.
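The three mechanisms named in the abstract can be illustrated with a small software model. This is a minimal sketch under our own assumptions, not the authors' hardware design: the latencies in `STATIC_LATENCY`, a per-PC `DelayTable` that keeps only the most recent extra load delay, and the helper names `predict_issue_times` and `schedule_order` are all hypothetical, and a Python `heapq` min-heap stands in for the hardware priority queues.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field

# Hypothetical predefined latencies in cycles (illustrative, not from the paper).
STATIC_LATENCY = {"alu": 1, "load": 4}


@dataclass
class Instr:
    pc: int                      # program counter; indexes the delay table
    op: str                      # "alu" or "load"
    dst: str | None              # destination register, if any
    srcs: list[str] = field(default_factory=list)  # source registers


class DelayTable:
    """(1) Delay learning: remember the latest extra delay, beyond the
    static latency, observed for each load PC (last-value assumption)."""

    def __init__(self) -> None:
        self.last_delay: dict[int, int] = {}

    def learn(self, pc: int, observed_latency: int) -> None:
        self.last_delay[pc] = max(0, observed_latency - STATIC_LATENCY["load"])

    def lookup(self, pc: int) -> int:
        return self.last_delay.get(pc, 0)  # unseen loads: no extra delay


def predict_issue_times(instrs, delays, dispatch_cycle=0):
    """(2) Issue time prediction: an instruction is predicted to issue once
    all of its source registers are ready; loads use static + learned delay."""
    ready_at: dict[str, int] = {}  # register -> predicted cycle its value is ready
    predictions = []
    for ins in instrs:
        issue = max([dispatch_cycle] + [ready_at.get(r, 0) for r in ins.srcs])
        latency = STATIC_LATENCY[ins.op]
        if ins.op == "load":
            latency += delays.lookup(ins.pc)
        if ins.dst is not None:
            ready_at[ins.dst] = issue + latency
        predictions.append((issue, ins))
    return predictions


def schedule_order(predictions):
    """(3) Priority queue: pop instructions in order of predicted issue time,
    breaking ties by program order (older first)."""
    heap = [(t, seq, ins.pc) for seq, (t, ins) in enumerate(predictions)]
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    delays = DelayTable()
    delays.learn(pc=0x10, observed_latency=30)  # a prior iteration's load took 30 cycles
    loop_body = [
        Instr(0x10, "load", "r1", ["r0"]),
        Instr(0x14, "alu", "r2", ["r1"]),  # depends on the load
        Instr(0x18, "alu", "r3", ["r0"]),  # independent of the load
    ]
    for t, seq, pc in schedule_order(predict_issue_times(loop_body, delays)):
        print(f"cycle {t}: pc={pc:#x}")
```

Running the example prints `cycle 0: pc=0x10`, `cycle 0: pc=0x18`, and `cycle 30: pc=0x14`. The learned 26-cycle extra delay of the load at `0x10` pushes its dependent instruction at `0x14` to a predicted cycle 30, so the independent instruction at `0x18` is ordered ahead of it; with only the static 4-cycle latency, the dependent would have been predicted for cycle 4.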


Source Journal

ACM Transactions on Computer Systems (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 4.00

Self-citation rate: 0.00%

Review time: 1 month

Journal description: ACM Transactions on Computer Systems (TOCS) presents research and development results on the design, implementation, analysis, evaluation, and use of computer systems and systems software. The term "computer systems" is interpreted broadly and includes operating systems, systems architecture and hardware, distributed systems, optimizing compilers, and the interaction between systems and computer networks. Articles appearing in TOCS will tend either to present new techniques and concepts, or to report on experiences and experiments with actual systems. Insights useful to system designers, builders, and users will be emphasized. TOCS publishes research and technical papers, both short and long. It includes technical correspondence to permit commentary on technical topics and on previously published papers.