通过缓存计划组来利用处理器中的指令级并行性

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI:10.1145/264107.264125

R. Nair, Martin E. Hopkins

{"title":"通过缓存计划组来利用处理器中的指令级并行性","authors":"R. Nair, Martin E. Hopkins","doi":"10.1145/264107.264125","DOIUrl":null,"url":null,"abstract":"Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of longer clock period or increased pipeline latency of the machine.In this paper we propose a processor implementation which dynamically schedules groups of instructions while executing them on a fast simple engine and caches them for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in significant performance gain over fill buffer approaches for a standard VLIW cache.This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. This paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.","PeriodicalId":405506,"journal":{"name":"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"106","resultStr":"{\"title\":\"Exploiting Instruction Level Parallelism In Processors By Caching Scheduled Groups\",\"authors\":\"R. Nair, Martin E. Hopkins\",\"doi\":\"10.1145/264107.264125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of longer clock period or increased pipeline latency of the machine.In this paper we propose a processor implementation which dynamically schedules groups of instructions while executing them on a fast simple engine and caches them for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in significant performance gain over fill buffer approaches for a standard VLIW cache.This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. This paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.\",\"PeriodicalId\":405506,\"journal\":{\"name\":\"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"106\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/264107.264125\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference Proceedings. The 24th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/264107.264125","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 106

摘要

现代处理器使用大量硬件来动态检测单线程程序中的并行性，并维护这些程序所隐含的顺序语义。由于较长的时钟周期或机器的管道延迟增加，某些硬件的复杂性降低了并行性带来的增益。在本文中，我们提出了一种处理器实现，该处理器在快速简单引擎上执行指令时动态调度指令组，并将其缓存以便在快速vliw型引擎上重复执行。我们的实验表明，与标准VLIW缓存的填充缓冲方法相比，跨几个基本块调度组并缓存这些调度组可以显著提高性能。这个概念，我们称之为DIF(动态指令格式化)，统一并扩展了目前提出的几种方案的基本原则，以降低超标量处理器的复杂性。本文探讨了设计这样一个处理器的各种问题，并介绍了使用SPECint95基准程序的跟踪驱动仿真的实验结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Exploiting Instruction Level Parallelism In Processors By Caching Scheduled Groups

Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of longer clock period or increased pipeline latency of the machine.In this paper we propose a processor implementation which dynamically schedules groups of instructions while executing them on a fast simple engine and caches them for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in significant performance gain over fill buffer approaches for a standard VLIW cache.This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. This paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture

自引率

0.00%

发文量