Continual Flow Pipelines

ASPLOS XI · Pub Date: 2004-10-07 · DOI: 10.1145/1024393.1024407
Srikanth T. Srinivasan, Ravi Rajwar, Haitham Akkary, A. Gandhi, M. Upton
{"title":"连续流管道","authors":"Srikanth T. Srinivasan, Ravi Rajwar, Haitham Akkary, A. Gandhi, M. Upton","doi":"10.1145/1024393.1024407","DOIUrl":null,"url":null,"abstract":"Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides high single-thread performance and enables multiple of these to be placed on the same die for high throughput while dynamically adapting for future applications? Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps key processor structures affecting cycle time and power (scheduler, register file), and die size (second level cache) small. The memory latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"197","resultStr":"{\"title\":\"Continual flow pipelines\",\"authors\":\"Srikanth T. Srinivasan, Ravi Rajwar, Haitham Akkary, A. Gandhi, M. Upton\",\"doi\":\"10.1145/1024393.1024407\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides high single-thread performance and enables multiple of these to be placed on the same die for high throughput while dynamically adapting for future applications? Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps key processor structures affecting cycle time and power (scheduler, register file), and die size (second level cache) small. 
The memory latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.\",\"PeriodicalId\":344295,\"journal\":{\"name\":\"ASPLOS XI\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"197\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS XI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1024393.1024407\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS XI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1024393.1024407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 197

Abstract

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides high single-thread performance and enables multiple of these to be placed on the same die for high throughput while dynamically adapting for future applications? Conventional approaches for high single-thread performance rely on large and complex cores to sustain a large instruction window for memory tolerance, making them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP) as a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that to achieve benefits of a large instruction window, inefficiencies in management of both the scheduler and register file must be addressed, and we propose a unified solution. The non-blocking property of CFP keeps key processor structures affecting cycle time and power (scheduler, register file), and die size (second level cache) small. The memory latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores for single-thread applications.
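The non-blocking behavior the abstract describes can be illustrated with a toy model. The sketch below is plain Python written for this page, not code from the paper: when a load misses to memory, the miss and the instructions that depend on it are drained out of a small scheduler into a slice buffer, freeing scheduler entries so independent instructions keep flowing; when the miss data returns, the slice re-enters and executes. All names here (CFPToyCore, slice_buffer, poisoned) are illustrative assumptions, and the real CFP design additionally covers register reclamation, checkpointing, and the second-level cache, which this sketch omits.

```python
# Toy illustration (not the paper's implementation) of a Continual Flow
# Pipelines-style non-blocking scheduler: load-miss-dependent instructions are
# drained into a slice buffer instead of clogging the small scheduler, and are
# re-inserted for execution when the miss data returns.
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    name: str
    srcs: List[str]                 # register names read
    dst: Optional[str] = None       # register name written, if any
    is_load_miss: bool = False      # pretend this load misses all caches


class CFPToyCore:
    def __init__(self) -> None:
        self.scheduler = deque()     # small scheduler window
        self.slice_buffer = deque()  # miss-dependent "slice" waits here, off the critical path
        self.poisoned = set()        # registers whose values await a memory miss
        self.regfile = {}            # toy architectural register values

    def issue(self, insn: Insn) -> None:
        self.scheduler.append(insn)

    def step(self) -> None:
        """Process one scheduler entry; drain anything that depends on a miss."""
        if not self.scheduler:
            return
        insn = self.scheduler.popleft()
        if insn.is_load_miss or any(s in self.poisoned for s in insn.srcs):
            # Non-blocking: move the instruction aside, free the scheduler slot.
            self.slice_buffer.append(insn)
            if insn.dst:
                self.poisoned.add(insn.dst)
        else:
            self.execute(insn)

    def miss_returns(self) -> None:
        """Memory responded: re-insert the slice and execute it in order."""
        self.poisoned.clear()
        while self.slice_buffer:
            self.execute(self.slice_buffer.popleft())

    def execute(self, insn: Insn) -> None:
        if insn.dst:
            self.regfile[insn.dst] = f"value({insn.name})"
        print(f"executed {insn.name}")


if __name__ == "__main__":
    core = CFPToyCore()
    core.issue(Insn("load r1, [mem]", srcs=[], dst="r1", is_load_miss=True))
    core.issue(Insn("add r2, r1, r1", srcs=["r1"], dst="r2"))        # dependent: drains to slice
    core.issue(Insn("mul r3, r4, r5", srcs=["r4", "r5"], dst="r3"))  # independent: keeps flowing
    for _ in range(3):
        core.step()
    core.miss_returns()  # the slice re-enters and completes
```

In this toy run the independent multiply executes while the load miss is outstanding, which is the effect the abstract attributes to keeping the scheduler and register file small yet non-blocking.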