一个简单的延迟容忍处理器

2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI:10.1109/ICCD.2008.4751889

Satyanarayana Nekkalapu, Haitham Akkary, K. Jothi, Renjith Retnamma, Xiaoyu Song

{"title":"一个简单的延迟容忍处理器","authors":"Satyanarayana Nekkalapu, Haitham Akkary, K. Jothi, Renjith Retnamma, Xiaoyu Song","doi":"10.1109/ICCD.2008.4751889","DOIUrl":null,"url":null,"abstract":"The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers. With relatively constant die sizes, limited on chip cache, and scarce pin bandwidth, more cores on chip reduces the amount of available cache and bus bandwidth per core, therefore exacerbating the memory wall problem. How can a designer build a processor that provides a core with good single-thread performance in the presence of long latency cache misses, while enabling as many of these cores to be placed on the same die for high throughput. Conventional latency tolerant architectures that use out-of-order superscalar execution have become too complex and power hungry for the multi-core era. Instead, we present a simple, non-blocking architecture that achieves memory latency tolerance without requiring complex out-of-order execution hardware or large, cycle-critical and power hungry structures, such as dynamic schedulers, fully associative load and store queues, and reorder buffers. The non-blocking property of this architecture provides tolerance to hundreds of cycles of cache miss latency on a simple in-order issue core, thus allowing many more such cores to be integrated on the same die than is possible with conventional out-of-order superscalar architecture.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"A simple latency tolerant processor\",\"authors\":\"Satyanarayana Nekkalapu, Haitham Akkary, K. Jothi, Renjith Retnamma, Xiaoyu Song\",\"doi\":\"10.1109/ICCD.2008.4751889\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers. With relatively constant die sizes, limited on chip cache, and scarce pin bandwidth, more cores on chip reduces the amount of available cache and bus bandwidth per core, therefore exacerbating the memory wall problem. How can a designer build a processor that provides a core with good single-thread performance in the presence of long latency cache misses, while enabling as many of these cores to be placed on the same die for high throughput. Conventional latency tolerant architectures that use out-of-order superscalar execution have become too complex and power hungry for the multi-core era. Instead, we present a simple, non-blocking architecture that achieves memory latency tolerance without requiring complex out-of-order execution hardware or large, cycle-critical and power hungry structures, such as dynamic schedulers, fully associative load and store queues, and reorder buffers. The non-blocking property of this architecture provides tolerance to hundreds of cycles of cache miss latency on a simple in-order issue core, thus allowing many more such cores to be integrated on the same die than is possible with conventional out-of-order superscalar architecture.\",\"PeriodicalId\":345501,\"journal\":{\"name\":\"2008 IEEE International Conference on Computer Design\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2008.4751889\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2008.4751889","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

摘要

多核处理器的出现和利用这些处理器的新的并行应用程序的出现给设计人员带来了困难的挑战。由于芯片尺寸相对恒定，芯片上缓存有限，引脚带宽稀缺，芯片上更多的内核减少了每个内核可用的缓存和总线带宽，因此加剧了内存墙问题。设计人员如何构建一个处理器，在存在长延迟缓存丢失的情况下，提供一个具有良好单线程性能的核心，同时使尽可能多的这些核心放在同一个die上以实现高吞吐量?使用无序超标量执行的传统延迟容忍架构对于多核时代来说已经变得过于复杂和耗电。相反，我们提出了一个简单的、非阻塞的架构，它可以实现内存延迟容忍，而不需要复杂的乱序执行硬件或大型的、周期关键的和耗电的结构，如动态调度器、完全关联的负载和存储队列以及重新排序缓冲区。这种架构的非阻塞特性在一个简单的有序问题核心上提供了数百个缓存丢失延迟周期的容错性，从而允许在同一个die上集成比传统的无序超标架构更多的这样的核心。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A simple latency tolerant processor

The advent of multi-core processors and the emergence of new parallel applications that take advantage of such processors pose difficult challenges to designers. With relatively constant die sizes, limited on chip cache, and scarce pin bandwidth, more cores on chip reduces the amount of available cache and bus bandwidth per core, therefore exacerbating the memory wall problem. How can a designer build a processor that provides a core with good single-thread performance in the presence of long latency cache misses, while enabling as many of these cores to be placed on the same die for high throughput. Conventional latency tolerant architectures that use out-of-order superscalar execution have become too complex and power hungry for the multi-core era. Instead, we present a simple, non-blocking architecture that achieves memory latency tolerance without requiring complex out-of-order execution hardware or large, cycle-critical and power hungry structures, such as dynamic schedulers, fully associative load and store queues, and reorder buffers. The non-blocking property of this architecture provides tolerance to hundreds of cycles of cache miss latency on a simple in-order issue core, thus allowing many more such cores to be integrated on the same die than is possible with conventional out-of-order superscalar architecture.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 IEEE International Conference on Computer Design

自引率

0.00%

发文量