2011 International Conference on Parallel Architectures and Compilation Techniques: Latest Publications

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour
A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato
DOI: 10.1109/PACT.2011.47
Abstract: Last-Level Caches (LLCs) implement the LRU algorithm to exploit temporal locality, but its performance falls far short of Belady's optimal algorithm as the number of ways increases. One of the main reasons LRU performs poorly in LLCs is that the policy forces a block to descend to the bottom of the stack before eviction, yet most blocks that leave the MRU position are not referenced again before being evicted. This work selects candidate victim blocks before they reach the bottom of the stack. To this end, it defines the number of MRU-Tours (MRUTs) of a block as the number of times the block enters the MRU position during its lifetime. Based on the observation that most blocks exhibit a single MRUT, the paper presents a family of MRUT-based algorithms that exploit this behavior to improve performance.
Published: 2011-10-10
Citations: 2
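The MRUT idea described above lends itself to a compact behavioral sketch. The Python below is an illustration only, not the authors' implementation: the set size, the saturation limit on the counter, and the LRU tie-breaking rule are assumptions. It tracks how many times each cached block re-enters the MRU position and preferentially evicts blocks that have completed only a single MRU-Tour.

```python
# Minimal sketch of an MRUT-guided replacement policy for one cache set.
# Assumptions (not from the paper): victim ties are broken by LRU order,
# and the MRUT count saturates at a small value.

class MRUTSet:
    def __init__(self, num_ways=16, max_mrut=3):
        self.num_ways = num_ways
        self.max_mrut = max_mrut
        self.stack = []        # LRU stack: index 0 is the MRU block
        self.mrut = {}         # block tag -> number of MRU-Tours

    def access(self, tag):
        """Simulate one access; return the evicted tag or None."""
        if tag in self.mrut:                     # hit
            if self.stack[0] != tag:             # re-entering MRU => new tour
                self.mrut[tag] = min(self.mrut[tag] + 1, self.max_mrut)
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return None
        victim = None
        if len(self.stack) == self.num_ways:     # miss in a full set
            victim = self._pick_victim()
            self.stack.remove(victim)
            del self.mrut[victim]
        self.stack.insert(0, tag)                # install as MRU, first tour
        self.mrut[tag] = 1
        return victim

    def _pick_victim(self):
        # Prefer the least-recently-used block that finished only one tour;
        # fall back to plain LRU if every block has toured more than once.
        for tag in reversed(self.stack):
            if self.mrut[tag] == 1:
                return tag
        return self.stack[-1]
```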
ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment
K. Aisopos, A. DeOrio, L. Peh, V. Bertacco
DOI: 10.1109/PACT.2011.61
Abstract: Extreme transistor technology scaling is causing increasing concerns in device reliability: the expected lifetime of individual transistors in complex chips is quickly decreasing, and the problem is expected to worsen at future technology nodes. With complex designs increasingly relying on Networks-on-Chip (NoCs) for on-chip data transfers, a NoC must continue to operate even in the face of many transistor failures. Specifically, it must be able to reconfigure and reroute packets around faults to enable continued operation, i.e., generate new routing paths to replace the old ones upon a failure. In addition to these reliability requirements, NoCs must maintain low latency and high throughput within a very low area budget. In this work, we propose a distributed reconfiguration solution named Ariadne, targeting large, aggressively scaled, unreliable NoCs. Ariadne utilizes up*/down* routing for fast routing at high bandwidth, and upon any number of concurrent network failures in any location, it reconfigures to discover new resilient paths connecting the surviving nodes. Experimental results show that Ariadne provides a 40%-140% latency improvement (when subject to 50 faults in a 64-node NoC) over other state-of-the-art on-chip fault-tolerant solutions, while meeting the low area budget of on-chip routers with an overhead of just 1.97%.
Published: 2011-10-10
Citations: 104
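Ariadne's routing substrate is up*/down*: after reconfiguration, each surviving link is labeled "up" or "down" relative to a root node, and a route is legal only if it never takes an "up" hop after a "down" hop, which guarantees deadlock freedom. The Python sketch below is an illustrative software reconstruction of that rule, not the paper's distributed hardware mechanism; the graph representation, the BFS root choice, and the tie-breaking by node id are assumptions.

```python
# Sketch of up*/down* label assignment on the surviving NoC topology.
# Hops toward the BFS root (or to a lower node id at equal depth) are "up";
# a legal up*/down* path never traverses an "up" hop after a "down" hop.
from collections import deque

def bfs_depths(links, root=0):
    """Breadth-first depths over the surviving (fault-free) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

def is_up(depth, src, dst):
    """A hop is 'up' if it moves closer to the root (ties broken by node id)."""
    return (depth[dst], dst) < (depth[src], src)

def legal_path(depth, path):
    """Check the up*/down* rule: no 'up' hop may follow a 'down' hop."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        if is_up(depth, src, dst):
            if gone_down:
                return False
        else:
            gone_down = True
    return True

# Example: a 2x2 mesh in which a fault has removed the link between 2 and 3.
links = [(0, 1), (0, 2), (1, 3)]
depth = bfs_depths(links, root=0)
print(legal_path(depth, [2, 0, 1, 3]))   # True: one up hop, then only down hops
```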
Prediction Based DRAM Row-Buffer Management in the Many-Core Era
M. Awasthi, D. Nellans, R. Balasubramonian, A. Davis
DOI: 10.1109/PACT.2011.31
Abstract: Modern processors experience interleaved memory access streams from different threads/cores, reducing the spatial locality seen at the memory controller and making the combined stream appear increasingly random. Traditional methods for exploiting locality at the DRAM level, such as open-page and timer-based policies, become less effective as the number of threads accessing memory increases. Employing closed-page policies in such systems can improve performance, but it eliminates any possibility of exploiting locality. In this paper, we build upon the key insight that a history-based predictor that tracks the number of accesses to a given DRAM page is a much better indicator of DRAM locality than timer-based policies. We extend prior work to propose a simple Access Based Predictor (ABP) that tracks limited access history at the page level to determine page-closure decisions, and does so with much smaller storage overhead than previously proposed policies. We show that ABP, with additional optimizations, can improve system throughput by 12.3% and 21.6% over open- and closed-page policies, respectively. The proposed ABP requires 20 KB of storage overhead and is outside the critical path of memory access.
Published: 2011-10-10
Citations: 31
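One plausible realization of the access-based idea is sketched below. This is an illustration, not the paper's exact hardware: the table size, the single-interval history, and the "close when the learned count is reached" rule are assumptions. The predictor remembers how many accesses a DRAM page received during its last open interval and closes the row once the current interval reaches that count, instead of waiting for a timer.

```python
# Sketch of access-based row-buffer management: learn, per page, how many
# hits an open row typically receives, and close the row as soon as the
# current burst reaches that learned count.

class AccessBasedPredictor:
    def __init__(self, table_size=4096):
        self.table_size = table_size
        self.predicted = {}     # page -> predicted accesses per open interval
        self.open_page = None   # page currently held open in the row buffer
        self.count = 0          # accesses in the current open interval

    def access(self, page):
        """Return 'close' if the open row should be closed after this access."""
        if page != self.open_page:
            if self.open_page is not None:
                # Row conflict: record how long the last interval really was.
                self._learn(self.open_page, self.count)
            self.open_page, self.count = page, 0
        self.count += 1
        expected = self.predicted.get(page)
        if expected is not None and self.count >= expected:
            self._learn(page, self.count)
            self.open_page, self.count = None, 0
            return "close"
        return "keep-open"

    def _learn(self, page, observed):
        if len(self.predicted) >= self.table_size and page not in self.predicted:
            self.predicted.pop(next(iter(self.predicted)))   # crude eviction
        self.predicted[page] = observed
```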
PEPSC: A Power-Efficient Processor for Scientific Computing
Ganesh S. Dasika, Ankit Sethia, T. Mudge, S. Mahlke
DOI: 10.1109/PACT.2011.16
Abstract: The rapid advancements in the computational capabilities of the graphics processing unit (GPU), as well as the deployment of general programming models for these devices, have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, two central problems arise. First, GPUs are designed and optimized for graphics applications, resulting in delivered performance that is far below peak for more general scientific and mathematical applications. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To combat these challenges, this paper presents the PEPSC architecture -- an architecture customized for the domain of data-parallel scientific applications where power efficiency is the central focus. PEPSC utilizes a combination of a two-dimensional single-instruction multiple-data (SIMD) datapath, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2 W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10X over existing GPUs.
Published: 2011-10-10
Citations: 19
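The abstract does not detail PEPSC's dynamic prefetching mechanism, but the general flavor of such a mechanism can be sketched with a plain per-PC stride prefetcher. The Python below is purely illustrative and is not PEPSC's design; the table organization, confidence threshold, and prefetch degree are assumptions.

```python
# Generic per-PC stride prefetcher sketch: detect a stable stride between
# successive addresses issued by the same load and, once confident,
# prefetch a few addresses ahead of the demand stream.

class StridePrefetcher:
    def __init__(self, degree=2, confidence_needed=2):
        self.table = {}                    # pc -> (last_addr, stride, confidence)
        self.degree = degree
        self.confidence_needed = confidence_needed

    def observe(self, pc, addr):
        """Record a demand access; return the list of addresses to prefetch."""
        last_addr, stride, conf = self.table.get(pc, (addr, 0, 0))
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            conf = min(conf + 1, self.confidence_needed)
        else:
            stride, conf = new_stride, 0
        self.table[pc] = (addr, stride, conf)
        if conf >= self.confidence_needed:
            return [addr + stride * i for i in range(1, self.degree + 1)]
        return []

# Example: a unit-stride stream from one load PC quickly triggers prefetches.
pf = StridePrefetcher()
for a in range(0x1000, 0x1400, 64):
    hints = pf.observe(pc=0x400123, addr=a)
```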
MRAC: A Memristor-based Reconfigurable Framework for Adaptive Cache Replacement
Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen
DOI: 10.1109/PACT.2011.29
Abstract: The memristor, a long-postulated yet missing circuit element, has recently emerged as a promising device in non-volatile memory technologies. However, beyond its use as a memory cell, it is challenging to integrate the memristor into modern architectures for general-purpose computation. In this paper we propose a non-conventional use of the memristor and demonstrate its applicability to enhancing cache replacement policy. We design a memristor-based saturation counter that can track cache access history at low cost. Based on our counter design, we develop a cache replacement framework that is both reconfigurable and adaptive (MRAC). Our evaluation demonstrates MRAC's reconfigurability and adaptivity, which result in better performance and more robust performance improvements.
Published: 2011-10-10
Citations: 1
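The building block here is a saturating counter whose state lives in a memristor's resistance rather than in SRAM. The Python below is a behavioral abstraction for illustration only: the resistance range, the number of levels, and the discrete programming steps are assumptions, and a real device would be programmed with analog pulses rather than fixed decrements.

```python
# Behavioral sketch of a memristor-backed saturating counter used to track
# cache access history: each "count" pulse nudges the device resistance,
# and the stored value is read back by quantizing the resistance range.

class MemristorCounter:
    def __init__(self, r_off=100_000.0, r_on=1_000.0, levels=8):
        self.r_off, self.r_on = r_off, r_on      # high/low resistance bounds
        self.levels = levels                     # distinguishable count levels
        self.resistance = r_off                  # start fully "reset" (count 0)
        self._step = (r_off - r_on) / (levels - 1)

    def pulse(self):
        """One counting pulse lowers resistance; saturates at R_on."""
        self.resistance = max(self.r_on, self.resistance - self._step)

    def reset(self):
        """A reset pulse returns the device to the high-resistance state."""
        self.resistance = self.r_off

    def read(self):
        """Quantize the resistance back into a saturating count value."""
        return round((self.r_off - self.resistance) / self._step)

# Example: track accesses to a cache line; the counter saturates at 7.
ctr = MemristorCounter()
for _ in range(10):
    ctr.pulse()
assert ctr.read() == 7
```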