A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato
{"title":"Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour","authors":"A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato","doi":"10.1109/PACT.2011.47","DOIUrl":"https://doi.org/10.1109/PACT.2011.47","url":null,"abstract":"Last-Level Caches (LLCs) implement the LRU algorithm to exploit temporal locality, but its performance falls well short of Belady's optimal algorithm as the number of ways increases. One of the main reasons LRU performs poorly in LLCs is that this policy forces a block to descend to the bottom of the stack before eviction. Nevertheless, most of the blocks that leave the MRU position are not referenced again before eviction. This work aims to select candidate blocks to be victimized before they reach the bottom of the stack. To this end, it defines the number of MRU-Tours (MRUTs) of a block as the number of times the block enters the MRU position during its lifetime. Based on the fact that most blocks exhibit a single MRUT, this work presents a family of MRUT-based algorithms that exploit this block behavior to improve performance.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123274774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment","authors":"K. Aisopos, A. DeOrio, L. Peh, V. Bertacco","doi":"10.1109/PACT.2011.61","DOIUrl":"https://doi.org/10.1109/PACT.2011.61","url":null,"abstract":"Extreme transistor technology scaling is causing growing concerns about device reliability: the expected lifetime of individual transistors in complex chips is quickly decreasing, and the problem is expected to worsen at future technology nodes. With complex designs increasingly relying on Networks-on-Chip (NoCs) for on-chip data transfers, a NoC must continue to operate even in the face of many transistor failures. Specifically, it must be able to reconfigure and reroute packets around faults to enable continued operation, i.e., generate new routing paths to replace the old ones upon a failure. In addition to these reliability requirements, NoCs must maintain low latency and high throughput at a very low area budget. In this work, we propose a distributed reconfiguration solution named Ariadne, targeting large, aggressively scaled, unreliable NoCs. Ariadne utilizes up*/down* for fast routing at high bandwidth, and upon any number of concurrent network failures in any location, it reconfigures to discover new resilient paths to connect the surviving nodes. Experimental results show that Ariadne provides a 40%-140% latency improvement (when subject to 50 faults in a 64-node NoC) over other on-chip state-of-the-art fault-tolerant solutions, while meeting the low area budget of on-chip routers with an overhead of just 1.97%.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132048710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Awasthi, D. Nellans, R. Balasubramonian, A. Davis
{"title":"Prediction Based DRAM Row-Buffer Management in the Many-Core Era","authors":"M. Awasthi, D. Nellans, R. Balasubramonian, A. Davis","doi":"10.1109/PACT.2011.31","DOIUrl":"https://doi.org/10.1109/PACT.2011.31","url":null,"abstract":"Modern processors are experiencing interleaved memory access streams from different threads/cores, reducing the spatial locality that is seen at the memory controller and making the combined stream appear increasingly random. Traditional methods for exploiting locality at the DRAM level, such as open-page and timer-based policies, become less effective as the number of threads accessing memory increases. Employing closed-page policies in such systems can improve performance, but it eliminates any possibility of exploiting locality. In this paper, we build upon the key insight that a history-based predictor that tracks the number of accesses to a given DRAM page is a much better indicator of DRAM locality than timer-based policies. We extend prior work to propose a simple Access Based Predictor (ABP) that tracks limited access history at the page level to determine page closure decisions, and does so with much smaller storage overhead than previously proposed policies. We show that ABP, with additional optimizations, can improve system throughput by 12.3% and 21.6% over open and closed-page policies, respectively. The proposed ABP requires 20 KB of storage overhead and is outside the critical path of memory access.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132112109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ganesh S. Dasika, Ankit Sethia, T. Mudge, S. Mahlke
{"title":"PEPSC: A Power-Efficient Processor for Scientific Computing","authors":"Ganesh S. Dasika, Ankit Sethia, T. Mudge, S. Mahlke","doi":"10.1109/PACT.2011.16","DOIUrl":"https://doi.org/10.1109/PACT.2011.16","url":null,"abstract":"The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, there are two central problems that arise. First, GPUs are designed and optimized for graphics applications, resulting in delivered performance that is far below peak for more general scientific and mathematical applications. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To combat these challenges, this paper presents the PEPSC architecture -- an architecture customized for the domain of data-parallel scientific applications where power-efficiency is the central focus. PEPSC utilizes a combination of a two-dimensional single-instruction multiple-data (SIMD) data path, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10X over existing GPUs.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128783217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen
{"title":"MRAC: A Memristor-based Reconfigurable Framework for Adaptive Cache Replacement","authors":"Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen","doi":"10.1109/PACT.2011.29","DOIUrl":"https://doi.org/10.1109/PACT.2011.29","url":null,"abstract":"The memristor, a long-postulated yet missing circuit element, has recently emerged as a promising device in non-volatile memory technologies. However, beyond its use as a memory cell, it is challenging to integrate the memristor into modern architectures for general-purpose computation. In this paper, we propose a non-conventional use of the memristor and demonstrate its applicability to enhancing cache replacement policy. We design a memristor-based saturation counter which can track cache access history at low cost. Based on our counter design, we develop a cache replacement framework that is both reconfigurable and adaptive (MRAC). Our evaluation demonstrates MRAC's reconfigurability and adaptivity, which result in better performance and more robust performance improvement.","PeriodicalId":106423,"journal":{"name":"2011 International Conference on Parallel Architectures and Compilation Techniques","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129458033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}