2011 International Conference on Parallel Architectures and Compilation Techniques: Latest Publications

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour
A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato
DOI: 10.1109/PACT.2011.47
Abstract: Last-Level Caches (LLCs) implement the LRU algorithm to exploit temporal locality, but its performance falls far short of Belady's optimal algorithm as the number of ways increases. One of the main reasons LRU performs poorly in LLCs is that the policy forces a block to descend to the bottom of the stack before eviction, yet most blocks that leave the MRU position are not referenced again before being evicted. This work selects candidate victim blocks before they reach the bottom of the stack. To this end, it defines the number of MRU-Tours (MRUTs) of a block as the number of times the block enters the MRU position during its lifetime. Based on the observation that most blocks exhibit a single MRUT, the paper presents a family of MRUT-based algorithms that exploit this behavior to improve performance.
Published: 2011-10-10
Citations: 2
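The MRUT idea described above lends itself to a compact behavioral sketch. The Python below is an illustration only, not the authors' implementation: the set size, the saturation limit on the counter, and the LRU tie-breaking rule are assumptions. It tracks how many times each cached block re-enters the MRU position and preferentially evicts blocks that have completed only a single MRU-Tour.

```python
# Minimal sketch of an MRUT-guided replacement policy for one cache set.
# Assumptions (not from the paper): victim ties are broken by LRU order,
# and the MRUT count saturates at a small value.

class MRUTSet:
    def __init__(self, num_ways=16, max_mrut=3):
        self.num_ways = num_ways
        self.max_mrut = max_mrut
        self.stack = []        # LRU stack: index 0 is the MRU block
        self.mrut = {}         # block tag -> number of MRU-Tours

    def access(self, tag):
        """Simulate one access; return the evicted tag or None."""
        if tag in self.mrut:                     # hit
            if self.stack[0] != tag:             # re-entering MRU => new tour
                self.mrut[tag] = min(self.mrut[tag] + 1, self.max_mrut)
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return None
        victim = None
        if len(self.stack) == self.num_ways:     # miss in a full set
            victim = self._pick_victim()
            self.stack.remove(victim)
            del self.mrut[victim]
        self.stack.insert(0, tag)                # install as MRU, first tour
        self.mrut[tag] = 1
        return victim

    def _pick_victim(self):
        # Prefer the least-recently-used block that finished only one tour;
        # fall back to plain LRU if every block has toured more than once.
        for tag in reversed(self.stack):
            if self.mrut[tag] == 1:
                return tag
        return self.stack[-1]
```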
ARIADNE: Agnostic Reconfiguration in a Disconnected Network Environment
K. Aisopos, A. DeOrio, L. Peh, V. Bertacco
DOI: 10.1109/PACT.2011.61
Abstract: Extreme transistor technology scaling is causing increasing concerns in device reliability: the expected lifetime of individual transistors in complex chips is quickly decreasing, and the problem is expected to worsen at future technology nodes. With complex designs increasingly relying on Networks-on-Chip (NoCs) for on-chip data transfers, a NoC must continue to operate even in the face of many transistor failures. Specifically, it must be able to reconfigure and reroute packets around faults to enable continued operation, i.e., generate new routing paths to replace the old ones upon a failure. In addition to these reliability requirements, NoCs must maintain low latency and high throughput within a very low area budget. In this work, we propose a distributed reconfiguration solution named Ariadne, targeting large, aggressively scaled, unreliable NoCs. Ariadne utilizes up*/down* routing for fast routing at high bandwidth, and upon any number of concurrent network failures in any location, it reconfigures to discover new resilient paths connecting the surviving nodes. Experimental results show that Ariadne provides a 40%-140% latency improvement (when subject to 50 faults in a 64-node NoC) over other state-of-the-art on-chip fault-tolerant solutions, while meeting the low area budget of on-chip routers with an overhead of just 1.97%.
Published: 2011-10-10
Citations: 104
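Ariadne's routing substrate is up*/down*: after reconfiguration, each surviving link is labeled "up" or "down" relative to a root node, and a route is legal only if it never takes an "up" hop after a "down" hop, which guarantees deadlock freedom. The Python sketch below is an illustrative software reconstruction of that rule, not the paper's distributed hardware mechanism; the graph representation, the BFS root choice, and the tie-breaking by node id are assumptions.

```python
# Sketch of up*/down* label assignment on the surviving NoC topology.
# Hops toward the BFS root (or to a lower node id at equal depth) are "up";
# a legal up*/down* path never traverses an "up" hop after a "down" hop.
from collections import deque

def bfs_depths(links, root=0):
    """Breadth-first depths over the surviving (fault-free) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

def is_up(depth, src, dst):
    """A hop is 'up' if it moves closer to the root (ties broken by node id)."""
    return (depth[dst], dst) < (depth[src], src)

def legal_path(depth, path):
    """Check the up*/down* rule: no 'up' hop may follow a 'down' hop."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        if is_up(depth, src, dst):
            if gone_down:
                return False
        else:
            gone_down = True
    return True

# Example: a 2x2 mesh in which a fault has removed the link between 2 and 3.
links = [(0, 1), (0, 2), (1, 3)]
depth = bfs_depths(links, root=0)
print(legal_path(depth, [2, 0, 1, 3]))   # True: one up hop, then only down hops
```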
Prediction Based DRAM Row-Buffer Management in the Many-Core Era
M. Awasthi, D. Nellans, R. Balasubramonian, A. Davis
DOI: 10.1109/PACT.2011.31
Abstract: Modern processors experience interleaved memory access streams from different threads/cores, reducing the spatial locality seen at the memory controller and making the combined stream appear increasingly random. Traditional methods for exploiting locality at the DRAM level, such as open-page and timer-based policies, become less effective as the number of threads accessing memory increases. Employing closed-page policies in such systems can improve performance, but it eliminates any possibility of exploiting locality. In this paper, we build upon the key insight that a history-based predictor that tracks the number of accesses to a given DRAM page is a much better indicator of DRAM locality than timer-based policies. We extend prior work to propose a simple Access Based Predictor (ABP) that tracks limited access history at the page level to determine page-closure decisions, and does so with much smaller storage overhead than previously proposed policies. We show that ABP, with additional optimizations, can improve system throughput by 12.3% and 21.6% over open- and closed-page policies, respectively. The proposed ABP requires 20 KB of storage overhead and is outside the critical path of memory access.
Published: 2011-10-10
Citations: 31
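One plausible realization of the access-based idea is sketched below. This is an illustration, not the paper's exact hardware: the table size, the single-interval history, and the "close when the learned count is reached" rule are assumptions. The predictor remembers how many accesses a DRAM page received during its last open interval and closes the row once the current interval reaches that count, instead of waiting for a timer.

```python
# Sketch of access-based row-buffer management: learn, per page, how many
# hits an open row typically receives, and close the row as soon as the
# current burst reaches that learned count.

class AccessBasedPredictor:
    def __init__(self, table_size=4096):
        self.table_size = table_size
        self.predicted = {}     # page -> predicted accesses per open interval
        self.open_page = None   # page currently held open in the row buffer
        self.count = 0          # accesses in the current open interval

    def access(self, page):
        """Return 'close' if the open row should be closed after this access."""
        if page != self.open_page:
            if self.open_page is not None:
                # Row conflict: record how long the last interval really was.
                self._learn(self.open_page, self.count)
            self.open_page, self.count = page, 0
        self.count += 1
        expected = self.predicted.get(page)
        if expected is not None and self.count >= expected:
            self._learn(page, self.count)
            self.open_page, self.count = None, 0
            return "close"
        return "keep-open"

    def _learn(self, page, observed):
        if len(self.predicted) >= self.table_size and page not in self.predicted:
            self.predicted.pop(next(iter(self.predicted)))   # crude eviction
        self.predicted[page] = observed
```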
PEPSC: A Power-Efficient Processor for Scientific Computing
Ganesh S. Dasika, Ankit Sethia, T. Mudge, S. Mahlke
DOI: 10.1109/PACT.2011.16
Abstract: The rapid advancements in the computational capabilities of the graphics processing unit (GPU), as well as the deployment of general programming models for these devices, have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, two central problems arise. First, GPUs are designed and optimized for graphics applications, resulting in delivered performance that is far below peak for more general scientific and mathematical applications. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To combat these challenges, this paper presents the PEPSC architecture -- an architecture customized for the domain of data-parallel scientific applications where power efficiency is the central focus. PEPSC utilizes a combination of a two-dimensional single-instruction multiple-data (SIMD) datapath, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2 W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10X over existing GPUs.
Published: 2011-10-10
Citations: 19
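The abstract does not detail PEPSC's dynamic prefetching mechanism, but the general flavor of such a mechanism can be sketched with a plain per-PC stride prefetcher. The Python below is purely illustrative and is not PEPSC's design; the table organization, confidence threshold, and prefetch degree are assumptions.

```python
# Generic per-PC stride prefetcher sketch: detect a stable stride between
# successive addresses issued by the same load and, once confident,
# prefetch a few addresses ahead of the demand stream.

class StridePrefetcher:
    def __init__(self, degree=2, confidence_needed=2):
        self.table = {}                    # pc -> (last_addr, stride, confidence)
        self.degree = degree
        self.confidence_needed = confidence_needed

    def observe(self, pc, addr):
        """Record a demand access; return the list of addresses to prefetch."""
        last_addr, stride, conf = self.table.get(pc, (addr, 0, 0))
        new_stride = addr - last_addr
        if new_stride == stride and stride != 0:
            conf = min(conf + 1, self.confidence_needed)
        else:
            stride, conf = new_stride, 0
        self.table[pc] = (addr, stride, conf)
        if conf >= self.confidence_needed:
            return [addr + stride * i for i in range(1, self.degree + 1)]
        return []

# Example: a unit-stride stream from one load PC quickly triggers prefetches.
pf = StridePrefetcher()
for a in range(0x1000, 0x1400, 64):
    hints = pf.observe(pc=0x400123, addr=a)
```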
MRAC: A Memristor-based Reconfigurable Framework for Adaptive Cache Replacement
Ping Zhou, Bo Zhao, Youtao Zhang, Jun Yang, Yiran Chen
DOI: 10.1109/PACT.2011.29
Abstract: The memristor, a long-postulated yet missing circuit element, has recently emerged as a promising device in non-volatile memory technologies. However, beyond its use as a memory cell, it is challenging to integrate the memristor into modern architectures for general-purpose computation. In this paper we propose a non-conventional use of the memristor and demonstrate its applicability to enhancing cache replacement policy. We design a memristor-based saturation counter that can track cache access history at low cost. Based on our counter design, we develop a cache replacement framework that is both reconfigurable and adaptive (MRAC). Our evaluation demonstrates MRAC's reconfigurability and adaptivity, which result in better performance and more robust performance improvements.
Published: 2011-10-10
Citations: 1
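The building block here is a saturating counter whose state lives in a memristor's resistance rather than in SRAM. The Python below is a behavioral abstraction for illustration only: the resistance range, the number of levels, and the discrete programming steps are assumptions, and a real device would be programmed with analog pulses rather than fixed decrements.

```python
# Behavioral sketch of a memristor-backed saturating counter used to track
# cache access history: each "count" pulse nudges the device resistance,
# and the stored value is read back by quantizing the resistance range.

class MemristorCounter:
    def __init__(self, r_off=100_000.0, r_on=1_000.0, levels=8):
        self.r_off, self.r_on = r_off, r_on      # high/low resistance bounds
        self.levels = levels                     # distinguishable count levels
        self.resistance = r_off                  # start fully "reset" (count 0)
        self._step = (r_off - r_on) / (levels - 1)

    def pulse(self):
        """One counting pulse lowers resistance; saturates at R_on."""
        self.resistance = max(self.r_on, self.resistance - self._step)

    def reset(self):
        """A reset pulse returns the device to the high-resistance state."""
        self.resistance = self.r_off

    def read(self):
        """Quantize the resistance back into a saturating count value."""
        return round((self.r_off - self.resistance) / self._step)

# Example: track accesses to a cache line; the counter saturates at 7.
ctr = MemristorCounter()
for _ in range(10):
    ctr.pulse()
assert ctr.read() == 7
```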