2011 IEEE 17th International Symposium on High Performance Computer Architecture最新文献

筛选
英文 中文
Fg-STP: Fine-Grain Single Thread Partitioning on Multicores Fg-STP:多核上的细粒度单线程分区
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749713
Rakesh Ranjan, Fernando Latorre, P. Marcuello, Antonio González
{"title":"Fg-STP: Fine-Grain Single Thread Partitioning on Multicores","authors":"Rakesh Ranjan, Fernando Latorre, P. Marcuello, Antonio González","doi":"10.1109/HPCA.2011.5749713","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749713","url":null,"abstract":"Power and complexity issues have led the microprocessor industry to shift to Chip Multiprocessors in order to be able to better utilize the additional transistors ensured by Moore's law. While parallel programs are going to be able to take most of the advantage of these CMPs, single thread applications are not equipped to benefit from them. In this paper we propose Fine-Grain Single-Thread Partitioning (Fg-STP), a hardware-only scheme that takes advantage of CMP designs to speedup single-threaded applications. Our proposal improves single thread performance by reconfiguring two cores with the aim of collaborating on the fetching and execution of the instructions. These cores are basically conventional out-of-order cores in which execution is orchestrated using a dedicated hardware that has minimum and localized impact on the original design of the cores. This approach partitions the code at instruction granularity and differs from previous proposals on the extensive use of dependence speculation, replication and communication. These features are combined with the ability to look for parallelism on large instruction windows without any software intervention (no re-compilation or profiling hints are needed). These characteristics allow Fg-STP to speedup single thread by 18% and 7% on average over similar hardware-only approaches like Core Fusion, on medium sized and small sized 2-core CMP respectively for Spec 2006 benchmarks.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115135427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Offline symbolic analysis to infer Total Store Order 离线符号分析,以推断总存储订单
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749743
Dongyoon Lee, Mahmoud H. Said, S. Narayanasamy, Z. Yang
{"title":"Offline symbolic analysis to infer Total Store Order","authors":"Dongyoon Lee, Mahmoud H. Said, S. Narayanasamy, Z. Yang","doi":"10.1109/HPCA.2011.5749743","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749743","url":null,"abstract":"Ability to record and replay an execution can significantly help programmers debug their programs, especially parallel programs. De-terministically replaying a multiprocessor's execution under a relaxed memory model has remained a challenging problem. This is an important problem as most modern processors only support a relaxed memory model to enable many performance critical optimizations. The most common consistency model implemented in processors is the Total Store Order (TSO). We present an efficient and low-complexity processor based solution for recording and replaying under the Total Store Order (TSO) memory model. Processor provides support for logging data fetched on cache misses. Using this information each thread can be de-terministically replayed. A TSO-compliant casual order between the shared-memory accesses executed in different threads is then inferred using an offline algorithm based on Satisfiability Modulo Theory (SMT) solver. We also discuss methods to bound the search space during offline analysis and several optimizations to reduce the offline analysis time.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134352755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols 原子相干性:利用纳米光子学构建无竞争缓存相干性协议
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749723
D. Vantrease, Mikko H. Lipasti, N. Binkert
{"title":"Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols","authors":"D. Vantrease, Mikko H. Lipasti, N. Binkert","doi":"10.1109/HPCA.2011.5749723","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749723","url":null,"abstract":"This paper advocates Atomic Coherence, a framework that simplifies cache coherence protocol specification, design, and verification by decoupling races from the protocol's operation. Atomic Coherence requires conflicting coherence requests to the same addresses be serialized with a mutex before they are issued. Once issued, requests follow a predictable race-free path. Because requests are guaranteed not to race, coherence protocols are simpler and protocol extensions are straightforward. Our implementation of Atomic Coherence uses optical mutexes because optics provides very low latency. We begin with a state-of-the-art non-atomic MOEFSI protocol and demonstrate that an atomic implementation is much simpler while imposing less than a 2% performance penalty. We then show how, in the absence of races, it is easy to add support for speculative coherence and improve performance by up to 70%. Similar performance gains may be possible in a non-atomic protocol, but not without considerable effort in race management.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124521531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 43
Bloom Filter Guided Transaction Scheduling 布隆过滤器引导的事务调度
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749718
G. Blake, R. Dreslinski, T. Mudge
{"title":"Bloom Filter Guided Transaction Scheduling","authors":"G. Blake, R. Dreslinski, T. Mudge","doi":"10.1109/HPCA.2011.5749718","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749718","url":null,"abstract":"Contention management is an important design component to a transactional memory system. Without effective contention management to ensure forward progress, a transactional memory system can experience live-lock, which is difficult to debug in parallel programs. Early work in contention management focused on heuristic managers that reacted to conflicts between transactions by picking the most appropriate transaction to abort. Reactive methods allow conflicts to happen repeatedly as they do not try to prevent future conflicts from happening. These shortcomings of reactive contention managers have led to proposals that approach contention management as a scheduling problem — proactive managers. Proactive techniques range from throttling execution in predicted periods of high contention to preventing groups of transactions running concurrently that are predicted likely to conflict. We propose a novel transaction scheduling scheme called “Bloom Filter Guided Transaction Scheduling” (BFGTS), that uses a combination of simple hardware and Bloom filter heuristics to guide scheduling decisions and provide enhanced performance in high contention situations. We compare to two state-of-the-art transaction schedulers, “Adaptive Transaction Scheduling” and “Proactive Transaction Scheduling” and show that BFGTS attains up to a 4.6× and 1.7× improvement on high contention benchmarks respectively. Across all benchmarks it shows a 35% and 25% average performance improvement respectively.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128975828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors 使用异构单元尺寸的高性能处理器的低压片上缓存架构
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749715
H. Ghasemi, S. Draper, N. Kim
{"title":"Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors","authors":"H. Ghasemi, S. Draper, N. Kim","doi":"10.1109/HPCA.2011.5749715","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749715","url":null,"abstract":"To date dynamic voltage/frequency scaling (DVFS) has been one of the most successful power-reduction techniques. However, ever-increasing process variability reduces the reliability of static random access memory (SRAM) at low voltages. This limits voltage scaling to a minimum operating voltage (VDDMIN). Larger SRAM cells, that are less sensitive to process variability, allow the use of lower VDDMIN. However, large-scale memory structures, e.g., the last-level cache (LLC) (that often determines the VDDMIN of the processor), cannot afford to use such large SRAM cells due to the die area constraint. In this paper we propose low-voltage LLC architectures that exploit 1) the DVFS characteristics of workloads running on high-performance processors, 2) the trade-off between SRAM cell size and VDDMIN, and 3) the fact that at lower voltage/frequency operating states the negative performance impact of having a smaller LLC capacity is reduced. Our proposed LLC architectures provide the same maximum performance and VDDMIN as the conventional architecture, while reducing the total LLC cell area by 15%–19% with negligible average runtime increase.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129182792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
SolarCore: Solar energy driven multi-core architecture power management SolarCore:太阳能驱动的多核架构电源管理
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749729
Chao Li, Wangyuan Zhang, Chang-Burm Cho, Tao Li
{"title":"SolarCore: Solar energy driven multi-core architecture power management","authors":"Chao Li, Wangyuan Zhang, Chang-Burm Cho, Tao Li","doi":"10.1109/HPCA.2011.5749729","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749729","url":null,"abstract":"The global energy crisis and environmental concerns (e.g. global warming) have driven the IT community into the green computing era. Of clean, renewable energy sources, solar power is the most promising. While efforts have been made to improve the performance-per-watt, conventional architecture power management schemes incur significant solar energy loss since they are largely workload-driven and unaware of the supply-side attributes. Existing solar power harvesting techniques improve the energy utilization but increase the environmental burden and capital investment due to the inclusion of large-scale batteries. Moreover, solar power harvesting itself cannot guarantee high performance without appropriate load adaptation. To this end, we propose SolarCore, a solar energy driven, multi-core architecture power management scheme that combines maximal power provisioning control and workload run-time optimization. Using real-world meteorological data across different geographic sites and seasons, we show that SolarCore is capable of achieving the optimal operation condition (e.g. maximal power point) of solar panels autonomously under various environmental conditions with a high green energy utilization of 82% on average. We propose efficient heuristics for allocating the time varying solar power across multiple cores and our algorithm can further improve the workload performance by 10.8% compared with that of round-robin adaptation, and at least 43% compared with that of conventional fixed-power budget control. This paper makes the first step on maximally reducing the carbon footprint of computing systems through the usage of renewable energy sources. We expect that the novel joint optimization techniques proposed in this paper will contribute to building a truly sustainable, high-performance computing environment.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128872057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 94
Fast thread migration via cache working set prediction 通过缓存工作集预测快速线程迁移
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749728
Jeffery A. Brown, Leo Porter, D. Tullsen
{"title":"Fast thread migration via cache working set prediction","authors":"Jeffery A. Brown, Leo Porter, D. Tullsen","doi":"10.1109/HPCA.2011.5749728","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749728","url":null,"abstract":"The most significant source of lost performance when a thread migrates between cores is the loss of cache state. A significant boost in post-migration performance is possible if the cache working set can be moved, proactively, with the thread. This work accelerates thread startup performance after migration by predicting and prefetching the working set of the application into the new cache. It shows that simply moving cache state performs poorly, and that moving the instruction working set can be even more critical than data. This paper demonstrates a technique that captures the access behavior of a thread, summarizes that behavior into a compact form for transfer between cores, and then prefetches appropriate data into the new caches based on the summary. It presents a detailed study of single-thread migration effects, and then demonstrates its utility on a speculative multithreading architecture. Working set prediction as much as doubles the performance of short-lived threads, and in a full speculative multithreading implementation, the technique is also shown to nearly double the effectiveness of the spawned threads.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131537951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 44
Calvin: Deterministic or not? Free will to choose Calvin:决定论还是不决定论?选择的自由意志
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749741
Derek Hower, P. Dudnik, M. Hill, D. Wood
{"title":"Calvin: Deterministic or not? Free will to choose","authors":"Derek Hower, P. Dudnik, M. Hill, D. Wood","doi":"10.1109/HPCA.2011.5749741","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749741","url":null,"abstract":"Most shared memory systems maximize performance by unpredictably resolving memory races. Unpredictable memory races can lead to nondeterminism in parallel programs, which can suffer from hard-to-reproduce hiesenbugs. We introduce Calvin, a shared memory model capable of executing in a conventional nondeterministic mode when performance is paramount and a deterministic mode when execution repeatability is important. Unlike prior hardware proposals for deterministic execution, Calvin exploits the flexibility of a memory consistency model weaker than sequential consistency. Specifically, Calvin logically orders memory operations into strata that are compatible with the Total Store Order (TSO). Calvin is also designed with the needs of future power-aware processors in mind, and does not require any speculation support. We develop a Calvin-MIST implementation that uses an unordered coalescing write cache, multiple-write coherence protocol, and delayed (timebomb) invalidations while maintaining TSO compatibility. Results show that Calvin-MIST can execute workloads in conventional mode at speeds comparable to a conventional system (providing compatibility) or execute deterministically for a modest average slowdown of less than 20% (when determinism is valued).","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 74
FREE-p: Protecting non-volatile memory against both hard and soft errors FREE-p:保护非易失性内存免受硬错误和软错误的侵害
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749752
D. Yoon, Naveen Muralimanohar, Jichuan Chang, Parthasarathy Ranganathan, N. Jouppi, M. Erez
{"title":"FREE-p: Protecting non-volatile memory against both hard and soft errors","authors":"D. Yoon, Naveen Muralimanohar, Jichuan Chang, Parthasarathy Ranganathan, N. Jouppi, M. Erez","doi":"10.1109/HPCA.2011.5749752","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749752","url":null,"abstract":"Emerging non-volatile memories such as phase-change RAM (PCRAM) offer significant advantages but suffer from write endurance problems. However, prior solutions are oblivious to soft errors (recently raised as a potential issue even for PCRAM) and are incompatible with high-level fault tolerance techniques such as chipkill. To additionally address such failures requires unnecessarily high costs for techniques that focus singularly on wear-out tolerance. In this paper, we propose fine-grained remapping with ECC and embedded pointers (FREE-p). FREE-p remaps fine-grained worn-out NVRAM blocks without requiring large dedicated storage. We discuss how FREE-p protects against both hard and soft errors and can be extended to chipkill. Further, FREE-p can be implemented purely in the memory controller, avoiding custom NVRAM devices. In addition to these benefits, FREE-p increases NVRAM lifetime by up to 26% over the state-of-the-art even with severe process variation while performance degradation is less than 2% for the initial 7 years.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134426930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 182
A case for guarded power gating for multi-core processors 一种用于多核处理器的保护电源门控的案例
2011 IEEE 17th International Symposium on High Performance Computer Architecture Pub Date : 2011-02-12 DOI: 10.1109/HPCA.2011.5749737
Niti Madan, A. Buyuktosunoglu, P. Bose, M. Annavaram
{"title":"A case for guarded power gating for multi-core processors","authors":"Niti Madan, A. Buyuktosunoglu, P. Bose, M. Annavaram","doi":"10.1109/HPCA.2011.5749737","DOIUrl":"https://doi.org/10.1109/HPCA.2011.5749737","url":null,"abstract":"Dynamic power management has become an essential part of multi-core processors and associated systems. Dedicated controllers with embedded power management firmware are now an integral part of design in such multi-core server systems. Devising a robust power management policy that meets system-intended functionality across a diverse range of workloads remains a key challenge. One of the primary issues of concern in architecting a power management policy is that of performance degradation beyond a specified limit. A secondary issue is that of negative power savings. Guarding against such “holes” in the management policy is crucial in order to ensure successful deployment and use in real customer environments. It is also important to focus on developing new models and addressing the limitations of current modeling infrastructure, in analyzing alternate management policies during the design of modern multi-core systems. In this concept paper, we highlight the above specific challenges that are faced today by the server chip and system design industry in the area of power management.","PeriodicalId":126976,"journal":{"name":"2011 IEEE 17th International Symposium on High Performance Computer Architecture","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114926747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 75
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信