2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)最新文献

筛选
英文 中文
Large-Scale Log-Determinant Computation via Weighted L_2 Polynomial Approximation with Prior Distribution of Eigenvalues 基于特征值先验分布的加权L_2多项式近似大规模对数行列式计算
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2015-07-26 DOI: 10.1007/978-3-319-32557-6_12
Wei Peng, Hongxia Wang
{"title":"Large-Scale Log-Determinant Computation via Weighted L_2 Polynomial Approximation with Prior Distribution of Eigenvalues","authors":"Wei Peng, Hongxia Wang","doi":"10.1007/978-3-319-32557-6_12","DOIUrl":"https://doi.org/10.1007/978-3-319-32557-6_12","url":null,"abstract":"","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"156 1","pages":"120-125"},"PeriodicalIF":0.0,"publicationDate":"2015-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79889064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
A Platform for Routine Development of Ternary Optical Computers 三元光学计算机的常规开发平台
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2015-07-26 DOI: 10.1007/978-3-319-32557-6_15
Xianshun Ping, Jun-jie Peng, Shan Ouyang, Yunfu Shen, Yi Jin
{"title":"A Platform for Routine Development of Ternary Optical Computers","authors":"Xianshun Ping, Jun-jie Peng, Shan Ouyang, Yunfu Shen, Yi Jin","doi":"10.1007/978-3-319-32557-6_15","DOIUrl":"https://doi.org/10.1007/978-3-319-32557-6_15","url":null,"abstract":"","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"15 1","pages":"143-149"},"PeriodicalIF":0.0,"publicationDate":"2015-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74976656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
FTXen: Making hypervisor resilient to hardware faults on relaxed cores FTXen:使虚拟机管理程序对放松的核心上的硬件故障具有弹性
Xinxin Jin, Soyeon Park, Tianwei Sheng, Rishan Chen, Zhiyong Shan, Yuanyuan Zhou
{"title":"FTXen: Making hypervisor resilient to hardware faults on relaxed cores","authors":"Xinxin Jin, Soyeon Park, Tianwei Sheng, Rishan Chen, Zhiyong Shan, Yuanyuan Zhou","doi":"10.1109/HPCA.2015.7056054","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056054","url":null,"abstract":"As CMOS technology scales, the Increasingly smaller transistor components are susceptible to a variety of in-field hardware errors. Traditional redundancy techniques to deal with the increasing error rates are expensive and energy inefficient. To address this emerging challenge, many researchers have recently proposed the idea of relaxed hardware design and exposing errors to software. For such relaxed hardware to become a reality, it is crucially important for system software, such as the virtual machine hypervisor, to be resilient to hardware faults. To address the above fundamental software challenge in enabling relaxed hardware design, we are making a major effort in restructuring an important part of system software, namely the virtual machine hypervisor, to be resilient to faulty cores. A fault in a relaxed core can only affect those virtual machines (and applications) running on that core, but the hypervisor and other virtual machines remain intact and continue providing services. We have redesigned every component of Xen, a large, popular virtual machine hypervisor, to achieve such error resiliency. This paper presents our design and implementation of the restructured Xen (we refer to it as FTXen). Our experimental evaluation on real systems shows that FTXen adds minimum application overhead, and scales well to different ratios of reliable and relaxed cores. Our results with random fault injection show that FTXen can successfully survive all injected hardware faults.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"18 1","pages":"451-462"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82098505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Architecture exploration for ambient energy harvesting nonvolatile processors 环境能量收集非易失性处理器的架构探索
Kaisheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, J. Sampson, Yuan Xie, N. Vijaykrishnan
{"title":"Architecture exploration for ambient energy harvesting nonvolatile processors","authors":"Kaisheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, J. Sampson, Yuan Xie, N. Vijaykrishnan","doi":"10.1109/HPCA.2015.7056060","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056060","url":null,"abstract":"Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environment factors. Hence, it is necessary to develop specialized systems that are tolerant to this power variation, and also capable of making forward progress on the computation tasks. The simulation platform in this paper is calibrated using measured results from a fabricated nonvolatile processor and used to explore the design space for a nonvolatile processor with different architectures, different input power sources, and policies for maximizing forward progress.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"36 1","pages":"526-537"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87165245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 225
Exploring architectural heterogeneity in intelligent vision systems 探索智能视觉系统的架构异质性
Nandhini Chandramoorthy, Giuseppe Tagliavini, K. Irick, A. Pullini, Siddharth Advani, Sulaiman Al Habsi, M. Cotter, J. Sampson, N. Vijaykrishnan, L. Benini
{"title":"Exploring architectural heterogeneity in intelligent vision systems","authors":"Nandhini Chandramoorthy, Giuseppe Tagliavini, K. Irick, A. Pullini, Siddharth Advani, Sulaiman Al Habsi, M. Cotter, J. Sampson, N. Vijaykrishnan, L. Benini","doi":"10.1109/HPCA.2015.7056017","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056017","url":null,"abstract":"Limited power budgets and the need for high performance computing have led to platform customization with a number of accelerators integrated with CMPs. In order to study customized architectures, we model four customization design points and compare their performance and energy across a number of computer vision workloads. We analyze the limitations of generic architectures and quantify the costs of increasing customization using these micro-architectural design points. This analysis leads us to develop a framework consisting of low-power multi-cores and an array of configurable micro-accelerator functional units. Using this platform, we illustrate dataflow and control processing optimizations that provide for performance gains similar to custom ASICs for a wide range of vision benchmarks.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"34 1","pages":"1-12"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77922504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 28
Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting 肾上腺素:精确定位和控制尾部查询与快速电压提升
Chang-Hong Hsu, Yunqi Zhang, M. Laurenzano, David Meisner, T. Wenisch, Jason Mars, Lingjia Tang, R. Dreslinski
{"title":"Adrenaline: Pinpointing and reining in tail queries with quick voltage boosting","authors":"Chang-Hong Hsu, Yunqi Zhang, M. Laurenzano, David Meisner, T. Wenisch, Jason Mars, Lingjia Tang, R. Dreslinski","doi":"10.1109/HPCA.2015.7056039","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056039","url":null,"abstract":"Reducing the long tail of the query latency distribution in modern warehouse scale computers is critical for improving performance and quality of service of workloads such as Web Search and Memcached. Traditional turbo boost increases a processor's voltage and frequency during a coarse-grain sliding window, boosting all queries that are processed during that window. However, the inability of such a technique to pinpoint tail queries for boosting limits its tail reduction benefit. In this work, we propose Adrenaline, an approach to leverage finer granularity, 10's of nanoseconds, voltage boosting to effectively rein in the tail latency with query-level precision. Two key insights underlie this work. First, emerging finer granularity voltage/frequency boosting is an enabling mechanism for intelligent allocation of the power budget to precisely boost only the queries that contribute to the tail latency; and second, per-query characteristics can be used to design indicators for proactively pinpointing these queries, triggering boosting accordingly. Based on these insights, Adrenaline effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost. By evaluating under various workload configurations, we demonstrate the effectiveness of our methodology. We achieve up to a 2.50x tail latency improvement for Memcached and up to a 3.03x for Web Search over coarse-grained DVFS given a fixed boosting power budget. When optimizing for energy reduction, Adrenaline achieves up to a 1.81x improvement for Memcached and up to a 1.99x for Web Search over coarse-grained DVFS.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"15 1","pages":"271-282"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74338189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 109
BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing BRAINIAC:为神经实现的近似计算带来可靠的准确性
B. Grigorian, Nazanin Farahpour, Glenn D. Reinman
{"title":"BRAINIAC: Bringing reliable accuracy into neurally-implemented approximate computing","authors":"B. Grigorian, Nazanin Farahpour, Glenn D. Reinman","doi":"10.1109/HPCA.2015.7056067","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056067","url":null,"abstract":"Applications with large amounts of data, real-time constraints, ultra-low power requirements, and heavy computational complexity present significant challenges for modern computing systems, and often fall within the category of high performance computing (HPC). As such, computer architects have looked to high performance single instruction multiple data (SIMD) architectures, such as accelerator-rich platforms, for handling these workloads. However, since the results of these applications do not always require exact precision, approximate computing may also be leveraged. In this work, we introduce BRAINIAC, a heterogeneous platform that combines precise accelerators with neural-network-based approximate accelerators. These reconfigurable accelerators are leveraged in a multi-stage flow that begins with simple approximations and resorts to more complex ones as needed. We employ high-level, application-specific light-weight checks (LWCs) to throttle this multi-stage acceleration flow and reliably ensure user-specified accuracy at runtime. Evaluation of the performance and energy of our heterogeneous platform for error tolerance thresholds of 5%-25% demonstrates an average of 3× gain over computation that only includes precise acceleration, and 15×-35× gain over software-based computation.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"63 1","pages":"615-626"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80762630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
Flask coherence: A morphable hybrid coherence protocol to balance energy, performance and scalability Flask coherence:一种可变形的混合相干协议,用于平衡能量、性能和可扩展性
L. G. Menezo, Valentin Puente, J. Gregorio
{"title":"Flask coherence: A morphable hybrid coherence protocol to balance energy, performance and scalability","authors":"L. G. Menezo, Valentin Puente, J. Gregorio","doi":"10.1109/HPCA.2015.7056033","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056033","url":null,"abstract":"This work proposes a mechanism to hybridize the benefits of snoop-based and directory-based coherence protocols in a single construct. A non-inclusive sparse-directory is used to minimize energy requirements and guarantee scalability. Directory entries will be used only by the most actively shared blocks. To preserve system correctness token counting is used. Additionally, each directory entry is augmented with a counting bloom filter that suppresses most unnecessary on-chip and off-chip requests. Combining all these elements, the proposal, with a low storage overhead, is able to suppress most traffic inherent to snoop-based protocols. With a directory capable of tracking just 40% of the blocks kept in private caches, this coherence protocol is able to match the performance and energy of a sparse-directory capable of tracking 160% of the blocks. Using the same configuration, it can outperform the performance and on-chip memory hierarchy energy of a broadcast-based coherence protocol such as Token by 10% and 20% respectively. To achieve these results, the proposal uses an improved counting bloom filter, which provides twice the space efficiency of a conventional one with similar implementation cost. This filter also enables the coherence controller storage used to track shared blocks and filter private block misses to change dynamically according to the data-sharing properties of the application. With only 5 % of tracked private cache entries, the average performance degradation of this construct is less than 8% compared to a 160% over-provisioned sparse-directory.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"32 1","pages":"198-209"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82894942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Adaptive-latency DRAM: Optimizing DRAM timing for the common-case 自适应延迟DRAM:优化普通情况下的DRAM时序
Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu
{"title":"Adaptive-latency DRAM: Optimizing DRAM timing for the common-case","authors":"Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu","doi":"10.1109/HPCA.2015.7056057","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056057","url":null,"abstract":"In current systems, memory accesses to a DRAM chip must obey a set of minimum latency restrictions specified in the DRAM standard. Such timing parameters exist to guarantee reliable operation. When deciding the timing parameters, DRAM manufacturers incorporate a very large margin as a provision against two worst-case scenarios. First, due to process variation, some outlier chips are much slower than others and cannot be operated as fast. Second, chips become slower at higher temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85° C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55° C) are capable of providing a much smaller access latency, but are nevertheless forced to operate at the largest latency of the worst-case. Our goal in this paper is to exploit the extra margin that is built into the DRAM timing parameters to improve performance. Using an FPGA-based testing platform, we first characterize the extra margin for 115 DRAM modules from three major manufacturers. Our results demonstrate that it is possible to reduce four of the most critical timing parameters by a minimum/maximum of 17.3%/54.8% at 55°C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that adoptively reduces the timing parameters for DRAM modules based on the current operating condition. AL-DRAM does not require any changes to the DRAM chip or its interface. We evaluate AL-DRAM on a real system that allows us to reconfigure the timing parameters at runtime. We show that AL-DRAM improves the performance of memory-intensive workloads by an average of 14% without introducing any errors. We discuss and show why AL-DRAM does not compromise reliability. We conclude that dynamically optimizing the DRAM timing parameters can reliably improve system performance.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"26 1","pages":"489-501"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81238762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 206
Understanding contention-based channels and using them for defense 理解基于争用的渠道并利用它们进行防御
Casen Hunger, Mikhail Kazdagli, A. Rawat, A. Dimakis, S. Vishwanath, Mohit Tiwari
{"title":"Understanding contention-based channels and using them for defense","authors":"Casen Hunger, Mikhail Kazdagli, A. Rawat, A. Dimakis, S. Vishwanath, Mohit Tiwari","doi":"10.1109/HPCA.2015.7056069","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056069","url":null,"abstract":"Microarchitectural resources such as caches and predictors can be used to leak information across security domains. Significant prior work has demonstrated attacks and defenses for specific types of such microarchitectural side and covert channels. In this paper, we introduce a general mathematical study of microarchitectural channels using information theory. Our conceptual contribution is a simple mathematical abstraction that captures the common characteristics of all microarchitectural channels. We call this the Bucket model and it reveals that microarchitectural channels are fundamentally different from side and covert channels in networking. We then quantify the communication capacity of several microarchitectural covert channels (including channels that rely on performance counters, AES hardware and memory buses) and measure bandwidths across both KVM based heavy-weight virtualization and light-weight operating-system level isolation. We demonstrate channel capacities that are orders of magnitude higher compared to what was previously considered possible. Finally, we introduce a novel way of detecting intelligent adversaries that try to hide while running covert channel eavesdropping attacks. Our method generalizes a prior detection scheme (that modeled static adversaries) by introducing noise that hides the detection process from an intelligent eavesdropper.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"32 1","pages":"639-650"},"PeriodicalIF":0.0,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86957962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 82
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信