Paying to save: Reducing cost of colocation data center via rewards
M. A. Islam, Hasan Mahmud, Shaolei Ren, Xiaorui Wang
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 235-245. DOI: 10.1109/HPCA.2015.7056036

Abstract: Power-hungry data centers face mounting pressure to reduce energy costs. Existing efforts, though numerous, have centered primarily on owner-operated data centers (e.g., Google), leaving another critical segment much less explored: the colocation data center (e.g., Equinix), which rents out physical space to multiple tenants who house their own servers. Colocations face a major barrier to cost efficiency: server power management by individual tenants is uncoordinated. This paper proposes RECO (REward for COst reduction), which uses financial rewards as a lever to shift tenants' power management from uncoordinated to coordinated. RECO pays voluntarily participating tenants for energy reduction such that the colocation operator's overall cost is minimized. RECO incorporates the time-varying operating environment (e.g., cooling efficiency, intermittent renewables), addresses the peak power demand charge, and proactively learns tenants' unknown responses to the offered reward. RECO includes a new feedback-based online algorithm that optimizes the reward without offline knowledge of the far future. We evaluate RECO using both scaled-down prototype experiments and simulations. Our results show that RECO is "win-win": it reduces the colocation operator's overall cost by up to 27% compared to a no-incentive baseline, while tenants receive financial rewards (up to 15% of their colocation costs) for "free", without violating Service Level Agreements.
{"title":"Talus: A simple way to remove cliffs in cache performance","authors":"Nathan Beckmann, Daniel Sánchez","doi":"10.1109/HPCA.2015.7056022","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056022","url":null,"abstract":"Caches often suffer from performance cliffs: minor changes in program behavior or available cache space cause large changes in miss rate. Cliffs hurt performance and complicate cache management. We present Talus,1 a simple scheme that removes these cliffs. Talus works by dividing a single application's access stream into two partitions, unlike prior work that partitions among competing applications. By controlling the sizes of these partitions, Talus ensures that as an application is given more cache space, its miss rate decreases in a convex fashion. We prove that Talus removes performance cliffs, and evaluate it through extensive simulation. Talus adds negligible overheads, improves single-application performance, simplifies partitioning algorithms, and makes cache partitioning more effective and fair.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"27 1","pages":"64-75"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84907524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overcoming far-end congestion in large-scale networks
Jongmin Won, Gwangsun Kim, John Kim, Ted Jiang, Mike Parker, Steve Scott
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 415-427. DOI: 10.1109/HPCA.2015.7056051

Abstract: Accurately estimating congestion for global adaptive routing decisions (i.e., determining whether a packet should be routed minimally or non-minimally) has a significant impact on overall performance in high-radix topologies such as the Dragonfly. Prior work has focused on near-end congestion (congestion at the current router) or downstream congestion (congestion at downstream routers). However, most prior work does not evaluate the impact of far-end congestion, i.e., the apparent congestion caused by the high channel latency between routers. We refer to far-end congestion as phantom congestion because it is not "real" congestion: owing to the long inter-router latency, in-flight packets (and credits) distort congestion information and can lead to inaccurate adaptive routing decisions. In addition, we show how transient congestion occurs as the occupancy of network queues fluctuates due to random traffic variation, even in steady-state conditions; this too causes inaccurate adaptive routing decisions that degrade network performance with lower throughput and higher latency. To overcome these limitations, we propose a history-window-based approach that removes the impact of phantom congestion. We also show how using the average of local queue occupancies and adding an offset largely removes the impact of transient congestion. Our evaluation of adaptive routing in a large-scale Dragonfly network shows that the combination of these techniques nearly matches the performance of an ideal adaptive routing algorithm.
NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules
Amin Farmahini Farahani, Jung Ho Ahn, Katherine Morrow, N. Kim
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 283-295. DOI: 10.1109/HPCA.2015.7056040

Abstract: The energy consumed in transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. In this paper, we propose near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on the DRAM devices of off-chip main memory modules. NDA transfers most data through high-bandwidth, low-energy 3D interconnects between accelerators and DRAM devices instead of low-bandwidth, high-energy off-chip interconnects between the processor and DRAM devices, substantially reducing energy consumption and improving performance. Unlike previous near-memory processing architectures, NDA is built upon commodity DRAM devices; apart from inserting through-silicon vias (TSVs) to 3D-interconnect DRAM devices and accelerators, NDA requires minimal changes to commodity DRAM device and standard memory module architectures. This allows NDA to be adopted more easily in both existing and emerging systems. Our experiments demonstrate that, on average, an NDA-based system consumes 46% lower total energy (68% lower data-transfer energy) at 1.67× higher performance than a system that integrates the same accelerator logic within the processor itself.
{"title":"BeBoP: A cost effective predictor infrastructure for superscalar value prediction","authors":"Arthur Perais, André Seznec","doi":"10.1109/HPCA.2015.7056018","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056018","url":null,"abstract":"Up to recently, it was considered that a performance-effective implementation of Value Prediction (VP) would add tremendous complexity and power consumption in the pipeline, especially in the Out-of-Order engine and the predictor infrastructure. Despite recent progress in the field of Value Prediction, this remains partially true. Indeed, if the recent EOLE architecture proposition suggests that the OoO engine need not be altered to accommodate VP, complexity in the predictor infrastructure itself is still problematic. First, multiple predictions must be generated each cycle, but multi-ported structures should be avoided. Second, the predictor should be small enough to be considered for implementation, yet coverage must remain high enough to increase performance. To address these remaining concerns, we first propose a block-based value prediction scheme mimicking current instruction fetch mechanisms, BeBoP. It associates the predicted values with a fetch block rather than distinct instructions. Second, to remedy the storage issue, we present the Differential VTAGE predictor. This new tightly coupled hybrid predictor covers instructions predictable by both VTAGE and Stride-based value predictors, and its hardware cost and complexity can be made similar to those of a modern branch predictor. Third, we show that block-based value prediction allows to implement the checkpointing mechanism needed to provide D-VTAGE with last computed/predicted values at moderate cost. Overall, we establish that EOLE with a 32.8KB block-based D-VTAGE predictor and a 4-issue OoO engine can significantly outperform a baseline 6-issue superscalar processor, by up to 62.2% and 11.2% on average (gmean), on our benchmark set.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"23 1","pages":"13-25"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73968415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable communication architecture for network-attached accelerators","authors":"Sarah Neuwirth, Dirk Frey, M. Nüssle, U. Brüning","doi":"10.1109/HPCA.2015.7056068","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056068","url":null,"abstract":"On the road to Exascale computing, novel communication architectures are required to overcome the limitations of host-centric accelerators. Typically, accelerator devices require a local host CPU to configure and operate them. This limits the number of accelerators per host system. Network-attached accelerators are a new architectural approach for scaling the number of accelerators and host CPUs independently. In this paper, the communication architecture for network-attached accelerators is described which enables remote initialization and control of the accelerator devices. Furthermore, an operative prototype implementation is presented. The prototype accelerator node consists of an Intel Xeon Phi coprocessor and an EXTOLL NIC. The EXTOLL interconnect provides new features to enable accelerator-to-accelerator direct communication without a local host. Workloads can be dynamically assigned to CPUs and accelerators at run-time in an N to M ratio. The latency, bandwidth, and performance of the low-level implementation and MPI communication layer are presented. The LAMMPS molecular dynamics simulator is used to evaluate the communication architecture. The internode communication time is improved by up to 47%.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"26 1","pages":"627-638"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85721186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAFO: Cost aware flip optimization for asymmetric memories","authors":"R. Maddah, Seyed Mohammad Seyedzadeh, R. Melhem","doi":"10.1109/HPCA.2015.7056043","DOIUrl":"https://doi.org/10.1109/HPCA.2015.7056043","url":null,"abstract":"Phase Change Memory (PCM) and spin-transfer torque random access memory (STT-RAM) are emerging as new memory technologies to replace DRAM and NAND flash that are impeded by physical limitations. Programming PCM cells degrades their endurance while programming STT-RAM cells incurs a high bit error rate. Accordingly, several schemes have been proposed to service write requests while programing as few memory cells as possible. Nevertheless, those schemes did not address the asymmetry in programming memory cells that characterizes both PCM and STT-RAM. For instance, writing a bit value of 0 on PCM cells is more detrimental to endurance than 1 while writing a bit value of 1 on STT-RAM cells is more prone to error than 0. In this paper, we propose CAFO as a new cost aware flip reduction scheme. Essentially, CAFO encompasses a cost model that computes the cost of servicing write requests through assigning different costs to each cell that requires programming. Subsequently, CAFO encodes the data to be written into a form that incurs less cost through its cost aware encoding module. Overall, CAFO is capable of cutting down the write cost by up to 65% more than existing schemes.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"43 4 1","pages":"320-330"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90020227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Computing and Applications, Second International Conference, HPCA 2009, Shanghai, China, August 10-12, 2009, Revised Selected Papers","authors":"Wu Zhang, Zhangxin Chen, C. Douglas, W. Tong","doi":"10.1007/978-3-642-11842-5","DOIUrl":"https://doi.org/10.1007/978-3-642-11842-5","url":null,"abstract":"","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83259148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}