2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers (Page 6)

LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657033

Youyou Lu, J. Shu, Jiayang Guo, Shuai Li, O. Mutlu

Abstract: Flash memory has accelerated the architectural evolution of storage systems with its unique characteristics compared to magnetic disks. The no-overwrite property of flash memory has been leveraged to efficiently support transactions, a commonly used mechanism in systems to provide consistency. However, existing transaction designs embedded in flash-based Solid State Drives (SSDs) have limited support for transaction flexibility, i.e., support for different isolation levels between transactions, which is essential to enable different systems to make tradeoffs between performance and consistency. Since they provide support for only strict isolation between transactions, existing designs lead to a reduced number of on-the-fly requests and therefore cannot exploit the abundant internal parallelism of an SSD. There are two design challenges that need to be overcome to support flexible transactions: (1) enabling a transaction commit protocol that supports parallel execution of transactions; and (2) efficiently tracking the state of transactions that have pages scattered over different locations due to parallel allocation of pages. In this paper, we propose LightTx to address these two challenges. LightTx supports transaction flexibility using a lightweight embedded transaction design. The design of LightTx is based on two key techniques. First, LightTx uses a commit protocol that determines the transaction state solely inside each transaction (as opposed to having dependencies between transactions that complicate state tracking) in order to support parallel transaction execution. Second, LightTx periodically retires the dead transactions to reduce transaction state tracking cost. Experiments show that LightTx provides up to 20.6% performance improvement due to transaction flexibility. LightTx also achieves nearly the lowest overhead in garbage collection and mapping persistence compared to existing embedded transaction designs.
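
The LightTx abstract describes a commit protocol that decides a transaction's state using only that transaction's own pages. The Python sketch below illustrates one way such a self-contained check could work. The page-metadata layout (`tx_id`, `tx_count`) and the rule "committed when the number of pages found equals the count declared on the last page" are assumptions made for illustration and are not taken from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageMeta:
    """Hypothetical per-page metadata; field names are illustrative only."""
    tx_id: int
    tx_count: int  # assumed: 0 on every page except the last, which stores the total


def committed_transactions(pages):
    """Return the ids of transactions whose own pages prove they committed.

    Each decision uses only pages tagged with that transaction's id, so
    transactions can be written in parallel and checked independently.
    """
    seen = defaultdict(int)  # pages found per transaction
    declared = {}            # total page count announced by the last page
    for page in pages:
        seen[page.tx_id] += 1
        if page.tx_count > 0:
            declared[page.tx_id] = page.tx_count
    return {tx for tx, total in declared.items() if seen[tx] == total}


# Pages of two interleaved transactions, as parallel allocation might scatter them.
scattered = [
    PageMeta(tx_id=1, tx_count=0),
    PageMeta(tx_id=2, tx_count=0),
    PageMeta(tx_id=1, tx_count=2),  # last page of transaction 1
    PageMeta(tx_id=2, tx_count=0),  # transaction 2 never wrote its last page
]
result = committed_transactions(scattered)
```

In this sketch transaction 1 is reported as committed and transaction 2 is not, and neither decision depends on the other transaction's pages.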

Citations: 41

Data compression for thermal mitigation in the Hybrid Memory Cube

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657041

M. Khurshid, Mikko H. Lipasti

Abstract: Main memory performance is becoming an increasingly important factor contributing to overall system performance, especially due to the so-called memory wall. The Hybrid Memory Cube (HMC) is an attempt to overcome this memory wall by stacking DRAM on top of a logic die and interconnecting them with dense and fast through-silicon vias (TSVs). However, modeling the Hybrid Memory Cube in HotSpot has indicated that this cube has a natural temperature variation, with the hottest layers at the bottom and the cooler layers at the top. High temperatures and variations within a DRAM can result in reduced performance and efficiency, especially when dynamic thermal management (DTM) schemes are used to throttle DRAM bandwidth whenever temperature gets too high. Hence this paper attempts to reduce the maximum temperature and variation by using data compression, where the compression is performed in the on-chip memory controller, and the compressed blocks are read/written using fewer bursts in the Hybrid Memory Cube, hence reducing power dissipation. The compressed blocks are stored only in the hotter banks of the cube to mitigate the thermal gradient in the cube. Maximum temperature was reduced by as much as 6°C, and since the HMC spent less time throttling when DTM schemes were used, a maximum speed-up of 14.2% was observed, with an average of 2.8%.
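
The abstract's claim that compressed blocks are read and written using fewer bursts comes down to simple arithmetic, sketched below in Python. The 64-byte block and 16-byte burst sizes are assumed values chosen for illustration; the abstract does not state the actual sizes.

```python
def bursts_needed(block_bytes, burst_bytes=16):
    """Number of fixed-size bursts required to transfer a block.

    burst_bytes=16 is an assumed value, not one given in the abstract.
    """
    if block_bytes <= 0 or burst_bytes <= 0:
        raise ValueError("sizes must be positive")
    return (block_bytes + burst_bytes - 1) // burst_bytes  # ceiling division


def bursts_saved(original_bytes, compressed_bytes, burst_bytes=16):
    """Bursts avoided by transferring the compressed block instead of the original."""
    return bursts_needed(original_bytes, burst_bytes) - bursts_needed(compressed_bytes, burst_bytes)


# A hypothetical 64-byte block that compresses to 20 bytes.
uncompressed = bursts_needed(64)
compressed = bursts_needed(20)
saved = bursts_saved(64, 20)
```

Because bursts are whole units, compression only helps when it crosses a burst boundary: under these assumed sizes, shrinking a block from 64 to 60 bytes saves nothing.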

Citations: 38

High accuracy approximate multiplier with error correction

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657022

Chia-Hao Lin, Ing-Chao Lin

Citations: 175

Simulation and architecture improvements of atomic operations on GPU scratchpad memory

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657065

Gert-Jan van den Braak, Juan Gómez-Luna, H. Corporaal, José María González-Linares, Nicolás Guil Mata

Abstract: GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, significant speed-ups can be attained for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from serialization of threads in case they update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes to use a hash function in both the addressing of the banks and the locks of the scratchpad memory. To measure the effects of these changes, we first implement a detailed model of atomic operations on scratchpad memory in GPGPU-Sim, and verify its correctness. Second, we test our proposed hardware changes. They result in a speed-up of up to 4.9× and 1.8× on implementations utilizing the aforementioned software techniques for histogram and Hough transform applications respectively, with minimum hardware costs.
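
To illustrate why hashing the bank address can reduce serialization, the Python sketch below compares conventional modulo bank selection with an XOR-folding hash. The abstract does not say which hash function the paper uses, so the XOR fold, the 32-bank configuration, and the word-granular addresses are all assumptions.

```python
NUM_BANKS = 32   # assumed bank count
BANK_BITS = 5    # log2(NUM_BANKS)


def modulo_bank(word_addr):
    """Conventional mapping: the low address bits select the bank."""
    return word_addr % NUM_BANKS


def hashed_bank(word_addr):
    """Assumed XOR-fold hash: mix the next-higher address bits into the bank index."""
    return (word_addr ^ (word_addr >> BANK_BITS)) % NUM_BANKS


# 32 threads accessing words with a stride of 32, a pattern that puts
# every access in the same bank under the conventional mapping.
strided = [thread * NUM_BANKS for thread in range(32)]
banks_modulo = {modulo_bank(a) for a in strided}
banks_hashed = {hashed_bank(a) for a in strided}

# Consecutive addresses stay conflict-free under both mappings.
sequential = list(range(32))
banks_sequential = {hashed_bank(a) for a in sequential}
```

With these assumptions the strided pattern touches a single bank under the modulo mapping but spreads across all 32 banks once the address is hashed, while the already conflict-free sequential pattern is unaffected.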

Citations: 6

Scalable trace signal selection using machine learning

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657069

Kamran Rahmani, P. Mishra, S. Ray

Citations: 21

FreshCache: Statically and dynamically exploiting dataless ways

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657055

Arkaprava Basu, Derek Hower, M. Hill, M. Swift

Abstract: Last level caches (LLCs) account for a substantial fraction of the area and power budget in many modern processors. Two recent trends - dwindling die yield that falls off sharply with larger chips and increasing static power - make a strong case for a fresh look at LLC design. Inclusive caches are particularly interesting because many commercially successful processors use inclusion to ease coherence at a cost of some data being stale or redundant. Prior works have demonstrated that LLC designs could be improved through static (at design time) or dynamic (at runtime) use of “dataless ways”. The static version (static dataless ways) removes the data, but not tags, from some cache ways to save energy and area without complicating inclusive-LLC coherence. A dynamic version (dynamic dataless ways) could dynamically turn off data, but not tags, effectively adapting the classic selective cache ways idea to save energy in LLC but not area. We find that (a) all our benchmarks benefit from dataless ways, but (b) the best number of dataless ways varies by workload. Thus, a pure static dataless design leaves energy-saving opportunity on the table, while a pure dynamic dataless design misses area-saving opportunity. To surpass both pure static and dynamic approaches, we develop the FreshCache LLC design that both statically and dynamically exploits dataless ways, including a predictor to adapt the number of dynamic dataless ways as well as detailed cache management policies. Results show that FreshCache saves more energy than static dataless ways alone (e.g., 72% vs. 9% of LLC) and more area than dynamic dataless ways alone (e.g., 8% vs. 0% of LLC).

Citations: 8

Gate delay modeling for pre- and post-silicon timing related tasks for ultra-low power CMOS circuits

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657047

P. Das, S. Gupta

Abstract: Power is increasingly the primary design constraint for chip designers and one of the main techniques for addressing this concern is aggressive voltage scaling. Device variability increases with voltage scaling and significantly affects gate delays at low voltages. Although existing delay models for near- and sub-threshold circuits capture the effects of variability on gate delays, they do not capture advanced delay phenomena such as multiple input switching (MIS; also known as near-simultaneous transitions) at inputs of a gate. As a result, most of these gate delay models often grossly underestimate worst case delays, leading to selection of non-critical paths and generation of delay-inferior vectors for post-silicon timing related tasks. In this paper we present extensive experimental results to demonstrate that MIS has significant impact (around 30-40%) on delays of near- and sub-threshold nominal gates. We develop a model whose computed minimum and maximum delay values are guaranteed to bound the corresponding delay values in silicon. We show that our model has practical run-time complexity and works equally well for super-, near- and sub-threshold circuits. In particular, via extensive experimentation we show that our model never underestimates the delay and tightly bounds the actual delays. We also illustrate trade-offs between tightness of such bounds, their impact on validation cost, and runtime complexity.

Citations: 1

Rationale for a 3D heterogeneous multi-core processor

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657038

E. Rotenberg, Brandon H. Dwiel, E. Forbes, Zhenqian Zhang, Randy Widialaksono, Rangeen Basu Roy Chowdhury, Nyunyi M. Tshibangu, S. Lipa, W. R. Davis, P. Franzon

Citations: 20

LPScan: An algorithm for supply scaling and switching activity minimization during test

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657083

S. Potluri, Satya Trinadh, Chidhambaranathan Rajamanikkam, S. Balachandran

Citations: 2

Energy-efficient Runtime Adaptive Scrubbing in fault-tolerant Network-on-Chips (NoCs) architectures

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657052

T. Boraten, Avinash Karanth Kodi

Abstract: As Networks-on-Chips (NoCs) continue to become more susceptible to process variation, cross-talk, hard and soft errors with technology scaling to sub-nanometer, there is an urgent need for adaptive Error Correction Coding (ECC) schemes for improving the resiliency of the system. The goal of adaptive ECC schemes should be twofold: decrease power consumption when errors are infrequent, thereby maximizing power savings, and increase the fault coverage when errors are frequent, thereby improving application speedup while consuming more power. In this paper, we propose Runtime Adaptive Scrubbing (RAS), a novel multi-layered error correction and detection scheme for Networks-on-Chips (NoCs) architectures that intelligently adjusts fault coverage at the physical layer using variable strength encoders to scrub (protect) flits, thereby preventing faults from accumulating and propagating up to the logical layer. RAS successfully permits graceful network degradation while improving the overall network speedup and fault granularity and providing wider fault coverage than traditional static schemes. Simulation results indicate that RAS improves network latency by an average of 10% for Splash-2/PARSEC benchmarks on an 8 × 8 mesh network while incurring 6.6% power penalty per flit and saving 15% in area overhead.
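
The twofold goal stated in the RAS abstract (cheap protection when errors are rare, stronger coverage when they are frequent) can be sketched as a threshold-based selector in Python. The encoder labels, the thresholds, and the per-link counters are hypothetical placeholders; the abstract does not specify how RAS chooses encoder strength.

```python
def select_encoder(errors_seen, flits_seen, low=0.001, high=0.01):
    """Pick an encoder strength from the observed flit error rate.

    The labels and thresholds are illustrative assumptions, not values
    taken from the abstract.
    """
    rate = errors_seen / flits_seen if flits_seen else 0.0
    if rate < low:
        return "weak"    # errors infrequent: favour power savings
    if rate < high:
        return "medium"  # moderate error rate: balanced protection
    return "strong"      # errors frequent: maximise fault coverage


# Re-evaluate the choice per observation window as conditions change.
quiet_link = select_encoder(errors_seen=0, flits_seen=1000)
noisy_link = select_encoder(errors_seen=5, flits_seen=1000)
failing_link = select_encoder(errors_seen=50, flits_seen=1000)
```

A static scheme would fix one of these strengths at design time; the adaptive selection in this sketch pays the power cost of stronger encoding only on links that currently need it.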

Citations: 9