2013 IEEE 31st International Conference on Computer Design (ICCD): Latest Papers

LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657033
Youyou Lu, J. Shu, Jiayang Guo, Shuai Li, O. Mutlu
Abstract: Flash memory has accelerated the architectural evolution of storage systems with its unique characteristics compared to magnetic disks. The no-overwrite property of flash memory has been leveraged to efficiently support transactions, a commonly used mechanism in systems to provide consistency. However, existing transaction designs embedded in flash-based Solid State Drives (SSDs) have limited support for transaction flexibility, i.e., support for different isolation levels between transactions, which is essential to enable different systems to make tradeoffs between performance and consistency. Since they provide support for only strict isolation between transactions, existing designs lead to a reduced number of on-the-fly requests and therefore cannot exploit the abundant internal parallelism of an SSD. There are two design challenges that need to be overcome to support flexible transactions: (1) enabling a transaction commit protocol that supports parallel execution of transactions; and (2) efficiently tracking the state of transactions that have pages scattered over different locations due to parallel allocation of pages. In this paper, we propose LightTx to address these two challenges. LightTx supports transaction flexibility using a lightweight embedded transaction design. The design of LightTx is based on two key techniques. First, LightTx uses a commit protocol that determines the transaction state solely inside each transaction (as opposed to having dependencies between transactions that complicate state tracking) in order to support parallel transaction execution. Second, LightTx periodically retires the dead transactions to reduce transaction state tracking cost. Experiments show that LightTx provides up to 20.6% performance improvement due to transaction flexibility. LightTx also achieves nearly the lowest overhead in garbage collection and mapping persistence compared to existing embedded transaction designs.
Citations: 41
Data compression for thermal mitigation in the Hybrid Memory Cube
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657041
M. Khurshid, Mikko H. Lipasti
Abstract: Main memory performance is becoming an increasingly important factor contributing to overall system performance, especially due to the so-called memory wall. The Hybrid Memory Cube (HMC) is an attempt to overcome this memory wall by stacking DRAM on top of a logic die and interconnecting them with dense and fast through-silicon vias (TSVs). However, modeling the Hybrid Memory Cube in HotSpot has indicated that this cube has a natural temperature variation, with the hottest layers at the bottom and the cooler layers at the top. High temperatures and variations within a DRAM can result in reduced performance and efficiency, especially when dynamic thermal management (DTM) schemes are used to throttle DRAM bandwidth whenever the temperature gets too high. Hence this paper attempts to reduce the maximum temperature and variation by using data compression, where the compression is performed in the on-chip memory controller, and the compressed blocks are read/written using fewer bursts in the Hybrid Memory Cube, hence reducing power dissipation. The compressed blocks are stored only in the hotter banks of the cube to mitigate the thermal gradient in the cube. Maximum temperature was reduced by as much as 6°C, and since the HMC spent less time throttling when DTM schemes were used, a maximum speedup of 14.2% was observed, with an average of 2.8%.
Citations: 38
High accuracy approximate multiplier with error correction
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657022
Chia-Hao Lin, Ing-Chao Lin
Abstract: Approximate computing has gained significant attention due to the popularity of multimedia applications. In this paper, we propose a novel inaccurate 4:2 counter that can effectively reduce the partial product stages of the Wallace multiplier. Compared to the normal Wallace multiplier, our proposed multiplier can reduce power consumption by 10.74% and delay by 9.8% on average, with an error rate from 0.2% to 13.76%. The accuracy of amplitude is higher than 99%. In addition, we further enhance the design with error-correction units to provide accurate results. The experimental results show that the extra power consumption of the error-correction units is lower than 6% on average. Compared to the normal Wallace multiplier, the average latency of our proposed multiplier with EDC is 6% faster when the bit-width is 32, and the power consumption is still 10% lower than that of the Wallace multiplier.
Citations: 175
Simulation and architecture improvements of atomic operations on GPU scratchpad memory
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657065
Gert-Jan van den Braak, Juan Gómez-Luna, H. Corporaal, José María González-Linares, Nicolás Guil Mata
Abstract: GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, significant speed-ups can be attained for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from serialization of threads when they update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes to use a hash function in both the addressing of the banks and the locks of the scratchpad memory. To measure the effects of these changes, we first implement a detailed model of atomic operations on scratchpad memory in GPGPU-Sim, and verify its correctness. Second, we test our proposed hardware changes. They result in a speed-up of up to 4.9× and 1.8× on implementations utilizing the aforementioned software techniques for histogram and Hough transform applications respectively, with minimum hardware costs.
Citations: 6
Scalable trace signal selection using machine learning
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657069
Kamran Rahmani, P. Mishra, S. Ray
Abstract: A key problem in post-silicon validation is to identify a small set of traceable signals that are effective for debug during silicon execution. Structural analysis used by traditional signal selection techniques leads to poor restoration quality. In contrast, simulation-based selection techniques provide superior restorability but incur significant computation overhead. In this paper, we propose an efficient signal selection technique using machine learning to take advantage of simulation-based signal selection while significantly reducing the simulation overhead. Our approach uses (1) bounded mock simulations to generate a training vector set for the machine learning technique, and (2) an elimination approach to identify the most profitable signal set. Experimental results indicate that our approach can improve restorability by up to 63.3% (17.2% on average) with a faster or comparable runtime.
Citations: 21
FreshCache: Statically and dynamically exploiting dataless ways
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657055
Arkaprava Basu, Derek Hower, M. Hill, M. Swift
Abstract: Last level caches (LLCs) account for a substantial fraction of the area and power budget in many modern processors. Two recent trends (dwindling die yield that falls off sharply with larger chips, and increasing static power) make a strong case for a fresh look at LLC design. Inclusive caches are particularly interesting because many commercially successful processors use inclusion to ease coherence at a cost of some data being stale or redundant. Prior works have demonstrated that LLC designs could be improved through static (at design time) or dynamic (at runtime) use of “dataless ways”. The static version (static dataless ways) removes the data, but not the tags, from some cache ways to save energy and area without complicating inclusive-LLC coherence. A dynamic version (dynamic dataless ways) could dynamically turn off data, but not tags, effectively adapting the classic selective cache ways idea to save energy in the LLC but not area. We find that (a) all our benchmarks benefit from dataless ways, but (b) the best number of dataless ways varies by workload. Thus, a pure static dataless design leaves energy-saving opportunity on the table, while a pure dynamic dataless design misses area-saving opportunity. To surpass both pure static and dynamic approaches, we develop the FreshCache LLC design that both statically and dynamically exploits dataless ways, including a predictor to adapt the number of dynamic dataless ways as well as detailed cache management policies. Results show that FreshCache saves more energy than static dataless ways alone (e.g., 72% vs. 9% of LLC) and more area than dynamic dataless ways alone (e.g., 8% vs. 0% of LLC).
Citations: 8
Gate delay modeling for pre- and post-silicon timing related tasks for ultra-low power CMOS circuits
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657047
P. Das, S. Gupta
Abstract: Power is increasingly the primary design constraint for chip designers, and one of the main techniques for addressing this concern is aggressive voltage scaling. Device variability increases with voltage scaling and significantly affects gate delays at low voltages. Although existing delay models for near- and sub-threshold circuits capture the effects of variability on gate delays, they do not capture advanced delay phenomena such as multiple input switching (MIS; also known as near-simultaneous transitions) at inputs of a gate. As a result, most of these gate delay models often grossly underestimate worst case delays, leading to selection of non-critical paths and generation of delay-inferior vectors for post-silicon timing related tasks. In this paper we present extensive experimental results to demonstrate that MIS has significant impact (around 30-40%) on delays of near- and sub-threshold nominal gates. We develop a model whose computed minimum and maximum delay values are guaranteed to bound the corresponding delay values in silicon. We show that our model has practical run-time complexity and works equally well for super-, near- and sub-threshold circuits. In particular, via extensive experimentation we show that our model never underestimates the delay and tightly bounds the actual delays. We also illustrate trade-offs between tightness of such bounds, their impact on validation cost, and runtime complexity.
Citations: 1
Rationale for a 3D heterogeneous multi-core processor
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657038
E. Rotenberg, Brandon H. Dwiel, E. Forbes, Zhenqian Zhang, Randy Widialaksono, Rangeen Basu Roy Chowdhury, Nyunyi M. Tshibangu, S. Lipa, W. R. Davis, P. Franzon
Abstract: Single-ISA heterogeneous multi-core processors are comprised of multiple core types that are functionally equivalent but microarchitecturally diverse. This paradigm has gained a lot of attention as a way to optimize performance and energy. As the instruction-level behavior of the currently executing program varies, it is migrated to the most efficient core type for that behavior.
Citations: 20
LPScan: An algorithm for supply scaling and switching activity minimization during test
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657083
S. Potluri, Satya Trinadh, Chidhambaranathan Rajamanikkam, S. Balachandran
Abstract: Existing low power testing techniques either focus on reducing the switching activity while neglecting supply voltage, or perform supply voltage scaling without attempting to minimize switching activity. In this paper we propose LPScan (Low Power Scan), which integrates supply scaling and switching activity reduction in a single framework to reduce test power. For a shift frequency of 125 MHz, the LPScan algorithm, when applied to circuits from the ISCAS, OpenCores and ITC benchmark suites, produced power savings of 80% in the best case and 50% in the average case, compared to the best known algorithm [1].
Citations: 2
Energy-efficient Runtime Adaptive Scrubbing in fault-tolerant Network-on-Chips (NoCs) architectures
2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI: 10.1109/ICCD.2013.6657052
T. Boraten, Avinash Karanth Kodi
Abstract: As Networks-on-Chips (NoCs) continue to become more susceptible to process variation, cross-talk, and hard and soft errors with technology scaling to sub-nanometer, there is an urgent need for adaptive Error Correction Coding (ECC) schemes for improving the resiliency of the system. The goal of adaptive ECC schemes should be twofold: decrease power consumption when errors are infrequent, thereby maximizing power savings, and increase the fault coverage when errors are frequent, thereby improving application speedup while consuming more power. In this paper, we propose Runtime Adaptive Scrubbing (RAS), a novel multi-layered error correction and detection scheme for NoC architectures that intelligently adjusts fault coverage at the physical layer using variable strength encoders to scrub (protect) flits, thereby preventing faults from accumulating and propagating up to the logical layer. RAS successfully permits graceful network degradation while improving the overall network speedup and fault granularity and providing wider fault coverage than traditional static schemes. Simulation results indicate that RAS improves network latency by an average of 10% for Splash-2/PARSEC benchmarks on an 8 × 8 mesh network while incurring a 6.6% power penalty per flit and saving 15% in area overhead.
Citations: 9