2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

Acceleration of Monte-Carlo molecular simulations on hybrid computing architectures
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378642
Claus Braun, S. Holst, H. Wunderlich, Juan Manuel Castillo-Sanchez, J. Gross
Abstract: Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of an MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87× despite the additional communication overheads.
Citations: 11
Timing-test scheduling for constraint-graph based post-silicon skew tuning
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378679
M. Kaneko
Abstract: Post-Silicon Tuning is an emerging technology for improving performance-yield of VLSIs under process variations. This paper focuses especially on the post-silicon timing-skew tuning (PSST) via programmable delay elements (PDEs), and proposes a novel tuning algorithm which utilizes only the result of setup and hold timing tests, not the result of costly delay-time measurements. The basic framework of our PSST consists of the construction of a Control-value Constraint Graph from the results of timing tests, and the computation of longest path lengths on this graph to find safe PDE settings. Even though the cost of a timing test is smaller than that of a delay-time measurement, the cost of timing tests is still a dominant part of the PSST cost, and its reduction is a crucial problem. The longest path lengths which we need to compute depend directly on edge weights in the "longest-paths tree", but for co-tree edges, their exact edge weights are not always necessary. Based on this observation, we propose timing-test scheduling for reducing the timing-test cost for PDE tuning. The experimental simulation results show that our approach reduces the test cost by almost half or more.
Citations: 4
Thermal characterization of cloud workloads on a power-efficient server-on-chip
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378637
D. Milojevic, Sachin Idgunji, Djordje Jevdjic, Emre Ozer, P. Lotfi-Kamran, Andreas Panteli, A. Prodromou, C. Nicopoulos, D. Hardy, B. Falsafi, Yiannakis Sazeides
Abstract: We propose a power-efficient many-core server-on-chip system with 3D-stacked Wide I/O DRAM targeting cloud workloads in datacenters. The integration of 3D-stacked Wide I/O DRAM on top of a logic die increases available memory bandwidth by using dense and fast Through-Silicon Vias (TSVs) instead of off-chip IOs, enabling faster data transfers at much lower energy per bit. We demonstrate a methodology that includes full-system microarchitectural modeling and rapid virtual physical prototyping with emphasis on the thermal analysis. Our findings show that while executing CPU-centric benchmarks (e.g. SPECInt and Dhrystone), the temperature in the server-on-chip (logic+DRAM) is in the range of 175-200°C at a power consumption of less than 20W, exceeding the reliable operating bounds without any cooling solutions, even with embedded cores. However, with real cloud workloads, the power density in the server-on-chip remains much below the temperatures reached by the CPU-centric workloads as a result of much lower power burnt by memory-intensive cloud workloads. We show that such a server-on-chip system is feasible with a low-cost passive heat sink eliminating the need for a high-cost active heat sink with an attached fan, creating an opportunity for overall cost and energy savings in datacenters.
Citations: 20
Design and evaluation of a four-port data cache for high instruction level parallelism reconfigurable processors
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378693
Kiyeon Lee, Moo-Kyoung Chung, Soojung Ryu, Yeon-Gon Cho, Sangyeun Cho
Abstract: This paper explores high-bandwidth data cache designs for a coarse-grained reconfigurable architecture processor family capable of achieving a high degree of instruction level parallelism. To meet stringent power, area and time-to-market constraints, we take an architectural approach rather than circuit-level multi-porting approaches. We closely examine two design choices: single-level banked cache (SLC) and two-level cache (TLC). A detailed simulation study using a set of microbenchmarks and industry-strength benchmarks finds that both SLC and TLC offer a reasonably competitive performance at a small implementation cost compared with a hypothetical cache with perfect ports and a multi-bank scratchpad memory.
Citations: 0
Robust optimization of a Chip Multiprocessor's performance under power and thermal constraints
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378625
M. Ghasemazar, H. Goudarzi, Massoud Pedram
Abstract: Power dissipation and die temperature have become key performance limiters in today's high-performance Chip Multiprocessors (CMPs). Dynamic power management solutions have been proposed to manage resources in a CMP based on the measured power dissipation, performance, and die temperature of processing cores. In this paper, we develop a robust framework for power and thermal management of heterogeneous CMPs subject to variability and uncertainty in system parameters. More precisely, we first model and formulate the problem of maximizing the task throughput of a heterogeneous CMP (a.k.a. asymmetric multi-core architecture) subject to a total power budget and a per-core temperature limit. Next we develop a solution framework, called Variation-aware Power/Thermal Manager (VPTM), which is a hierarchical dynamic power and thermal management solution targeting heterogeneous CMP architectures. VPTM utilizes dynamic voltage and frequency scaling (DVFS) and core consolidation techniques to control the core power consumptions, which implicitly regulate the core temperatures. An algorithm is proposed for core consolidation and application assignment, and a convex program is formulated and solved to produce optimal DVFS settings. Finally, a feedback controller is employed to compensate for variations in key system parameters at runtime. Experimental results show highly promising performance improvements for VPTM compared to the state-of-the-art techniques.
Citations: 15
Efficient code compression for coarse grained reconfigurable architectures
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378687
Moo-Kyoung Chung, Yeon-Gon Cho, Soojung Ryu
Abstract: Though Coarse Grained Reconfigurable Architecture (CGRA) is a flexible alternative for high performance computing, it has a crucial problem: its instruction code is so large that the instruction memory takes a significant portion of silicon area and power consumption. This article proposes an efficient dictionary-based compression method for the CGRA instruction code, where code bit-fields are rearranged and grouped together according to locality characteristics and the most efficient compression mode is selected for each group and kernel. The proposed method can reinstall the dictionary contents adaptively for each kernel. Experimental results show that the proposed method achieves an average compression ratio of 0.56 on a 4×4 array of function units for well-optimized applications.
Citations: 6
Embedded way prediction for last-level caches
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378636
Faissal M. Sleiman, R. Dreslinski, T. Wenisch
Abstract: This paper investigates Embedded Way Prediction for large last-level caches (LLCs): an architecture and circuit design to provide the latency of parallel tag-data access at substantial energy savings. Existing way prediction approaches for L1 caches are compromised by the high associativity and filtered temporal locality of LLCs. We demonstrate: (1) the need for wide partial tag comparison, which we implement with a dynamic CAM alongside the data sub-array wordline decode, and (2) the inhibit bit, an architectural innovation to provide accurate predictions when the partial tag comparison is inconclusive. We present circuit critical-path and architectural power/performance studies demonstrating speedups of up to 15.4% (6.6% average) for scientific and server applications, matching the performance of parallel tag-data access while reducing energy overhead by 40%.
Citations: 14
Analyzing the optimal ratio of SRAM banks in hybrid caches
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378655
A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato
Abstract: Cache memories have been typically implemented with Static Random Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. In contrast, the recently introduced embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors. This paper proposes a novel hybrid LLC, which combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the optimal percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, performance degradation does not surpass, on average, 2.9% (even with 12.5% of banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB-16way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.
Citations: 6
Stealth assessment of hardware Trojans in a microcontroller
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378631
Trey Reece, D. Limbrick, Xiaowen Wang, B. Kiddie, W. H. Robinson
Abstract: Many experimental hardware Trojans from the literature explore the potential threat vectors, but do not address the stealthiness of the malicious hardware. If a Trojan requires a large amount of area or power, then it can be easier to detect. Instead, a more focused attack can potentially avoid detection. This paper explores the cost in both area and power consumption of several small, focused attacks on an Intel 8051 microcontroller implemented with a standard cell library. The resulting cost in total area varied from a 0.4% increase in the design, down to a 0.150% increase in the design. Dynamic and leakage power showed similar results.
Citations: 17
Memory module-level testing and error behaviors for phase change memory
2012 IEEE 30th International Conference on Computer Design (ICCD) Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378664
Zhe Zhang, Weijun Xiao, Nohhyun Park, D. Lilja
Abstract: Phase change memory (PCM) is a promising technology to solve energy and performance bottlenecks for memory and storage systems. To help understand the reliability characteristics of PCM devices, we present a simple fault model to categorize four types of PCM errors. Based on our proposed fault model, we conduct extensive experiments on real PCM devices at the memory module level. Numerical results uncover many interesting trends in terms of the lifetime of PCM devices and error behaviors. Specifically, PCM lifetime for the memory chips we tested is greater than 14 million cycles, which is much longer than for flash memory devices. In addition, the distributions for four types of errors are quite different. These results can be used for estimating PCM lifetime and for measuring the fabrication quality of individual PCM memory chips.
Citations: 21