2010 IEEE International Conference on Computer Design: Latest Papers

Efficient provably good OPC modeling and its applications to interconnect optimization
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647713
Shih-Lun Huang, Chung-Wei Lin, Yao-Wen Chang
{"title":"Efficient provably good OPC modeling and its applications to interconnect optimization","authors":"Shih-Lun Huang, Chung-Wei Lin, Yao-Wen Chang","doi":"10.1109/ICCD.2010.5647713","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647713","url":null,"abstract":"Optical Proximity Correction (OPC) is the most popular technique to handle design shape distortions arising from subwavelength lithography. Existing OPC models are typically very computationally expensive and thus not efficient to be incorporated for layout optimization. In this paper, we present an efficient, yet sufficiently accurate OPC cost model which can predict the optimal location of a wire segment for OPC optimization and give an upper bound of the interference amount, guaranteeing that the interference amount is never underestimated. Based on this cost model, we propose an OPC-aware wire perturbation algorithm for post-layout interconnect optimization. We show that the effects of wire perturbation have the concavity or monotonicity property which can dramatically reduce the search space for finding the optimal location of each wire for OPC optimization. Further, we can incrementally update the OPC cost of a wire by recomputing only the affected wires because of the property of superposition of our model. 
Experimental results show that our algorithm can efficiently obtain much better OPC results than a state-of-the-art OPC-friendly router, based on a leading commercial OPC tool.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116123330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Optimal power/performance pipelining for error resilient processors
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647702
Nicolas Zea, J. Sartori, Ben Ahrens, Rakesh Kumar
{"title":"Optimal power/performance pipelining for error resilient processors","authors":"Nicolas Zea, J. Sartori, Ben Ahrens, Rakesh Kumar","doi":"10.1109/ICCD.2010.5647702","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647702","url":null,"abstract":"Timing speculation has been proposed as a technique for maximizing the energy efficiency of processors with minimal loss in performance. A typical implementation of timing speculation involves speculatively reducing the voltage of a processor to a point where errors are possible but rare, and employing an error recovery mechanism to ensure correct functionality. This allows significant energy savings with a small recovery overhead.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124032912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Helia: Heterogeneous Interconnect for Low Resolution Cache Access in snoop-based chip multiprocessors
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647589
Ali Shafiee, Narges Shahidi, A. Baniasadi
{"title":"Helia: Heterogeneous Interconnect for Low Resolution Cache Access in snoop-based chip multiprocessors","authors":"Ali Shafiee, Narges Shahidi, A. Baniasadi","doi":"10.1109/ICCD.2010.5647589","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647589","url":null,"abstract":"In this work we introduce Heterogeneous Interconnect for Low Resolution Cache Access (Helia). Helia improves energy efficiency in snoop-based chip multiprocessors as it eliminates unnecessary activities in both interconnect and cache. This is achieved by using innovative snoop filtering mechanisms coupled with wire management techniques. Our optimizations rely on the observation that a high percentage of cache mismatches could be detected by utilizing a small subset but highly informative portion of the tag bits. Helia relies on the snoop controller to detect possible remote tag mismatches prior to tag array lookup. Power is reduced as a) our wire management techniques permit slow transmission of a subset of tag bits while tag mismatches are being detected and b) we avoid cache access for mismatches detected at the snoop controller. Our evaluation shows that Helia reduces power in interconnect (dynamic: 64% to 75%, static: 45% to 50%) and cache tag array (dynamic: 57% to 58%, static: 80%) while improving average performance up to 4.4%.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126527430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Design and implementation of a special purpose embedded system for neural machine interface
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647801
Xiaorong Zhang, H. Huang, Qing Yang
{"title":"Design and implementation of a special purpose embedded system for neural machine interface","authors":"Xiaorong Zhang, H. Huang, Qing Yang","doi":"10.1109/ICCD.2010.5647801","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647801","url":null,"abstract":"Our previous study has shown the potential of using a computer system to accurately decode electromyographic (EMG) signals for neural controlled artificial legs. Because of computation complexity of the training algorithm coupled with real time requirement of controlling artificial legs, traditional embedded systems generally cannot be directly applied to the system. This paper presents a new design of an FPGA-based neural-machine interface for artificial legs. Both the training algorithm and the real time controlling algorithm are implemented on an FPGA. A soft processor built on the FPGA is used to manage hardware components and direct data flows. The implementation and evaluation of this design are based on Altera Stratix II GX EP2SGX90 FPGA device on a PCI Express development board. Our performance evaluations indicate that a speedup of around 280X can be achieved over our previous software implementation with no sacrifice of computation accuracy. The results demonstrate the feasibility of a self-contained, low power, and high performance real-time neural-machine interface for artificial legs.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127514558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Threads vs. caches: Modeling the behavior of parallel workloads
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647747
Zvika Guz, O. Itzhak, I. Keidar, A. Kolodny, A. Mendelson, U. Weiser
{"title":"Threads vs. caches: Modeling the behavior of parallel workloads","authors":"Zvika Guz, O. Itzhak, I. Keidar, A. Kolodny, A. Mendelson, U. Weiser","doi":"10.1109/ICCD.2010.5647747","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647747","url":null,"abstract":"A new generation of high-performance engines now combine graphics-oriented parallel processors with a cache architecture. In order to meet this new trend, new highly-parallel workloads are being developed. However, it is often difficult to predict how a given application would perform on a given architecture. This paper provides a new model capturing the behavior of such parallel workloads on different multi-core architectures. Specifically, we provide a simple analytical model, which, for a given application, describes its performance and power as a function of the number of threads it runs in parallel, on a range of architectures. We use our model (backed by simulations) to study both synthetic workloads and real ones from the PARSEC suite. Our findings recognize distinctly different behavior patterns for different application families and architectures.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132596453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
LMS-based low-complexity game workload prediction for DVFS
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647675
Benedikt Dietrich, S. Nunna, Dip Goswami, S. Chakraborty, M. Gries
{"title":"LMS-based low-complexity game workload prediction for DVFS","authors":"Benedikt Dietrich, S. Nunna, Dip Goswami, S. Chakraborty, M. Gries","doi":"10.1109/ICCD.2010.5647675","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647675","url":null,"abstract":"While dynamic voltage and frequency scaling (DVFS) based power management has been widely studied for video processing, there is very little work on game power management. Recent work on proportional-integral-derivative (PID) controllers for predicting game workload used hand-tuned PID controller gains on relatively short game plays. This left open questions on the robustness of the PID controller and how sensitive the prediction quality is to the choice of the gain values, especially for long game plays involving different scenarios and scene changes. In this paper we propose a Least Mean Squares (LMS) Linear Predictor, which is a regression model commonly used for system parameter identification. Our results show that game workload variation can be estimated using a linear-in-parameters (LIP) model. This observation dramatically reduces the complexity of parameter estimation as the LMS Linear Predictor learns the relevant parameters of the model iteratively as the game progresses. The only parameter to be tuned by the system designer is the learning rate, which is relatively straightforward. 
Our experimental results using the LMS Linear Predictor show comparable power savings and game quality with those obtained from a highly-tuned PID controller.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131176025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
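The iterative parameter learning mentioned in the abstract above can be illustrated with a minimal sketch of a textbook LMS linear predictor. This is not the authors' implementation: the function name `lms_predict`, the filter order, and the default learning rate are assumptions chosen for illustration, and the input is treated as an arbitrary sequence of per-frame workload values.

```python
from typing import List, Sequence


def lms_predict(samples: Sequence[float],
                order: int = 3,
                learning_rate: float = 0.1) -> List[float]:
    """Predict each sample from the previous `order` samples with plain LMS.

    Hypothetical helper for illustration only; `order` and `learning_rate`
    are assumed values, not figures taken from the paper.
    """
    weights = [0.0] * order          # model parameters, learned online
    predictions = []
    for t in range(order, len(samples)):
        history = samples[t - order:t]
        # Linear-in-parameters prediction: weighted sum of recent samples.
        predicted = sum(w * x for w, x in zip(weights, history))
        predictions.append(predicted)
        # LMS update: move each weight along the error gradient.
        error = samples[t] - predicted
        weights = [w + learning_rate * error * x
                   for w, x in zip(weights, history)]
    return predictions
```

In this sketch the learning rate is the only tuning knob, which mirrors the abstract's remark; plain LMS can diverge if the rate is too large relative to the input magnitude, so inputs would typically be normalized first.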
SWIFT: A SWing-reduced interconnect for a Token-based Network-on-Chip in 90nm CMOS
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647666
T. Krishna, J. Postman, Christopher Edmonds, L. Peh, P. Chiang
{"title":"SWIFT: A SWing-reduced interconnect for a Token-based Network-on-Chip in 90nm CMOS","authors":"T. Krishna, J. Postman, Christopher Edmonds, L. Peh, P. Chiang","doi":"10.1109/ICCD.2010.5647666","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647666","url":null,"abstract":"With the advent of chip multi-processors (CMPs), on-chip networks are critical for providing low-power communications that scale to high core counts. With this motivation, we present a 64-bit, 8×8 mesh Network-on-Chip in 90nm CMOS that: a) bypasses flit buffering in routers using Token Flow Control, thereby reducing buffer power along the control path, and b) uses low-voltage-swing crossbars and links to reduce interconnect energy in the data path. These approaches enable 38% power savings and 39% latency reduction, when compared with an equivalent baseline network. An experimental 2×2 core prototype, operating at 400 MHz, validates our design.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132428501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 59
Towards cool and reliable digital systems: RT level CED techniques with runtime adaptability
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647625
Yu Liu, Kaijie Wu
{"title":"Towards cool and reliable digital systems: RT level CED techniques with runtime adaptability","authors":"Yu Liu, Kaijie Wu","doi":"10.1109/ICCD.2010.5647625","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647625","url":null,"abstract":"In response to the rising fault susceptibility of ICs due to aggressive device scaling, a number of concurrent error detection (CED) techniques have been proposed. Most existing techniques address the problem at device or logic level. To account for the significant process variations and device aging of today's nano-meter devices, these techniques must always aim at the worst case of fault susceptibility. Recognizing that the power consumption of the CED circuitry for different fault susceptibility varies significantly, these techniques could result in significant overhead. In this paper, we propose register transfer level CED techniques that can be adjusted at runtime according to the actual need. The proposed high-level synthesis technique ensures that the generated datapath consumes minimal power for any CED capability it has been turned to. The proposed approach is tested using known benchmarks.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134523426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Bandwidth optimization in asynchronous NoCs by customizing link wire length
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647660
Junbok You, Daniel Gebhardt, K. Stevens
{"title":"Bandwidth optimization in asynchronous NoCs by customizing link wire length","authors":"Junbok You, Daniel Gebhardt, K. Stevens","doi":"10.1109/ICCD.2010.5647660","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647660","url":null,"abstract":"The bandwidth requirement for each link on a network-on-chip (NoC) may differ based on topology and traffic properties of the IP cores. Available bandwidth on an asynchronous NoC link will also vary depending on the wire length between sender and receiver. We explore the benefit to NoC performance when this property is used to increase bandwidth on specific links that carry the most traffic of an SoC design. Two methods are used to accomplish this: specifying router locations on the floorplan, and adding pipeline latches on long links. Energy and latency characteristics of an asynchronous NoC are compared to a similarly-designed synchronous NoC. The results indicate that the asynchronous network has lower energy, and link-specific bandwidth optimization has improved the average packet latency. Adding pipeline latches to congested links yields the most improvement. This link-specific optimization is applicable not only to the router and network we present here, but any asynchronous NoC used in a heterogeneous SoC.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133981198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A voting-based working set assessment scheme for dynamic cache resizing mechanisms
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647599
Masayuki Sato, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi
{"title":"A voting-based working set assessment scheme for dynamic cache resizing mechanisms","authors":"Masayuki Sato, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/ICCD.2010.5647599","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647599","url":null,"abstract":"Considering the trade-off between performance and power consumption has become significantly important in multi-core processor design. Under this situation, one promising approach is to employ a power-aware dynamic cache partitioning mechanism. This mechanism individually manages activation of each cache way, and exclusively allocates the minimum number of required ways to each thread. In the mechanism, an appropriate number of ways for a thread is decided based on locality assessment. However, sampling results of cache accesses that are used for locality assessment are disturbed by exceptional behaviors of cache accesses, which happen in a very short period. Such sampling results may change locality assessment results to ones that are not along with the overall trend in a long access-sampling period. These assessment results will excessively adapt the cache to exceptional behaviors, and deteriorate energy efficiency. To avoid such excessive adaptation by the exceptional behaviors, this paper proposes a voting-based working set assessment scheme, in which the number of activated ways is adjusted based on majority voting of locality assessment of several short sampling periods. By using the majority voting, the proposed scheme can identify the periods including exceptional behaviors, and ignore the assessment results of these periods. As a result, the proposed scheme makes the cache resizing mechanism more stable and robust. 
The experimental results indicate that the proposed scheme can reduce energy consumption by up to 24%, and 10% on an average without significant performance degradation in multi-thread execution on a 2-core CMP.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121313551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
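The majority-voting idea in the abstract above can be sketched as follows. This is an illustrative assumption, not the scheme from the paper: the function `vote_resize`, the three assessment labels, the one-way step size, and the way limits are all hypothetical. It only shows how a strict majority across several short sampling periods can suppress a resize triggered by one exceptional period.

```python
from collections import Counter
from typing import Sequence


def vote_resize(assessments: Sequence[str],
                current_ways: int,
                min_ways: int = 1,
                max_ways: int = 8) -> int:
    """Return the new number of active cache ways after a majority vote.

    Each entry in `assessments` is the (hypothetical) locality verdict of one
    short sampling period: "grow", "shrink", or "keep".
    """
    if not assessments:
        return current_ways
    decision, count = Counter(assessments).most_common(1)[0]
    # Require a strict majority; otherwise leave the cache size unchanged.
    if count * 2 <= len(assessments):
        return current_ways
    if decision == "grow":
        return min(current_ways + 1, max_ways)
    if decision == "shrink":
        return max(current_ways - 1, min_ways)
    return current_ways
```

For example, with verdicts `["shrink", "shrink", "grow"]` the single outlying "grow" period is outvoted and one way is deactivated.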