2010 IEEE International Conference on Computer Design: Latest Papers

Adaptive TDMA bus allocation and elastic scheduling: A unified approach for enhancing robustness in multi-core RT systems
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647792
P. Burgio, M. Ruggiero, Francesco Esposito, Mauro Marinoni, G. Buttazzo, L. Benini
{"title":"Adaptive TDMA bus allocation and elastic scheduling: A unified approach for enhancing robustness in multi-core RT systems","authors":"P. Burgio, M. Ruggiero, Francesco Esposito, Mauro Marinoni, G. Buttazzo, L. Benini","doi":"10.1109/ICCD.2010.5647792","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647792","url":null,"abstract":"Next-generation real-time systems will be increasingly based on heterogeneous MPSoC design paradigms, where predictability and performance will be key issues to deal with. Such issues can be tackled both at the hardware level, by embedding technologies such as TDMA busses, and at the OS level, where suitable scheduling techniques can improve performance and reduce energy consumption. Among these, elastic scheduling has been proved to provide satisfactory results by dynamically reducing task periods at run-time to ensure the highest utilization possible of the processors. On the other hand, elastic scheduling lowers the degree of predictability and increases the complexity of the analysis at the system level. This reduces the benefits given by the TDMA bus, which relies on the high level task analysis for a robust and efficient slot allocation. Starting from this consideration, we propose a system where the elastic scheduling and the TDMA bus work synergistically. We introduce a QoS-aware adaptive bus service which takes the best of both techniques, mitigating their drawbacks at the same time. We show how the overhead introduced by coordination action is small, and it is however dominated by the benefits of the overall strategy in terms of performance and predictability guarantees.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131815450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Routability-driven flip-flop merging process for clock power reduction
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647784
Zhi-Wei Chen, Jin-Tai Yan
{"title":"Routability-driven flip-flop merging process for clock power reduction","authors":"Zhi-Wei Chen, Jin-Tai Yan","doi":"10.1109/ICCD.2010.5647784","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647784","url":null,"abstract":"The concept of merging some 1-bit flip-flops into a multi-bit flip-flop is applied to reduce dynamic clock power and decrease the total flip-flop area in a synchronous design. To acquire these advantages, the design must be guaranteed to satisfy certain physical constraints in the merging process. In this paper, given a set of 1-bit flip-flops with the input and output timing constraints, the area constraint inside any partitioned bin and the capacity constraint on any bin edge in a placement plane, an efficient routability-driven approach is proposed to merge 1-bit flip-flops into some multi-bit flip-flops for clock power reduction. The experimental results show that our proposed approach reduces 37.4% of the flip-flop area to maintain the synchronous design and saves 24.82% of the clock power for five examples in reasonable CPU time on the average.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133033920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
EQUIPE: Parallel equivalence checking with GP-GPUs
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647645
Debapriya Chatterjee, V. Bertacco
{"title":"EQUIPE: Parallel equivalence checking with GP-GPUs","authors":"Debapriya Chatterjee, V. Bertacco","doi":"10.1109/ICCD.2010.5647645","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647645","url":null,"abstract":"Combinational equivalence checking (CEC) is a mainstream application in Electronic Design Automation used to determine the equivalence between two combinational netlists. Tools performing CEC are widely deployed in the design flow to determine the correctness of synthesis transformations and optimizations. One of the main limitations of these tools is their scalability, as industrial scale designs demand time-consuming computation. In this work we propose EQUIPE, a novel combinational equivalence checking solution, which leverages the massive parallelism of modern general purpose graphic processing units. EQUIPE reduces the need for hard-to-parallelize engines, such as BDDs and SAT, by taking advantage of algorithms well-suited to concurrent implementation. We found experimentally that EQUIPE outperforms commercial CEC tools by an order of magnitude, on average, and state-of-the-art research CEC solutions by up to a factor of three, on a wide range of industry-strength designs.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115470582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Using variable clocking to reduce leakage in synchronous circuits
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647716
Navid Toosizadeh, S. Zaky, Jianwen Zhu
{"title":"Using variable clocking to reduce leakage in synchronous circuits","authors":"Navid Toosizadeh, S. Zaky, Jianwen Zhu","doi":"10.1109/ICCD.2010.5647716","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647716","url":null,"abstract":"There is a growing demand for high-performance, low-power systems, particularly in portable devices. New approaches to design are needed in technologies with feature sizes of 90 nm and below to reduce leakage power and to deal with process variations, which force designers to use increasingly conservative delay estimations. This paper presents a variable clock generator for a conventionally-designed synchronous circuit core. The clock frequency adjusts automatically to inter-and intra-chip process, voltage and temperature variations, making it possible to design the circuit assuming typical rather than worst-case conditions. The resulting circuit uses much fewer high-speed, low-voltage-threshold cells, and consequently has significantly reduced leakage power. Post-layout test results on a 32-bit microprocessor implemented in 90-nm technology showed 10X less leakage and 19% less dynamic power when operating under typical conditions, compared to a conventional, fixed-frequency implementation. The system is functional under all PVT corners.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117207737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Efficient MIMD architectures for high-performance ray tracing
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647555
D. Kopta, J. Spjut, E. Brunvand, A. Davis
{"title":"Efficient MIMD architectures for high-performance ray tracing","authors":"D. Kopta, J. Spjut, E. Brunvand, A. Davis","doi":"10.1109/ICCD.2010.5647555","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647555","url":null,"abstract":"Ray tracing efficiently models complex illumination effects to improve visual realism in computer graphics. Typical modern GPUs use wide SIMD processing, and have achieved impressive performance for a variety of graphics processing including ray tracing. However, SIMD efficiency can be reduced due to the divergent branching and memory access patterns that are common in ray tracing codes. This paper explores an alternative approach using MIMD processing cores custom-designed for ray tracing. By relaxing the requirement that instruction paths be synchronized as in SIMD, caches and less frequently used area expensive functional units may be more effectively shared. Heavy resource sharing provides significant area savings while still maintaining a high MIMD issue rate from our numerous light-weight cores. This paper explores the design space of this architecture and compares performance to the best reported results for a GPU ray tracer and a parallel ray tracer using general purpose cores. We show an overall performance that is six to ten times higher in a similar die area.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128026408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
DDPSL: An easy way of defining properties
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647654
L. D. Guglielmo, F. Fummi, Nicola Orlandi, G. Pravadelli
{"title":"DDPSL: An easy way of defining properties","authors":"L. D. Guglielmo, F. Fummi, Nicola Orlandi, G. Pravadelli","doi":"10.1109/ICCD.2010.5647654","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647654","url":null,"abstract":"The paper proposes DDPSL (Drag and Drop PSL) a template library and a tool which simplifies the definition of PSL (Property Specification Language) formal properties by exploiting PSL-based templates. DDPSL allows users not expert in formal methods to define PSL properties by dragging and dropping logical and temporal operators, and variables from the design under verification (DUV) into predefined templates. Moreover, confident users or experts can extend the set of templates, reducing the effort required for formalizing complex properties. From the methodological point of view, DDPSL combines the advantages of both Open Verification Library (OVL) and PSL. Note that the templates are characterized by a parametric interface that separates the formal definition from its semantics, as provided by OVL. Moreover, the adoption of PSL as reference language guarantees the expressiveness of popular temporal logics such as Linear Temporal Logic (LTL) and Computational Tree Logic (CTL), which, on the contrary, are not fully supported by OVL. DDPSL has been successfully used to define properties for verifying an embedded application running on the microcontroller of an industrial oven.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132018765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Skew-aware capacitive load balancing for low-power zero clock skew rotary oscillatory array
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647781
V. Honkote, B. Taskin
{"title":"Skew-aware capacitive load balancing for low-power zero clock skew rotary oscillatory array","authors":"V. Honkote, B. Taskin","doi":"10.1109/ICCD.2010.5647781","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647781","url":null,"abstract":"Rotary clocking is a traveling wave based high-speed resonant clocking technology with low-power and controllable-skew properties. Capacitive load balance and bounded clock skew are identified as the primary requirements to maintain a stable oscillation frequency across the rings and to achieve timing closure, respectively, in the rotary oscillatory array (ROA). Towards this end, two methodologies are proposed to achieve balanced capacitive loads across the rings of the ROA with a bounded skew constraint. Experiments performed on IBM R1–R5 benchmark circuits show a 5.62X improved capacitive balance and a 3.67% improved clock skew to a total skew of 6.55% of the clock period at 1.8GHz. SPICE simulations show that the frequency variation across the rings of the ROA is reduced from 10.14% to 2.12% as well. Power dissipated with the proposed optimization methodologies are within ±1.5% of the conventional design automation techniques for rotary synchronization.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127455809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
QoS scheduling for NoCs: Strict Priority Queueing versus Weighted Round Robin
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647577
Yue Qian, Zhonghai Lu, Q. Dou
{"title":"QoS scheduling for NoCs: Strict Priority Queueing versus Weighted Round Robin","authors":"Yue Qian, Zhonghai Lu, Q. Dou","doi":"10.1109/ICCD.2010.5647577","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647577","url":null,"abstract":"Strict Priority Queueing (SPQ) and Weighted Round Robin (WRR) are two common scheduling techniques to achieve Quality-of-Service (QoS) while using shared resources. Based on network calculus, we build analytical models for traffic flows under SPQ and WRR scheduling in on-chip wormhole networks. With these models, we can derive per-flow end-to-end delay bound. We compare the service behavior and show that WRR is not only more fair but also more flexible for QoS provision. To exhibit the potential and flexibility enabled by WRR, we develop a weight allocation algorithm to automatically assign proper weights for individual flows to satisfy their delay constraints. In particular, the weights are assigned in a way not more than necessary, in other words, to approach flows' delay constraints in order to leave room for other flows. Our experimental results validate our analysis technique and algorithms.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128620171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Inter-socket victim cacheing for platform power reduction
2010 IEEE International Conference on Computer Design Pub Date : 2010-11-29 DOI: 10.1109/ICCD.2010.5647634
Subhra Mazumdar, D. Tullsen, Justin J. Song
{"title":"Inter-socket victim cacheing for platform power reduction","authors":"Subhra Mazumdar, D. Tullsen, Justin J. Song","doi":"10.1109/ICCD.2010.5647634","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647634","url":null,"abstract":"On a multi-socket architecture with load below peak, as is often the case in a server installation, it is common to consolidate load onto fewer sockets to save processor power. However, this can increase main memory power consumption due to the decreased total cache space. This paper describes inter-socket victim cacheing, a technique that enables such a system to do both load consolidation and cache aggregation at the same time. It uses the last level cache of an idle processor in a connected socket as a victim cache, holding evicted data from the active processor. This enables expensive main memory accesses to be replaced by cheaper cache hits. This work examines both static and dynamic victim cache management policies. Energy savings is as high as 32.5%, and averages 5.8%.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129186864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Automotive embedded driver assistance: A real-time low-power FPGA stereo engine using semi-global matching
2010 IEEE International Conference on Computer Design Pub Date : 2010-10-01 DOI: 10.1109/ICCD.2010.5647552
Felix Eberli
{"title":"Automotive embedded driver assistance: A real-time low-power FPGA stereo engine using semi-global matching","authors":"Felix Eberli","doi":"10.1109/ICCD.2010.5647552","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647552","url":null,"abstract":"Today, automotive driver assistant systems are a growing market. After an overview on current driver assistant systems we will focus on vision based systems and their special requirements. As an example project we will describe the development of a next generation stereo vision algorithm.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125080069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3