2008 IEEE International Conference on Computer Design: Latest Papers

Application Specific Instruction set processor specialized for block motion estimation
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751872
Marc-André Daigneault, J. Langlois, J. David
Abstract: This paper presents a novel application-specific instruction set processor specialized for block motion estimation. The proposed architecture includes a register file system that is efficient in terms of data reuse and parallel processing. Performance and area costs are presented for different levels of parallelism and register file dimensions. Various FPGA implementations of the architecture are further studied in order to present the most important factors affecting performance and hardware resource utilization. The proposed instruction extension block architecture enables acceleration by three orders of magnitude for full-search block matching algorithms.
Citations: 6
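Illustrative sketch (not taken from the paper): the entry above targets full-search block matching, which in its textbook form compares one block of the current frame against every candidate position in a search window of the reference frame and keeps the candidate with the lowest sum of absolute differences (SAD). The function names, the 8x8 block size, the search range of 4 pixels, and the synthetic frames below are all assumptions chosen for illustration; the paper's hardware architecture is not modeled here.

```python
import random


def sad(ref, cur, ry, rx, cy, cx, block):
    """Sum of absolute differences between a block of `ref` and a block of `cur`."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(block)
        for j in range(block)
    )


def full_search(ref, cur, top, left, block=8, search=4):
    """Return ((dy, dx), cost) of the best match for one block of `cur` in `ref`."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue
            cost = sad(ref, cur, y, x, top, left, block)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Hypothetical test frames: random reference, and a current frame built so
# that cur[y][x] == ref[y - 2][x + 1] (with wrap-around at the borders).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[(y - 2) % 32][(x + 1) % 32] for x in range(32)] for y in range(32)]

mv, cost = full_search(ref, cur, top=8, left=8)
print(mv, cost)
```

Each block costs (2 * search + 1)^2 SAD evaluations of block^2 pixel differences, and neighbouring candidates share most of their pixels. That regularity is why this kernel is a common target for data reuse and parallel hardware.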
Quantifying the energy efficiency of coordinated micro-architectural adaptation for multimedia workloads
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751920
Shrirang M. Yardi, M. Hsiao
Abstract: Adaptive micro-architectures aim to achieve greater energy efficiency by dynamically allocating computing resources to match the workload performance. The decisions of when to adapt (temporal dimension) and what to adapt (spatial dimension) are taken by a control algorithm based on an analysis of the power/performance tradeoffs in both dimensions. We perform a rigorous analysis to quantify the energy efficiency limits of fine-grained temporal and coordinated spatial adaptation of multiple architectural resources by casting the control algorithm as a constrained optimization problem. Our study indicates that coordinated adaptation can potentially improve energy efficiency by up to 60% as compared to static architectures and by up to 33% over algorithms that adapt resources in isolation. We also analyze synergistic application of coarse and fine grained adaptation and find modest improvements of up to 18% over optimized dynamic voltage/frequency scaling. Finally, we analyze several previous control algorithms to understand the underlying reasons for their inefficiency.
Citations: 0
Custom rotary clock router
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751849
V. Honkote, B. Taskin
Abstract: Timing closure and power envelopes for contemporary multi-core chips with high speed clock networks make the clock distribution design a challenging task. Resonant rotary clocking is a novel clocking technology for multi-gigahertz rate clock generation that provides minimal power dissipation. Rotary clocking implementations can easily provide independent synchronization of multiple cores as well. The traditional rotary clock design involves a regular array topology of oscillatory rings. In this paper, the rotary clock networks are designed and implemented using a custom ring topology. Custom ring topologies are advantageous as they reduce the total tapping wirelength for the registers tapping onto the oscillatory rings. A maze router based algorithm is developed for the implementation of custom topology rotary rings. In experiments performed on UCLA IBM R1-R5 benchmark circuits with the Elmore delay model, an improvement of 11.04% for register tapping wirelength is achieved on average.
Citations: 14
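Illustrative sketch (not the authors' algorithm): the rotary clock entry above mentions a maze-router-based algorithm. As background, a classic Lee-style maze router finds a shortest rectilinear path on a routing grid by breadth-first wave expansion around blockages. The grid encoding (0 = free, 1 = blocked), the function name `maze_route`, and the example grid are assumptions made for this sketch; how the paper adapts maze routing to ring construction is not described in the abstract and is not modeled here.

```python
from collections import deque


def maze_route(grid, src, dst):
    """Shortest 4-connected path from src to dst; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            # Backtrace from the target to the source along parent links.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                   # target unreachable


# Hypothetical routing grid: 1 marks a blocked cell.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = maze_route(grid, (0, 0), (3, 0))
print(len(path) - 1, path)
```

Because breadth-first search expands cells in order of distance, the first time the target is reached the path length is minimal, which is the property that makes maze routing attractive when wirelength is the quantity being reduced.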
Fault tolerant Four-State Logic by using Self-Healing Cells
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751832
T. Panhofer, W. Friesenbichler, M. Delvai
Abstract: The trend towards higher integration and faster operating speed leads to decreasing feature sizes and lower supply voltages in modern integrated circuits. These properties make the circuits more error-prone, requiring a fault tolerant implementation for applications demanding high reliability, e.g. space missions. In previous work we presented a concept for obtaining fault tolerant digital circuits by using asynchronous four-state logic (FSL). This type of logic already exhibits a high degree of fault tolerance where most faults simply halt the circuit (deadlock). The remaining types of faults are handled by temporal redundancy. Adding a deadlock detection unit and introducing the concept of self-healing cells (SHCs) leads to a highly reliable circuit that is able to tolerate even multiple faults. However, our experiments revealed that some specific fault constellations neither cause a deadlock nor are detected by a redundant calculation. We present two improved ways of error detection, which make it possible to capture even these types of faults. Further, a comparison between the size of an SHC and the achieved fault tolerance with respect to multiple faults is performed.
Citations: 11
Design of application-specific 3D Networks-on-Chip architectures
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751853
Shan Yan, Bill Lin
Abstract: The increasing viability of three-dimensional (3D) silicon integration technology has opened new opportunities for chip design innovations, including the prospect of extending emerging systems-on-chip (SoC) design paradigms based on networks-on-chip (NoC) interconnection architectures to 3D chip designs. In this paper, we consider the problem of designing application-specific 3D-NoC architectures that are optimized for a given application. We present novel 3D-NoC synthesis algorithms that make use of accurate power and delay models for 3D wiring with through-silicon vias. In particular, we present a very efficient 3D-NoC synthesis algorithm called ripup-reroute-and-router-merging (RRRM), which is based on a rip-up and reroute formulation for routing flows and a router merging procedure for network optimization. Experimental results on 3D-NoC design cases show that our synthesis results can on average achieve a 74% reduction in power consumption and a 17% reduction in hop count over regular 3D mesh implementations and a 52% reduction in power consumption and a 17% reduction in hop count over optimized 3D mesh implementations.
Citations: 62
Safe clocking register assignment in datapath synthesis
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751850
Keisuke Inoue, M. Kaneko, T. Iwagaki
Abstract: For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold constraint, as well as the setup constraint, becomes critical for latching a correct signal under delay variations. While a timing violation due to a failure of the setup constraint can be fixed by tuning the clock frequency or using a delayed latch, a timing violation due to a failure of the hold constraint cannot be fixed by those methods in general. Our approach to delay variations (in particular, the hold constraint) proposed in this paper is a novel register assignment strategy in high-level synthesis, which guarantees safe clocking by contra-data-direction (CDD) clocking. After the formulation of this new register assignment problem, we prove NP-hardness of the problem, and then derive an integer linear programming formulation for the problem. The proposed method receives a scheduled data flow graph, and generates a datapath having (1) robustness against delay variations, which is ensured by CDD-based register assignment, and (2) the minimum possible number of registers. Experimental results show the effectiveness of the proposed method for some benchmark circuits.
Citations: 3
Applying speculation techniques to implement functional units
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751843
Alberto A. Del Barrio, M. Molina, J. Mendias, Esther Andres Perez, R. Hermida, F. Tirado
Abstract: This paper justifies the use of estimation and prediction of carries to increase the performance of functional units built with the replication of full adders while keeping a low area penalty. Adders and multipliers are the most representative modules in this group of functional units. The use of these design techniques allows the implementation of modules with performance improvements ranging from 20% to 50% with an area overhead of only around 5%. These functional units are suitable for asynchronous circuits but they could also be introduced in synchronous circuits with speculative techniques. The basic idea consists of estimating the carry out from some parts of the functional units, allowing every part to operate independently and in parallel. These modules are connected to build bigger ones. Results from simulations show that for some applications it is possible to make predictions even more accurate than the bit-based estimation. Predictions also have the advantage that they can be introduced in multiplier designs, whereas estimators cannot. These predictions are similar to the ones used in branch prediction in a processor.
Citations: 10
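Illustrative sketch (not the authors' design): the entry above describes splitting a functional unit into parts and estimating the carry out of each part so that all parts can operate in parallel. A minimal software model of that idea for an adder is shown below. The estimator used here (guess a carry into a segment when both most significant bits of the previous segment are 1), the 16-bit width, the 4-bit segments, and the name `speculative_add` are assumptions for illustration only; `width` is assumed to be a multiple of `seg`.

```python
def speculative_add(a, b, width=16, seg=4):
    """Add two `width`-bit integers with guessed per-segment carry-ins.

    Returns (sum modulo 2**width, number of segments whose guess was wrong).
    """
    mask = (1 << seg) - 1
    n_seg = width // seg

    # Carry-in guesses, available to all segments at once (the parallel part).
    guesses = [0]
    for i in range(1, n_seg):
        top = i * seg - 1  # most significant bit of the previous segment
        guesses.append(((a >> top) & 1) & ((b >> top) & 1))

    result, carry, wrong = 0, 0, 0
    for i in range(n_seg):
        sa = (a >> (i * seg)) & mask
        sb = (b >> (i * seg)) & mask
        partial = sa + sb + guesses[i]      # speculative segment result
        if guesses[i] != carry:             # true carry-in disagrees
            wrong += 1
            partial = sa + sb + carry       # correction step
        result |= (partial & mask) << (i * seg)
        carry = partial >> seg              # true carry out of this segment
    return result, wrong


total, mispredicted = speculative_add(0x00FF, 0x0001)
print(hex(total), mispredicted)
```

The result is always exact because every wrong guess is corrected; the speed benefit in hardware would come only from the cases where `wrong` is 0 and no correction is needed, which is why the accuracy of the estimator or predictor matters.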
Leveraging speculative architectures for run-time program validation
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1145/2512456
Juan Carlos Martínez Santos, Yunsi Fei
Abstract: Program execution can be tampered with by malicious attackers who exploit software vulnerabilities. Changing the program behavior by compromising control data and decision data has become the most serious threat to computer systems security. Although several hardware approaches have been presented to validate program execution, they mostly suffer from large hardware area or poor ambiguity handling. In this paper, we propose a new hardware-based approach by leveraging the existing speculative architectures for run-time program validation. The on-chip branch target buffer (BTB) is utilized as a cache of the legitimate control flow transfers stored in a secure memory region. In addition, the BTB is extended to store the correct program path information. At each indirect branch site, the BTB is used to validate the decision history of conditional branches before it, and more information about the future decision path is fetched to monitor the execution path at run-time. Implementation of this approach is transparent to the upper operating system and programs. Thus, it is applicable to legacy code. Due to good code locality of the executable programs and effectiveness of branch prediction, the frequency of run-time control flow validations against the secure off-chip memory is low. Our experimental results show a negligible performance penalty and small storage overhead with ambiguity reduced.
Citations: 8
In-field NoC-based SoC testing with distributed test vector storage
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751863
J. Lee, R. Mahapatra
Abstract: The operational lifetimes of SoC and microprocessors face growing threats from technology scaling and increasing device temperature and power density. In-field (or on-line) testing of NoC-based SoC is an important technique in ensuring system integrity throughout this potentially shorter lifetime. Whether in-field testing is conducted concurrently with normal applications or executed in isolation, application intrusion must be minimized in order to maintain system availability. Specialized infrastructure IP has been proposed to manage on-line testing by scheduling tests and delivering test vectors to the various cores within the SoC from a centralized location. However, as the number of cores integrated into a single chip continues to increase, issuing test vectors from a centralized location is not a scalable solution. The increased distances that test vectors must travel have become a major concern for on-line testing because of their direct impact on application intrusion in terms of energy consumption, network load, and latency. In this paper, we apply a distributed storage technique to bound and minimize this distance, thereby minimizing network load, energy consumption, and test delivery latency across the entire network. Our experiments show that test delivery latency and energy consumption are reduced by approximately 90% for moderately sized NoC.
Citations: 7
Suitable cache organizations for a novel biomedical implant processor
2008 IEEE International Conference on Computer Design Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751921
C. Strydis
Abstract: This paper evaluates various instruction- and data-cache organizations in terms of performance, power, energy and area on a suitably selected biomedical benchmark suite. The benchmark suite consists of compression, encryption and data-integrity algorithms as well as real implant applications, all executed on biomedical input datasets. Results are used to drive the (micro)architectural design of a novel microprocessor targeting microelectronic implants. Our profiling study has revealed an L1 instruction-cache of 8 KB size (when relaxed area constraints are imposed) and an L1 data-cache of 4 KB size, both structured as 2-way associative caches, as optimal organizations for the envisioned implant processor.
Citations: 3