Proceedings International Conference on Computer Design VLSI in Computers and Processors: Latest Publications

Effect of message length and processor speed on the performance of the bidirectional ring-based multiprocessor
Hitoshi Oi, Nagarajan Ranganathan
DOI: 10.1109/ICCD.1997.628878 (https://doi.org/10.1109/ICCD.1997.628878)
Published: 1997-10-12
Abstract: This paper presents a comparative study of the performance of bidirectional and unidirectional ring multiprocessors, with emphasis on the effect of two system parameters: the message length and the relative processor speed. In practice, these parameters may not be chosen optimally because of performance-cost tradeoffs. Our study shows that the bidirectional ring is more effective in such suboptimal system configurations and can improve processor utilization by up to 35%.
Citations: 7
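A rough way to see why a bidirectional ring can help is to count hops. The sketch below is an illustration only, not the authors' simulation model: it assumes an N-node ring with uniformly distributed destinations and compares the mean hop count when messages may travel in one direction only with the mean when each message takes the shorter direction. It ignores message length, contention, and processor speed, which are exactly the parameters the paper studies.

```python
from fractions import Fraction


def avg_hops_unidirectional(n: int) -> Fraction:
    """Mean hop count from one node to each of the other n - 1 nodes
    when messages may only travel in one direction around the ring."""
    return Fraction(sum(range(1, n)), n - 1)


def avg_hops_bidirectional(n: int) -> Fraction:
    """Mean hop count when every message takes the shorter direction
    (assumed routing policy for this illustration)."""
    return Fraction(sum(min(d, n - d) for d in range(1, n)), n - 1)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        uni = float(avg_hops_unidirectional(n))
        bi = float(avg_hops_bidirectional(n))
        print(f"N={n:2d}  unidirectional={uni:5.2f}  bidirectional={bi:5.2f}")
```

For N = 8 the mean distance is 4 hops on the unidirectional ring and 16/7 (about 2.29) hops with shortest-direction routing on the bidirectional ring.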
A floating-point divider using redundant binary circuits and an asynchronous clock scheme
Hiroaki Suzuki, H. Makino, K. Mashiko, H. Hamano
DOI: 10.1109/ICCD.1997.628939 (https://doi.org/10.1109/ICCD.1997.628939)
Published: 1997-10-12
Abstract: This paper describes a new floating-point divider (FDIV) that uses redundant binary circuits and an asynchronous clock scheme for its internal iterative operation. The redundant binary representation +1 = (1,0), 0 = (0,0), -1 = (0,1) is applied to all mantissa division circuits. This simple, unified representation reduces the circuit delay of quotient determination. In addition, the asynchronous clock reduces clock-margin overhead. The architecture avoids post-processing, whose main role is to produce the floating-point status flags. The FDIV core using the proposed technologies operates at 42.1 ns in 0.35 µm CMOS technology with triple-metal interconnect. The small core of 13.5k transistors is laid out in a 730 µm × 910 µm area.
Citations: 1
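The digit encoding quoted in the abstract can be made concrete with a small decoder. This is a sketch under stated assumptions: digits are given most-significant first as (p, n) pairs with value p - n, and because the abstract lists only three codes, the pair (1, 1) is rejected. The helper names rb_to_int and int_to_rb are hypothetical and say nothing about the divider's circuit structure.

```python
# Digit encoding from the abstract: +1 = (1, 0), 0 = (0, 0), -1 = (0, 1).
# (1, 1) is not listed there, so this sketch treats it as invalid.
ENCODING = {(1, 0): 1, (0, 0): 0, (0, 1): -1}


def rb_to_int(digits):
    """Value of a redundant binary number given MSB-first as (p, n) tuples."""
    value = 0
    for pair in digits:
        if pair not in ENCODING:
            raise ValueError(f"invalid redundant binary digit {pair}")
        value = 2 * value + ENCODING[pair]
    return value


def int_to_rb(x, width):
    """One of several valid encodings of x: the binary magnitude of x,
    written with +1 digits if x >= 0 and with -1 digits otherwise."""
    mag = abs(x)
    if mag >= 2 ** width:
        raise ValueError("value does not fit in the given width")
    bits = [(mag >> i) & 1 for i in reversed(range(width))]
    nonzero = (1, 0) if x >= 0 else (0, 1)
    return [nonzero if b else (0, 0) for b in bits]


if __name__ == "__main__":
    # Two different digit strings that both represent 3.
    print(rb_to_int(int_to_rb(3, 3)), rb_to_int([(1, 0), (0, 0), (0, 1)]))
```

Both [(0, 0), (1, 0), (1, 0)] and [(1, 0), (0, 0), (0, 1)] decode to 3. Redundant representations of this kind are generally used because they permit carry-free addition.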
Post-layout circuit speed-up by event elimination
H. Vaishnav, Chi-Keung Lee, Massoud Pedram
DOI: 10.1109/ICCD.1997.628870 (https://doi.org/10.1109/ICCD.1997.628870)
Published: 1997-10-12
Abstract: We propose a novel technique for post-layout delay optimization. It identifies the Boolean space corresponding to late-arriving transitions at the outputs of delay-critical subcircuits within the given circuit. These transitions are eliminated from the outputs by implementing the corresponding logic separately and merging it with the original circuit through control logic. Experimental results suggest that this technique can speed up circuits even when they have already been fully optimized for delay.
Citations: 6
Architectural adaptation for application-specific locality optimizations
Xingbin Zhang, Ali Dasdan, M. Schulz, Rajesh K. Gupta, A. Chien
DOI: 10.1109/ICCD.1997.628862 (https://doi.org/10.1109/ICCD.1997.628862)
Published: 1997-10-12
Abstract: We propose a machine architecture that integrates programmable logic into key components of the system, with the goal of customizing architectural mechanisms and policies to match an application. This approach improves on the traditional use of programmable logic as a separate co-processor by preserving machine usability through software, and improves on traditional computer architecture by providing application-specific hardware. We present two case studies of architectural customization, which enhance latency tolerance and make efficient use of network bisection on multiprocessors for sparse matrix computations. We demonstrate that application-specific hardware and policies can substantially improve performance on a per-application basis. Based on these preliminary results, we propose that application-driven machine customization is a promising approach to achieving high performance and combating performance fragility.
Citations: 20
Instruction prefetching using branch prediction information
I-Cheng K. Chen, Chih-Chieh Lee, T. Mudge
DOI: 10.1109/ICCD.1997.628926 (https://doi.org/10.1109/ICCD.1997.628926)
Published: 1997-10-12
Abstract: Instruction prefetching can effectively reduce instruction cache misses and thus improve performance. In this paper, we propose a prefetching scheme that employs a branch predictor to run ahead of the execution unit and prefetch potentially useful instructions. Branch-prediction-based (BP-based) prefetching has a separate small fetching unit, allowing it to compute and predict targets autonomously. Our simulations show that a 4-issue machine with BP-based prefetching achieves higher performance than a plain cache four times the size. In addition, BP-based prefetching outperforms other hardware instruction prefetching schemes, such as next-n-line prefetching and wrong-path prefetching, by 17-44% in stall overhead.
Citations: 59
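The run-ahead idea can be sketched in a few lines. The following is a hypothetical model, not the authors' hardware: it assumes a branch target buffer (a dict from branch PC to a BranchEntry) with 2-bit saturating counters, fixed 4-byte instructions, and 32-byte cache lines, and it walks the predicted path from a given PC to list the cache lines a prefetcher would request. All names and sizes here are illustrative assumptions.

```python
from dataclasses import dataclass

LINE_BYTES = 32  # assumed cache line size
INSN_BYTES = 4   # assumed fixed instruction width


@dataclass
class BranchEntry:
    """Hypothetical BTB entry: predicted target plus a 2-bit counter."""
    target: int
    counter: int = 2  # values 0..3; >= 2 means "predict taken"

    def predict_taken(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Saturating increment on taken, saturating decrement otherwise.
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)


def prefetch_lines(pc, btb, max_lines=4, max_insns=256):
    """Walk ahead of execution along the predicted path starting at pc and
    return the distinct cache-line addresses touched, in order.
    max_insns bounds the walk so a tight predicted loop cannot spin forever."""
    lines = []
    for _ in range(max_insns):
        line = pc - pc % LINE_BYTES
        if line not in lines:
            lines.append(line)
            if len(lines) == max_lines:
                break
        entry = btb.get(pc)
        if entry is not None and entry.predict_taken():
            pc = entry.target          # follow the predicted-taken branch
        else:
            pc += INSN_BYTES           # fall through to the next instruction
    return lines


if __name__ == "__main__":
    btb = {0x108: BranchEntry(target=0x400)}
    print([hex(a) for a in prefetch_lines(0x100, btb)])
```

Starting at 0x100 with a predicted-taken branch at 0x108 targeting 0x400, the walk requests lines 0x100, 0x400, 0x420, and 0x440, whereas a next-line prefetcher would request 0x120 after 0x100.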
Formally specifying and mechanically verifying programs for the Motorola complex arithmetic processor DSP
B. Brock, W. Hunt
DOI: 10.1109/ICCD.1997.628846 (https://doi.org/10.1109/ICCD.1997.628846)
Published: 1997-10-12
Abstract: We describe our formal specification of Motorola's Complex Arithmetic Processor (CAP) DSP and our subsequent use of this specification to verify the correctness of several DSP algorithms. We wrote the specification in the ACL2 logic and carried out the mechanical proofs using the ACL2 theorem-proving system. Motorola's CAP is a superscalar, pipelined DSP with seven memories and more than 20 functional units. Our formal specification is bit-for-bit exact and was created by hand-translating Motorola's drawings for the CAP. We believe that it is the largest specification of its kind, as it is the only formal specification we are aware of for a complete commercial design. Proving the correctness of the DSP algorithms (programs) required proving the correctness of programs with 317-bit instructions and a non-interlocking execution pipeline. This Motorola DSP has a 1.8-million-transistor implementation. The project involved both CLI and Motorola personnel and represents more than eight man-years of effort.
Citations: 31
A low power smart vision system based on active pixel sensor integrated with programmable neural processor
W. Fang, Guang Yang, B. Pain, B. Sheu
DOI: 10.1109/ICCD.1997.628905 (https://doi.org/10.1109/ICCD.1997.628905)
Published: 1997-10-12
Abstract: A low-power smart vision system for fast vision applications is presented, based on a large-format (currently 1K × 1K) active pixel sensor (APS) integrated with a programmable neural processor. The concept is demonstrated by a system design composed of an APS sensor, a smart image window handler, and a neural processor. The paper also shows that it is feasible to put the whole smart vision system on a single chip in standard CMOS technology. This smart vision system-on-a-chip can combine the advantages of optics and electronics to achieve ultra-high-speed smart sensory information processing and analysis at the focal plane. The proposed system will enable many applications, including robotics and machine vision, guidance and navigation, automotive applications, and consumer electronics. Future applications will also include scientific sensors, such as those suitable for the highly integrated imaging systems used in NASA deep-space and planetary spacecraft.
Citations: 6
Clustering and load balancing for buffered clock tree synthesis
A. D. Mehta, Yao-Ping Chen, N. Menezes, D. F. Wong, L. Pileggi
DOI: 10.1109/ICCD.1997.628871 (https://doi.org/10.1109/ICCD.1997.628871)
Published: 1997-10-12
Abstract: Buffers in clock trees introduce two additional sources of skew: the effect of process variations on buffer delays, and imbalance in buffer loading. We propose a buffered clock tree synthesis methodology in which we first apply a clustering algorithm to obtain clusters of approximately equal capacitive loading, and then drive each cluster with identical buffers. A sensitivity-based approach is then used to equalize the Elmore delay from the buffer output to all of the clock nodes. Skew due to load imbalance is minimized concurrently by matching a higher-order model of the load through wire sizing and wire lengthening. We demonstrate how this algorithm can be applied recursively to generate low-skew buffered clock trees.
Citations: 39
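The metric being equalized here is the Elmore delay. As a reminder of how it is computed (this is not the authors' sensitivity-based balancing algorithm), the sketch below evaluates Elmore delays on an RC tree using a hypothetical dictionary format: parent[v] is v's parent, r[v] is the resistance of the wire into v, and c[v] is the capacitance at v. The delay to a node is the sum, over the wires on its path from the root, of each wire's resistance times the total capacitance downstream of that wire.

```python
from collections import defaultdict


def elmore_delays(parent, r, c, root):
    """Elmore delay from root (e.g. a buffer output) to every node of an RC tree.

    parent[v]: parent of node v (the root has no entry)
    r[v]:      resistance of the wire from parent[v] to v
    c[v]:      lumped capacitance at v (missing entries count as 0)
    """
    children = defaultdict(list)
    for v, p in parent.items():
        children[p].append(v)

    # Downstream capacitance of v: its own capacitance plus its whole subtree.
    down = {}

    def subtree_cap(v):
        total = c.get(v, 0.0) + sum(subtree_cap(u) for u in children[v])
        down[v] = total
        return total

    subtree_cap(root)

    # Walk from the root, adding R(wire into u) * C_downstream(u) at each step.
    delay = {root: 0.0}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in children[v]:
            delay[u] = delay[v] + r[u] * down[u]
            stack.append(u)
    return delay


if __name__ == "__main__":
    parent = {"a": "s", "b": "a", "d": "a"}
    r = {"a": 1.0, "b": 2.0, "d": 1.0}
    c = {"a": 1.0, "b": 3.0, "d": 2.0}
    t = elmore_delays(parent, r, c, "s")
    print(t, "skew between sinks:", t["b"] - t["d"])
```

For a driver s feeding node a (r = 1, c = 1), which fans out to sinks b (r = 2, c = 3) and d (r = 1, c = 2), the delays are 12 at b and 8 at d, a skew of 4 in the same units; a balancing step would size or lengthen wires to close that gap.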
High level test synthesis across the boundary of behavioral and structural domains
Kowen Lai, C. Papachristou, M. Baklashov
DOI: 10.1109/ICCD.1997.628932 (https://doi.org/10.1109/ICCD.1997.628932)
Published: 1997-10-12
Abstract: High-level test synthesis (HLTS), a term introduced in recent years, promises automatic enhancement of a circuit's testability. The authors show how HLTS can achieve higher testability for BIST-oriented test methodologies. Their results show that by considering testability during high-level synthesis, better testability can be obtained than with DFT applied at the low level. Transformation for testability, which allows behavioral modification for testability, is a very powerful HLTS technique.
Citations: 6
Continuous retiming: algorithms and applications
P. Pan
DOI: 10.1109/ICCD.1997.628857 (https://doi.org/10.1109/ICCD.1997.628857)
Published: 1997-10-12
Abstract: This paper introduces a continuous version of retiming, called c-retiming. As with retiming, a c-retiming of a circuit is an assignment of values to the nodes of the circuit; however, the values in a c-retiming can be real numbers rather than integers. Retiming and c-retiming are strongly related: a c-retiming can be converted to a retiming by simple rounding, and the potential degradation in clock period is less than the largest gate delay in the circuit. C-retiming has two very attractive properties. First, it can be computed much more efficiently than retiming, so one can compute a retiming by computing a proper c-retiming; our experimental results indicate that this approach can drastically speed up the solution of retiming problems. Second, and more importantly, c-retiming can be combined with circuit modifications. Because of this property, c-retiming can be used as a tool to study synthesis and optimization problems in conjunction with retiming. We demonstrate this using the classical tree mapping problem, for which we derive an algorithm that produces a solution whose clock period is provably close to optimal while considering retiming.
Citations: 28
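The rounding step can be illustrated with the usual Leiserson-Saxe formulation, in which an edge u -> v carrying w registers carries w + r(v) - r(u) registers after retiming r, and a retiming is legal when every such count is non-negative. The sketch assumes the rounding is a floor (the abstract says only "a simple rounding") and does not reproduce the paper's clock-period bound; it only shows that flooring a legal real-valued assignment keeps every retimed register count non-negative. The function names are hypothetical.

```python
import math


def retimed_weights(edges, r):
    """edges: list of (u, v, w), w = registers on edge u -> v.
    Returns (u, v, w + r[v] - r[u]) for each edge (Leiserson-Saxe convention)."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]


def is_legal(edges, r):
    """A (c-)retiming is legal if no edge ends up with a negative register count."""
    return all(wr >= 0 for _, _, wr in retimed_weights(edges, r))


def round_c_retiming(r_cont):
    """Round a real-valued retiming down to integers (assumed rounding rule).
    Since w is an integer, r(u) <= w + r(v) implies
    floor(r(u)) <= floor(w + r(v)) = w + floor(r(v)), so legality is preserved."""
    return {v: math.floor(x) for v, x in r_cont.items()}


if __name__ == "__main__":
    edges = [("a", "b", 1), ("b", "c", 0), ("c", "a", 1)]
    r_cont = {"a": 0.0, "b": -0.3, "c": -0.3}
    r_int = round_c_retiming(r_cont)
    print(r_int, retimed_weights(edges, r_int))
```

For this three-node cycle, the legal c-retiming {a: 0, b: -0.3, c: -0.3} rounds down to {a: 0, b: -1, c: -1}, which moves one register from edge a -> b onto edge c -> a and leaves the total of two registers on the cycle unchanged.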