MICRO 24: Latest Publications (Page 3)

Architecture and programming of a VLIW style programmable video signal processor

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123502

G. Essink, E. Aarts, R. V. Dongen, P. V. Gerwen, J. Korst, K. Vissers

Citations: 8

Increasing user interaction during high-level synthesis

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123493

R. Walker, Shivkumar Ramabadran, R. Joshi, Steinar Flatland

Citations: 5

DISC: dynamic instruction stream computer

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123498

M. Nemirovsky, F. Brewer, R. Wood

Citations: 28

A new technique for induction variable removal

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123501

Haigeng Wang, A. Nicolau, R. Potasman

Abstract: Removing redundant loop induction variables (IVs) in a sequential program can improve code performance by making effective use of registers and reducing the dynamic instruction count in the loop. At the microcode level and in high-performance, fine-grain parallel architectures, it is even more important that a parallelizing compiler be able to remove redundant IVs generated as a by-product of parallelizing transformations. The conventional IV detection algorithm fails to find an IV family with no basic IV, and copy propagation in general cannot transform such a family into one with a basic IV. As a result, the conventional IV removal method does not work for more general types of IV families, which often result from loop parallelizing transformations and also exist in sequential programs. Furthermore, IV removal by copy propagation with loop unrolling cannot preserve the semantics of the original code, in addition to being space-inefficient. We present in this paper a new technique for redundant IV removal. It can remove redundant IVs from more general types of IV families without the code-size overhead inevitably incurred by other methods such as loop unwinding and copy propagation with node splitting. It can also be used to determine whether redundant IVs should be removed (i.e., whether removing them benefits overall performance). We then demonstrate the effectiveness of this technique using some benchmarks.

This work is supported in part by NSF grant CCR8704367 and ONR grant N00014-86-K-0215.

Citations: 0
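The abstract does not spell out the new algorithm, so the sketch below only illustrates the conventional case it builds on: a loop with a basic IV `i` and a second IV `j` that always equals `4*i`. Because the two advance in lockstep, `i` is redundant, and rewriting the exit test in terms of `j` (classic linear-function test replacement) removes it without changing behavior. The function names are hypothetical, and this is not the paper's technique, which targets IV families that have no basic IV.

```python
def sum_every_fourth_before(a):
    """Two induction variables: basic IV i and j, which equals 4*i on every iteration."""
    total, i, j = 0, 0, 0
    while i < len(a) // 4:
        total += a[j]
        i += 1
        j += 4  # j == 4*i holds at the top of every iteration
    return total


def sum_every_fourth_after(a):
    """Same loop with i removed: the exit test i < n becomes j < 4*n."""
    total, j = 0, 0
    limit = 4 * (len(a) // 4)  # loop-invariant bound, computed once before the loop
    while j < limit:
        total += a[j]
        j += 4
    return total
```

Both versions visit indices 0, 4, 8, ... up to the last full group of four; the rewritten loop simply keeps one register and one increment fewer per iteration.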

Workload and implementation considerations for dynamic base register caching

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123476

M. Farrens, A. Park

Abstract: Dynamic Base Register Caching (DBRC) [Farrens and Park, Compression 1990] [Farrens and Park, SIGARCH 18, 1991] has been shown to be a useful technique for significantly reducing processor-to-memory address bandwidth. By caching the high-order portions of memory addresses in a set of dynamically allocated base registers, only small register indices, rather than the high-order address bits themselves, need to be transmitted between the processor and memory. In this paper we present the results of trace-driven simulations indicating that DBRC can facilitate the provision of separate paths for instructions and data by reducing the number of address lines required for parallel address channels. In fact, tailoring DBRC to separate instruction and data streams results in superior address compression. We also show that, for large SPEC benchmark traces, the effectiveness of DBRC is not significantly degraded by multiprogramming workloads. Additionally, we suggest two methods to optimize a DBRC implementation: (1) a processor's translation lookaside buffer hardware can be modified to implement DBRC in addition to its normal address translation functions; (2) DBRC latency can be hidden by properly synchronizing it with memory chip address pin multiplexing.

Citations: 3
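A minimal sketch of the base-register caching idea described in the abstract above, assuming each address is split into a high-order base and a 12-bit offset, with 8 base registers and LRU replacement. The split, register count, replacement policy, class names, and message format are all hypothetical and not taken from the paper. On a hit only the register index and offset would cross the bus; on a miss the full base is sent as well, and the memory side loads it into the named register so both sides stay in sync.

```python
from collections import OrderedDict


class DBRCEncoder:
    """Processor side (hypothetical): replaces high-order address bits with a register index when cached."""

    def __init__(self, num_regs=8, offset_bits=12):
        self.num_regs = num_regs
        self.offset_bits = offset_bits
        self.lru = OrderedDict()  # base value -> register index, least recently used first

    def encode(self, addr):
        base = addr >> self.offset_bits
        offset = addr & ((1 << self.offset_bits) - 1)
        if base in self.lru:  # hit: only index + offset need to be sent
            self.lru.move_to_end(base)
            return ("hit", self.lru[base], offset)
        if len(self.lru) < self.num_regs:  # miss with a free register available
            idx = len(self.lru)
        else:  # miss: evict the least recently used base and reuse its register
            _, idx = self.lru.popitem(last=False)
        self.lru[base] = idx
        return ("miss", idx, base, offset)  # miss: the full base must also be sent


class DBRCDecoder:
    """Memory side (hypothetical): mirrors the base register file to rebuild full addresses."""

    def __init__(self, num_regs=8, offset_bits=12):
        self.offset_bits = offset_bits
        self.regs = [None] * num_regs

    def decode(self, msg):
        if msg[0] == "miss":
            _, idx, base, offset = msg
            self.regs[idx] = base  # load the new base into the register the encoder chose
        else:
            _, idx, offset = msg
            base = self.regs[idx]
        return (base << self.offset_bits) | offset
```

Replaying the trace `[0x1000, 0x1004, 0x2008, 0x1010, 0x3000, 0x2000]` through an encoder/decoder pair with two registers reconstructs every address, while only the accesses that reuse a cached base (0x1004 and 0x1010) avoid sending the high-order bits.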

ALPS: an algorithm for pipeline data path synthesis

MICRO 24, Pub Date: 1991-09-01, DOI: 10.1145/123465.123490

R. Karri, A. Orailoglu

Abstract: While techniques for designing high-performance computing systems are well understood, software mechanisms for the automatic design of high-performance application-specific integrated circuits (ASICs) remain relatively unexplored. Advances in levels of integration will make it feasible to support performance-enhancing structures on a single chip. With the increasing demand for high performance in real-time signal processing applications, the design of high-speed ASICs merits immediate attention. In this paper, we develop software mechanisms for the high-level synthesis of high-performance VLSI systems. We have extended our interactive behavioral synthesis framework, which provides scheduling under multiple constraints including performance and cost, to support high-performance scheduling. The system is powerful enough to allow trade-offs along multiple dimensions. The software mechanisms supporting high performance include a pipeline scheduler, ALPS, that supports constraints including performance and cost. ALPS is a polynomial-time algorithm. Experimental results have shown that (a) ALPS consistently synthesizes designs on the optimal-designs curve, (b) it can be used for rapid prototyping as well as for detailed synthesis, and (c) the interplay between performance and cost results in a rich set of design alternatives.

Citations: 4
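The abstract does not describe how ALPS schedules, but the performance/cost trade-off it mentions can be illustrated with the standard resource bound for pipelined schedules: if each loop iteration issues n_k operations that need a unit of type k, and a new iteration starts every II cycles (the initiation interval), at least ceil(n_k / II) units of type k are required. The sketch assumes every operation occupies its unit for exactly one cycle and ignores data dependences and latencies, so it yields lower bounds only. It is not the ALPS algorithm, and the function names and operation mix are hypothetical.

```python
import math


def min_units_for_ii(op_counts, ii):
    """Lower bound on units per type so a new iteration can start every `ii` cycles.

    Assumes single-cycle operations; dependences and latencies are ignored.
    """
    return {kind: math.ceil(n / ii) for kind, n in op_counts.items()}


def min_ii_for_units(op_counts, units):
    """Resource-constrained lower bound on the initiation interval for a fixed unit allocation."""
    return max(math.ceil(op_counts[kind] / units[kind]) for kind in op_counts)


ops = {"mul": 4, "add": 6}  # hypothetical loop body: 4 multiplies and 6 adds per iteration
for ii in range(1, 5):
    print(ii, min_units_for_ii(ops, ii))
```

For this hypothetical body, halving the initiation interval from 2 to 1 doubles the minimum unit count from 5 to 10, which is the kind of throughput-versus-area tension a pipeline scheduler with performance and cost constraints has to navigate.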