{"title":"Reconfigurable latch controllers for low power asynchronous circuits","authors":"M. Lewis, J. Garside, L. Brackenbury","doi":"10.1109/ASYNC.1999.761520","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761520","url":null,"abstract":"A method for reducing the power consumption in asynchronous micropipeline-based circuits is presented. The method is based around a design for latch controllers in which the operating mode of the pipeline latches (normally open/transparent or normally closed/opaque) can be selected according to the dynamic processing demand on the circuit. Operating in normally-closed mode prevents spurious transitions from propagating along a static pipeline, at the expense of reduced throughput. Tests of the new latch controller circuits on a pipelined multiplier datapath show that reductions in energy per operation of up to 32% can be obtained by changing to the normally-closed operating mode. Estimates suggest that in a typical application which exhibits a variable processing demand, a power reduction of between 16% and 24% is possible, with little or no impact on maximum throughput.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127640333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A counterflow pipeline experiment","authors":"B. Coates, J. Ebergen, J. Lexau, Scott M. Fairbanks, I. W. Jones, Alex Ridgway, David M. Harris, I. Sutherland","doi":"10.1109/ASYNC.1999.761531","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761531","url":null,"abstract":"The counterflow pipeline architecture consists of two interacting pipelines in which data items flow in opposite directions. Interactions occur between two items when they meet in a stage. We present the design decisions for, and test measurements from, an asynchronous chip that explores the basic ideas of such an architecture. We built the chip in order to confirm proper operation of the arbiters required to ensure that each and every item flowing in one direction interacts with each and every item flowing in the other direction. Our chip, named \"Zeke,\" was built in 0.6 μm CMOS through the MOSIS fabrication facility. The maximum total throughput of the chip, which is the sum of the throughputs of the two pipelines, varies between 491 MDI/s (mega data items per second) and 699 MDI/s, depending on the amount of interaction that takes place. Under average data and operating conditions the performance of our chip was roughly halfway between these throughput values.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121755597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast, asP*, RGD arbiter","authors":"M. Greenstreet, Tarik Ono-Tesfaye","doi":"10.1109/ASYNC.1999.761532","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761532","url":null,"abstract":"This paper presents the design of a high-throughput, low-latency, asP*, RGD arbiter. Spice simulations for an implementation in a 0.8 μ CMOS process show a request-to-grant delay of 0.74 ns and a done-to-grant delay of 0.42 ns. Maximum throughput of requests from a single client is one grant per 1.8 ns; if both clients make requests aggressively, the arbiter can produce one grant per 1.2 ns. In addition to presenting a high-performance design, this paper examines trade-offs in performance-driven design. In particular, logic delay seems to dominate metastability concerns when optimizing performance.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127483656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From STG to extended-burst-mode machines","authors":"Jochen Beister, Gernot Eckstein, Ralf Wollowski","doi":"10.1109/ASYNC.1999.761530","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761530","url":null,"abstract":"A method is presented for deriving a system of parallel extended-burst-mode (XBM) machines from a signal transition graph (STG) specifying required input-output behaviour. First, a primitive finite-state machine is derived as the most general, sequential solution, from which allowable concurrency can still be recognized. Output concurrency is dealt with by decomposition (output partitioning, omission of irrelevant inputs). The component FSMs, with input concurrency only, are tested for XBM feasibility and, if positive, their XBM specifications are constructed. The entire procedure is systematic and is illustrated by deriving two XBM machines from an STG with input and output concurrency. We propose to view the STG as the most general and most precise causal specification of any asynchronous design problem, above and beyond considerations of circuit models and delay assumptions.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127883545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAPPID: an asynchronous instruction length decoder","authors":"Shai Rotem, K. Stevens, C. Dike, M. Roncken, Borislav Agapiev, R. Ginosar, Rakefet Kol, P. Beerel, C. Myers, K. Yun","doi":"10.1109/ASYNC.1999.761523","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761523","url":null,"abstract":"This paper describes an investigation of potential advantages and risks of applying an aggressive asynchronous design methodology to Intel Architecture. RAPPID (\"Revolving Asynchronous Pentium(R) Processor Instruction Decoder\"), a prototype IA32 instruction length decoding and steering unit, was implemented using self-timed techniques. The RAPPID chip was fabricated on a 0.25 μ CMOS process and tested successfully. Results show significant advantages (in particular, performance of 2.5-4.5 instructions/ns) with manageable risks using this design technology. RAPPID achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as an existing 400 MHz clocked circuit.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132311351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavioral transformations to increase noise immunity in asynchronous specifications","authors":"A. Taubin, A. Kondratyev, J. Cortadella, L. Lavagno","doi":"10.1109/ASYNC.1999.761521","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761521","url":null,"abstract":"Noise immunity is becoming one of the most important design parameters for deep-sub-micron (DSM) technologies. Asynchronous circuits seem to be a good candidate to alleviate the problems originated by simultaneous switching noise. However, they are also more sensitive than synchronous ones to spurious signal transitions and delay variations produced by crosstalk noise. This paper addresses the problem of analyzing and synthesizing asynchronous circuits with noise immunity being the main design parameter. The techniques presented in the paper focus on crosstalk noise and tackle the problem from the behavioral point of view.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"310 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115912242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relative timing","authors":"K. Stevens, Shai Rotem, R. Ginosar","doi":"10.1109/ASYNC.1999.761535","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761535","url":null,"abstract":"Relative Timing is introduced as an informal method for aggressive asynchronous design. It is demonstrated on three example circuits (C-Element, FIFO, and RAPPID Tag Unit), facilitating transformations from speed-independent circuits to burst-mode, relative timed, and pulse-mode circuits. Relative timing enables improved performance, area, power and testability in all three cases.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125557051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time merging","authors":"M. Greenstreet","doi":"10.1109/ASYNC.1999.761533","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761533","url":null,"abstract":"A merge element combines two concurrent handshake streams. For every request received from a client, a merge element may send a request to its parent, and for each acknowledgement received from its parent, the merge element may send an acknowledgement to a client. We show that a merge element can provide bounded time response if its parent also has bounded time response. We present two new implementations of a merge: one that uses an arbiter, and one that uses Schmitt triggers but no arbiters. Based on these designs, we explore a class of concurrent computations that can be performed in guaranteed bounded time, and we raise some new questions about what is possible in asynchronous design.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134146213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-phase asynchronous wave-pipelines and their application to a 2D-DCT","authors":"O. Hauck, M. Garg, S. Huss","doi":"10.1109/ASYNC.1999.761536","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761536","url":null,"abstract":"The two-phase asynchronous wave-pipeline design style presented in this paper is targeted at VLSI systems operating at Giga rates where it is rather difficult and costly to maintain the synchronous paradigm. Its distinguishing properties are the use of a request signal only, simple latches and the inelastic wave-pipelined operation. The asynchronous wave-pipeline is found to have less overhead and to be more robust than the synchronous one. The same basic structure is suitable for both data and control. Building blocks of a distributed arithmetic-based 2D-DCT are shown. Simulations of circuits to be fabricated on a 0.6 μm CMOS process show throughput rates as high as 800 MHz for the 2D-DCT.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133787700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and applications of the XDI model","authors":"W. C. Mallon, J. T. Udding, T. Verhoeff","doi":"10.1109/ASYNC.1999.761537","DOIUrl":"https://doi.org/10.1109/ASYNC.1999.761537","url":null,"abstract":"It is not always straightforward to implement a network that is robust enough to be functionally independent of communication delay. In order to specify and verify so-called Delay Insensitive networks, numerous models and formalisms have been developed. In this paper we analyze one of the most expressive models. We show how, based on rewrite rules, we can compute, rather than invent, parts of a network. We implemented these computations in a tool. We also show how healthiness, finite execution models and a distributive parallel composition cannot coexist.","PeriodicalId":285714,"journal":{"name":"Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems","volume":"87 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114135253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}