{"title":"Speedup of self-timed digital systems using Early Completion","authors":"S. Smith","doi":"10.1109/ISVLSI.2002.1016884","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016884","url":null,"abstract":"An Early Completion technique is developed to significantly increase the throughput of NULL Convention self-timed digital systems without impacting latency or compromising their self-timed nature. Early Completion performs the completion detection for registration stage i at the input of the register, instead of at the output of the register as in standard NULL Convention Logic. This method requires that the single-rail completion signal from registration stage i+1, Ko_{i+1}, be used as an additional input to the completion detection circuitry for registration stage i, to maintain self-timed operation. However, Early Completion does necessitate an assumption of equipotential regions, introducing a few easily satisfiable timing assumptions, thus making the design potentially more delay-sensitive. To illustrate the technique, Early Completion is applied to a case study of an optimally pipelined 4-bit by 4-bit unsigned multiplier utilizing full-word completion, where a speedup of 1.21 is achieved while self-timed operation is maintained and latency remains unchanged.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130692907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI systems for embedded video","authors":"W. Wolf, I. Özer, T. Lv","doi":"10.1109/ISVLSI.2002.1016865","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016865","url":null,"abstract":"This paper describes our work in smart cameras as a driver application for very deep submicron VLSI systems. A smart camera uses on-board computing engines to analyze video in real time. In particular, we use algorithms for realtime human gesture recognition as an example of the sorts of next-generation video applications that will be implemented in future VLSI systems. This paper introduces smart camera applications and outlines some of the VLSI challenges posed by such systems.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125895683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance field programmable VLSI processor based on a direct allocation of a control/data flow graph","authors":"Naotaka Ohsawa, M. Hariyama, M. Kameyama","doi":"10.1109/ISVLSI.2002.1016881","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016881","url":null,"abstract":"This paper proposes a high-performance field programmable VLSI processor (FPVLSI), in which a bit-serial processing element (PE) array is introduced to reduce the complexity of programmable interconnection networks. Therefore, the area and delay of a switch block in the interconnection network can be greatly reduced. Moreover, direct allocation of a control/data flow graph is employed where only a single node is mapped into a PE so that the wiring complexity is greatly reduced. The FPVLSI with 4400 PEs is designed in a 0.35 μm CMOS process. The performance of the FPVLSI is evaluated to be 28 times higher than that of the typical FPGA when executing the 16-point FFT.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133207669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating retiming under the coupled-edge timing model","authors":"I. Neumann, K. Sulimma, W. Kunz","doi":"10.1109/ISVLSI.2002.1016887","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016887","url":null,"abstract":"Retiming has been shown to be a powerful technique for improving the performance of synchronous circuits. However, even though retiming algorithms of polynomial time complexity have been developed, the runtimes still may become prohibitively long for large circuits. For the original FEAS algorithm proposed by Leiserson and Saxe (1983,1991), acceleration techniques have been developed solving this problem in practice. However, FEAS uses a simple circuit model that is fairly inaccurate for gate level net lists mapped onto actual technologies. Recently a retiming algorithm FEAS_CTM based on a new timing model tackling this problem has been proposed. In this paper we present a technique for speeding up execution time of FEAS_CTM. This technique is also suitable for a variety of published algorithms based on the circuit model proposed by Soyata and Friedman (1994,1997). In this work the approach has been integrated into FEAS_CTM and its benefit has been proven by experimental results.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129161245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System design and power optimization for mobile computers","authors":"A. Smailagic, M. Ettus","doi":"10.1109/ISVLSI.2002.1016867","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016867","url":null,"abstract":"The paper presents a system level design approach to power optimization in mobile/wearable computers. The paper identifies the major components of power consumption in a mobile system, and evaluates their respective contributions to power consumption, focusing on the impact of wireless networking. An experimental evaluation of several techniques for improving energy efficiency of a mobile system is presented.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128104138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient adder circuits based on a conservative reversible logic gate","authors":"J. Bruce, M. Thornton, L. Shivakumaraiah, P. S. Kokate, X. Li","doi":"10.1109/ISVLSI.2002.1016879","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016879","url":null,"abstract":"Conservative and reversible logic gates are widely known to be compatible with revolutionary computing paradigms such as optical and quantum computing. A fundamental conservative reversible logic gate is the Fredkin gate. This paper presents efficient adder circuits based on the Fredkin gate. Novel full adder circuits using Fredkin gates are proposed which have lower hardware complexity than the current state-of-the-art, while generating the additional signals required for carry skip adder architectures. The traditional ripple carry adder and several carry skip adder topologies are compared. Theoretical performance of each adder is determined and compared. Although the variable sized block carry skip adder is determined to have shorter delay than the fixed block size carry skip adder, the performance gains are not sufficient to warrant the required additional hardware complexity.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130427545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 1.2 V built-in architecture for high frequency on-line Iddq/delta Iddq test","authors":"S. Dragic, M. Margala","doi":"10.1109/ISVLSI.2002.1016891","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016891","url":null,"abstract":"A novel low-voltage design of Iddq/delta Iddq architecture suitable for a Built-In-Self-Test (BIST) implementation with analog, digital or mixed-signal cores is proposed. In testing mode, the architecture performs a non-functional Iddq and delta Iddq test which enables a more accurate fail/pass decision. A 1.2 V high-frequency current amplifying cell is developed as a central part of the Iddq/delta Iddq current monitor. With a sensitivity of less than 200 nA, the monitor achieves a gain-bandwidth product of 6.8 GHz, a low frequency current gain of 48 dB, and a high linearity for input current range (-15 μA, 15 μA). Its functionality and high performances are verified in experimental simulations. The Iddq fault detector has been implemented in a 0.13 μm CMOS technology with 1.2 V power supply.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131675064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise tolerant low power dynamic TSPCL D flip-flops","authors":"M. Elgamel, T. Darwish, M. Bayoumi","doi":"10.1109/ISVLSI.2002.1016880","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016880","url":null,"abstract":"Dynamic circuit techniques for higher performance have already been used extensively in many circuits such as microprocessors. With the scaling down to deep submicron technology and the move towards dynamic circuit techniques, noise immunity is becoming an important metric like power, speed, and area. This paper proposes a technique to achieve low energy consumption in TSPCL D flip-flops. The paper studies some published flip-flops and carries out a modification that reduces the switching activity of some internal nodes, causing a big saving in power consumption. The proposed flip-flop is characterized and compared with those published ones for reliability and energy efficiency. Comparison for speed, power consumption, and noise tolerance is also presented. The average noise threshold energy (ANTE) and the energy normalized ANTE metrics are used for quantifying the noise immunity and energy efficiency, respectively of flip-flops. Results using 0.18 μm CMOS technology and HSPICE for simulation, show that the proposed TSPCL D flip-flop achieves reduction in power dissipation ranging from 4.6% to 80% depending on the input pattern and the technology in use. The noise immunization curves show that the modified flip-flop is more susceptible to noise. Hence, one of the known noise immunization techniques should be applied.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133760098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Force-directed scheduling for dynamic power optimization","authors":"Suvodeep Gupta, S. Katkoori","doi":"10.1109/ISVLSI.2002.1016878","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016878","url":null,"abstract":"We present a latency-constrained scheduling algorithm to optimize a design for dynamic power. Usage of forces to model power is motivated by the force-directed scheduling (FDS) heuristic proposed by Paulin and Knight (1989). Given a dataflow graph (DFG) and an input data environment, we profile the DFG with representative data streams. Our algorithm reduces dynamic power by reducing switched capacitance inside resources. The switched capacitance of combinations among DFG operations, which could share a resource, and the probability of selecting such a combination, are evaluated. Switched capacitance inside a module is modeled as the spring constant k and probability of selecting the corresponding combination is modeled as the displacement x, in the force equation F=kx. Thus, a force is associated with each feasible combination corresponding to its power cost. Due to numerous possibilities, we obtain a distribution of forces whose mean, standard deviation, and skew are used to make a power-optimal scheduling decision. Compared to original FDS, our algorithm shows average power savings of 16.4% for the same throughput at the cost of a nominal area overhead.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130753775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal timing for skew-tolerant high-speed domino logic","authors":"Seong-ook Jung, Ki-Wook Kim, S. Kang","doi":"10.1109/ISVLSI.2002.1016871","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016871","url":null,"abstract":"When low threshold voltage (V_t) is applied to domino logic to improve the performance, the tradeoff between performance and noise margin is a major design issue. To resolve the tradeoff we propose Skew-Tolerant High-Speed (STHS) domino logic, which incorporates a dual keeper structure and delay logic gates. Detailed timing analysis of STHS domino logic induces optimal timing conditions wherein contention-free skew-tolerant window is maximized. We show that dual keeper structure increases innate noise-tolerance, and clock delay control logic fortifies signal skew-tolerance. Simulation results show that STHS domino logic is more robust to noise and signal skew than High-Speed (HS) domino logic, while presenting better performance and power efficiency.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}