{"title":"LALM: a logic-aware layout methodology to enhance the noise immunity of domino circuits","authors":"Yonghee Im, K. Roy","doi":"10.1109/ISVLSI.2003.1183352","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183352","url":null,"abstract":"The circuit performance is increasingly affected by signal integrity as cross-talk becomes more significant with scaling down of feature sizes. Many attempts have been made to improve noise immunity, but all require the sacrifice of speed as a tradeoff. In some circuits, P/G network is used as shielding wires to avoid cross-talk while maintaining the desired speed, but the use of the network is inherently restricted by electromigration, IR drop, Ldi/dt noise, etc. We propose a novel methodology to enhance the noise immunity of domino circuits by reordering transistors as well as interconnects based on the functionality of the circuit. To the best of our knowledge, it is the first attempt to use the functionality of a circuit for the purpose of noise immunity enhancement. The methodology, named \"Logic-Aware Layout Methodology\" (LALM), uses several techniques that can be used to improve the signal integrity of domino circuits. Experimental results show that LALM is simple to apply yet useful in improving the noise immunity of domino circuits.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127174507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable fast memory management system design for application specific processors","authors":"S. K. Agun, J. M. Chang","doi":"10.1109/ISVLSI.2003.1183358","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183358","url":null,"abstract":"This paper presents the design and implementation of the new active memory manager unit (AMMU) designed to be embedded into system-on-chip CPUs. The unit is implemented using VHDL in field programmable gate array (FPGA) technology. The modified buddy system is used as the hardware algorithm for memory management. A RISC compatible open-source CPU is deployed with the memory management unit to demonstrate the feasibility of implementation. The results indicate that the proposed AMMU achieves high performance in memory allocation and deallocation for software systems.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127732347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using dynamic branch behavior for power-efficient instruction fetch","authors":"Jie S. Hu, N. Vijaykrishnan, M. J. Irwin, M. Kandemir","doi":"10.1109/ISVLSI.2003.1183363","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183363","url":null,"abstract":"Power consumption has become an increasing concern in high performance microprocessor design in terms of packaging and cooling cost. The fetch unit including instruction cache contributes a large portion of the total power consumption in the microprocessor. The instruction cache itself suffers some hidden power consumption due to dynamic control flows. Although capturing the dynamic control flows to boost performance, conventional trace caches (CTC) may increase power consumption in the fetch unit due to their simultaneous access to both the trace cache and the instruction cache. By avoiding these simultaneous accesses, sequential trace caches (STC) achieve lower power consumption, but suffer a significant performance loss at the same time. In this paper we propose dynamic direction prediction based trace cache (DPTC) which avoids simultaneous accesses to the trace cache and the instruction cache with the guide of fetch direction prediction. Experimental results show that dynamic prediction based trace cache can achieve 38.5% power reduction over conventional trace caches and an additional 7.2% reduction over STC, on average, while only trading a 1.8% performance loss compared to CTC.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Q-Tree: a new iterative improvement approach for buffered interconnect optimization","authors":"A. Kahng, Bao Liu","doi":"10.1109/ISVLSI.2003.1183444","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183444","url":null,"abstract":"The \"chicken-egg\" dilemma between VLSI interconnect timing optimization and delay calculation suggests an iterative approach. We separate interconnect timing transformation as Hanan grafting and non-Hanan sliding, and reveal generally negligible contribution of non-Hanan sliding. We propose a greedy iterative interconnect timing optimization algorithm called Q-Tree. Our experimental results show that Q-Tree starting with Steiner minimum tree topologies achieves better timing performance than C-Tree, PER-Steiner and BA-Tree algorithms. Also, executing Q-Tree starting with BA-Tree or P-Tree topologies can achieve better timing performance, especially, with shorter wires and fewer buffers. In general, Q-Tree can be applied to any interconnect tree for further timing performance improvement, with practical instance sizes and easily-extended functionality - e.g., with buffer station and routing obstacle avoidance consideration.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117079768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Networks-on-chip: the quest for on-chip fault-tolerant communication","authors":"R. Marculescu","doi":"10.1109/ISVLSI.2003.1183347","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183347","url":null,"abstract":"In this paper, we discuss the possibility of achieving on-chip fault-tolerant communication based on a new communication paradigm called stochastic communication. Specifically, for a generic tile-based architecture, we present a randomized algorithm which not only separates computation from communication, but also provides the required fault-tolerance to on-chip failures. This new technique is easy and cheap to implement in SoCs that integrate a large number of communicating IP cores.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131170649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testable sequential circuit design: partitioning for pseudoexhaustive test","authors":"B. Shaer, K. Aurangabadkar, N. Agarwal","doi":"10.1109/ISVLSI.2003.1183484","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183484","url":null,"abstract":"In this study, we present an automated algorithm that partitions large sequential VLSI circuits for pseudoexhaustive testing. The partitioning algorithm is based on the primary input cone and fanout value of each node in the circuit. We have developed an optimization process that can be used to find the optimal size of primary input cone and fanout values, to be used for partitioning a given circuit. Experimental results are presented to demonstrate the effectiveness of our work.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126823884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An area-efficient Euclidean algorithm block for Reed-Solomon decoder","authors":"Hanho Lee","doi":"10.1109/ISVLSI.2003.1183468","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183468","url":null,"abstract":"This paper presents a new area-efficient architecture to implement the Euclidean algorithm, which is frequently used in Reed-Solomon decoders. The RS (255,239) decoder using the Euclidean algorithm has been implemented with 0.13 μm CMOS technology with a supply voltage of 1.1 V. We investigate hardware complexity, clock frequency and data processing rate for this Euclidean algorithm block. The results show that the total number of gates is about 44,700 and it has a data processing rate of 2.4 Gbits/s at a clock frequency of 300 MHz. As compared to the other RS decoders, it gains significant improvements in hardware complexity and latency.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128155476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block-wise extraction of Rent's exponents for an extensible processor","authors":"T. Ahonen, T. Nurmi, J. Nurmi, J. Isoaho","doi":"10.1109/ISVLSI.2003.1183463","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183463","url":null,"abstract":"It is envisioned that future system-on-chip hardware platform designs will be based on reuse of a customizable processor core. Consequently, being able to quickly evaluate the key performance metrics associated with specific points in the design space becomes essential. Development of an early design phase performance estimation method for logic blocks of an extensible processor core is described. The processor blocks were systematically synthesized with varying constraints for reference and the corresponding Rent's exponents were extracted from the results. The impact of synthesis-originated design space discontinuities on the accuracy of physical performance estimation was evaluated by applying linear regression on the resulting design points.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116613386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An O(N) supply voltage assignment algorithm for low-energy serially connected CMOS modules and a heuristic extension to acyclic data flow graphs","authors":"A. U. Diril, Y. S. Dhillon, Kyu-won Choi, A. Chatterjee","doi":"10.1109/ISVLSI.2003.1183443","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183443","url":null,"abstract":"In this paper, a novel algorithm is proposed for assigning supply voltages to serially executing functional units (FUs) in a digital system such that the overall dynamic energy consumption is minimized for a given timing constraint. Novel closed form expressions for optimum supply voltage values are presented. The computation time of the algorithm is O(N) for N FUs in series. An extension of the O(N) algorithm is proposed for optimizing the acyclic data flow graph associated with any given task. Given the number of FUs available for the task, the operations required for the task are scheduled on the FUs. Voltages are then assigned to the FUs on each path of the flow graph using the O(N) algorithm. Energy savings of 10-60% are achieved on DSP filter designs using the proposed high-level optimization methodology over single supply voltage designs.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128059141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High throughput power-aware FIR filter design based on fine-grain pipelining multipliers and adders","authors":"J. Di, Jiann-Shiun Yuan, R. Demara","doi":"10.1109/ISVLSI.2003.1183490","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183490","url":null,"abstract":"In a regular FIR structure, pipelining the multipliers improves the throughput, but as the operand word length grows, the delay of the addition process becomes another important constraint. In this paper, a novel fine-grain pipelining scheme for high throughput FIR is proposed. By pipelining multipliers and adders, very high throughput can be achieved. A 2-dimensional pipeline gating technique is used to make the designed FIR power aware of the precision of the operands. The average power dissipation and latency are both significantly reduced as the input precision changes.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133875561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}