{"title":"An implementation of a 32-bit ARM processor using dual power supplies and dual threshold voltages","authors":"Robert Bai, S. Kulkarni, Wesley Kwong, A. Srivastava, D. Sylvester, D. Blaauw","doi":"10.1109/ISVLSI.2003.1183366","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183366","url":null,"abstract":"With the explosion of portable electronic devices, power efficient processors have become increasingly important. In this paper we present a set of circuit techniques to implement a 32-bit low-power ARM processor, found commonly in embedded systems, using a six metal layer 0.18 μm TSMC process. Our methodology is based on Clustered Voltage Scaling (CVS) and dual-V_th techniques aiming to reduce both dynamic power and static power simultaneously.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129479857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated dynamic memory data type implementation exploration and optimization","authors":"M. Leeman, C. Ykman-Couvreur, David Atienza Alonso, V. D. Florio, G. Deconinck","doi":"10.1109/ISVLSI.2003.1183476","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183476","url":null,"abstract":"The behavior of many algorithms is heavily determined by the input data. Furthermore, this often means that multiple, completely different execution paths can be followed, and internal data usage and handling are frequently quite different as well. Therefore, static compile time memory allocation is not efficient, especially on embedded systems where memory is a scarce resource, and dynamic memory management is the only feasible alternative. Including applications with dynamic memory in embedded systems introduces new challenges as compared to traditional signal processing applications. In this paper, an automated framework is presented to optimize embedded applications with extensive use of dynamic memory management. The proposed methodology automates the exploration and identification of optimal data type implementations based on power estimates, memory accesses and normalized memory usage.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130924159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systolic array implementation of block based Hopfield neural network for pattern association","authors":"Ming-Jung Seow, H. T. Ngo, V. Asari","doi":"10.1109/ISVLSI.2003.1183471","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183471","url":null,"abstract":"This paper suggests a systolic array implementation of block based Hopfield neural network architecture using completely digital circuits. The design is based on rewriting the energy equation of the Hopfield neural network to a systolic (or modular) form. The performance of the proposed architecture is evaluated by applying various binary inputs and it is observed that the network provides massive parallelism and can be extended by cascading identical chips.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129220211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint minimization of power and area in scan testing by scan cell reordering","authors":"Shalini Ghosh, Sugato Basu, N. Touba","doi":"10.1109/ISVLSI.2003.1183485","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183485","url":null,"abstract":"This paper describes a technique for re-ordering of scan cells to minimize power dissipation that is also capable of reducing the area overhead of the circuit compared to a random ordering of the scan cells. For a given test set, our proposed greedy algorithm finds the (locally) optimal scan cell ordering for a given value of λ, which is a trade-off parameter that can be used by the designer to specify the relative importance of area overhead minimization and power minimization. The strength of our algorithm lies in the fact that we use a novel dynamic minimum transition fill (MT-fill) technique to fill the unspecified bits in the test vector. Experiments performed on the ISCAS-89 benchmark suite show a reduction in power (70% for s13207, λ = 500) as well as a reduction in layout area (6.72% for s13207, λ = 500).","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120950423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal shielding/spacing metrics for low power design","authors":"Ravishankar Arunachalam, Emrah Acar, S. Nassif","doi":"10.1109/ISVLSI.2003.1183442","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183442","url":null,"abstract":"Noise arising from line-to-line coupling is a major problem for deep submicron design, and present technology trends are causing an increase in this type of noise. Common current methods to decrease coupling noise include shielding and buffering, both of which can increase overall power dissipation. An alternative method is spacing, which has the added benefit of improving the manufacturability (i.e. defect insensitivity) of the design. This paper explores the issue of coupling noise reduction, and proposes performance metrics that can be used by the designer to determine which of the alternative methods is best suited for a specific interconnect configuration.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133725899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dynamically reconfigurable mixed in-order/out-of-order issue queue for power-aware microprocessors","authors":"Yu Bai, R. I. Bahar","doi":"10.1109/ISVLSI.2003.1183365","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183365","url":null,"abstract":"In this work we focus on power-aware solutions for the issue queue in an out-of-order superscalar processor. We propose two different schemes. Our first approach partitions the issue queue into FIFOs such that only the instructions at the head of each FIFO may request to issue. We then dynamically monitor the FIFO usage and disable FIFOs that are not being efficiently used. In our second approach we also use a FIFO scheme, but dynamically vary the number and size of each FIFO simultaneously while at the same time keeping the total number of issue queue entries constant. We analyze both approaches and compare them in terms of the performance and power reduction. We find that although the first scheme of completely disabling issue queue entries is more straightforward to implement, it may not be the best option, particularly for floating point applications. Our best experimental result shows an average power saving of 27.3% in the issue queue with a performance degradation of only 2.7%.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114769333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fine-grain Phased Logic CPU","authors":"R. Reese, M. Thornton, C. Traver","doi":"10.1109/ISVLSI.2003.1183355","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183355","url":null,"abstract":"A five-stage pipelined CPU based on the MIPS ISA is mapped to a self-timed logic family known as Phased Logic (PL). The mapping is performed automatically from a netlist of D-Flip-Flops and 4-input Lookup Tables (LUT4s) to a netlist of Phased Logic gates. Each PL gate implements a 4-input Lookup Table in addition to control logic required for the PL control scheme. PL offers a speedup technique known as Early Evaluation that can be used to boost performance at the cost of additional PL gates. Several different PL gate-level implementations are produced to explore different architectural tradeoffs using early evaluation. Simulations run for five benchmark programs show an average speedup of 1.48 over the clocked netlist at the cost of 17% additional PL gates.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"131 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125787933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-timed design with dynamic domino circuits","authors":"Jung-Lin Yang, E. Brunvand","doi":"10.1109/ISVLSI.2003.1183473","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183473","url":null,"abstract":"We introduce a simple hierarchical design technique for building high-performance self-timed components using dynamic domino-style circuits. This technique is useful for building handshaking style functional blocks and for self-timed data path components. We wrap the dynamic domino circuit in a wrapper that communicates using a request/acknowledge protocol and mediates the pre-charge/evaluate cycle of the dynamic logic. We apply standard bundled delay matching for completion detection but add an early completion feature that can signal completion if function validity can be determined from the output value. The circuit overhead required for this early-acknowledge feature is relatively small, but can provide measurable speedup in some situations. We call this approach semi-bundled delay (SBD).","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122517807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced techniques for current balanced logic in mixed-signal ICs","authors":"Li Yang, J. Yuan","doi":"10.1109/ISVLSI.2003.1183499","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183499","url":null,"abstract":"In this paper, dual-V_T and negative feedback are proposed to reduce the noise of the current-balanced logic for mixed-signal ICs. Based on the circuit analysis and SPICE simulation, the dual-V_T technique shows advantages over the conventional current-balanced logic design in gate area, delay, power dissipation, and switching noise. The negative feedback further reduces the noise spike.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126347163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient VLSI implementation of a VLC decoder for universal variable length code","authors":"Shang Xue, B. Oelmann","doi":"10.1109/ISVLSI.2003.1183467","DOIUrl":"https://doi.org/10.1109/ISVLSI.2003.1183467","url":null,"abstract":"Variable length code (VLC) is used in a large variety of lossless compression applications. A specially designed VLC, called \"Universal Variable Length Code\" (UVLC), is utilized in the latest video coding standard H.26L under development. In this work we develop an efficient decoder for UVLC by utilizing the special properties of UVLC which perform coding in an alternating way (ALT). We compare the ALT decoder with the decoder called \"VLC decoder using plane separation\" (PLS) which is claimed to be one of the most effective VLC decoders. Our results show that the ALT decoder is 1.34 times faster, 1.7 times smaller, and consumes 45% power in comparison to the PLS decoder.","PeriodicalId":299309,"journal":{"name":"IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings.","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127532334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}