{"title":"Hardware-software co-adaptation for data-intensive embedded applications","authors":"I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. J. Irwin","doi":"10.1109/ISVLSI.2002.1016868","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016868","url":null,"abstract":"By studying energy and performance behavior of six array-dominated benchmarks, we observed that each nest in these applications works best with a specific cache configuration and optimization strategy. We also observed that cache configurations and optimization strategies required by different nests are, in general, different from each other. Based on this observation, in this paper, we propose a search space-based optimization for reducing energy consumption and improving performance. Specifically, we study potential benefits of a hardware-software co-adaptation scheme where cache configuration and optimization strategy are modified in the course of execution. Note that this is one step beyond determining just a suitable combination of (optimized) code/cache configuration which is valid throughout the execution of the application. The idea in co-adaptation is to ensure that each nested loop works with a cache configuration most suitable for it from the perspective of a given objective criterion. It should be noted, however, that dynamic cache reconfiguration does not come for free; it has both energy and performance costs which also need to be accounted for.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132867925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temperature variable supply voltage for power reduction","authors":"K. Shakeri, J. Meindl","doi":"10.1109/ISVLSI.2002.1016877","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016877","url":null,"abstract":"The scaling trend of MOSFETs requires the supply and the threshold voltages to be reduced in future generations. Although the supply voltage is reduced, the total power dissipation and the static power of the chip are increased. Power dissipation is one of the limiting factors in achieving the highest performance of a chip. Therefore, new power reduction techniques are required. In this paper a new technique is introduced to reduce the power consumption. In this technique the supply voltage is changed dynamically as temperature changes. Using this technique, for 70 nm devices the total power consumption of the chip can be reduced by 24% and the static power can be reduced by 40%. This reduction is achieved without any change in the worst-case delay.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"30 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133105672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of technology scaling in the clock system power","authors":"D. Duarte, N. Vijaykrishnan, M. J. Irwin","doi":"10.1109/ISVLSI.2002.1016875","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016875","url":null,"abstract":"The clock distribution and generation circuitry is known to consume more than a quarter of the power budget of existing microprocessors. A previously derived clock energy model is briefly reviewed while a comprehensive framework for the estimation of systemwide (chip level) and clock sub-system power as function of technology scaling is presented. This framework is used to study and quantify the impact that various intensifying concerns associated with scaling (i.e., increased leakage currents, increased interwire capacitance) will have on clock energy and their relative impact on the overall system energy. The results obtained indicate that clock power will remain a significant contributor to the total chip power, as long as techniques are used to limit leakage power consumption.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"515 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116211675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient partitioning algorithm of combinational CMOS circuits","authors":"B. Shaer, Khaled Dib","doi":"10.1109/ISVLSI.2002.1016890","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016890","url":null,"abstract":"This paper presents an efficient algorithm to partition combinational CMOS circuits for pseudoexhaustive testing. We present the effect of the partitioning algorithm on critical paths. Our objective is to reduce the delay penalty of test cell insertion for pseudoexhaustive testing. Pseudoexhaustive testing of a combinational circuit involves applying all possible input patterns to test all of its individual cones. Our testing ensures detection of all nonredundant combinational faults. We have developed an optimization process that can be used to find the optimal size of primary input cone (N) and fanout (F) values, to be used for partitioning a given circuit. In our work, the designer can choose between the fewest number of partitioning points and the least delay in critical path. ISCAS'85 benchmark circuits have been successfully partitioned, and when our results are compared to other partitioning methods, our algorithm makes fewer partitions.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115312616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI implementation for MAC-level DWT architecture","authors":"Shiuh-Rong Huang, Lan-Rong Dung","doi":"10.1109/ISVLSI.2002.1016882","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016882","url":null,"abstract":"This paper presents a VLSI design methodology for the MAC-level DWT processor based on a novel limited-resource scheduling (LRS) algorithm. The r-split Fully-specified Signal Flow Graph (FSFG) of the limited-resource FIR filter has been developed for the scheduling of MAC-level DWT signal processing. Given a set of architecture constraints and DWT parameters, the LRS algorithm can generate four scheduling matrices that drive the data path to perform the DWT computation, and the performance has also been investigated. Because the registers of FIR filtering are reused for the inter-octave storage, the MAC-level DWT architecture may require less extra inter-octave memory than the traditional architecture.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126777742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Datapath scheduling using dynamic frequency clocking","authors":"S. Mohanty, N. Ranganathan, V. Krishna","doi":"10.1109/ISVLSI.2002.1016876","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016876","url":null,"abstract":"In this paper, we describe a new datapath scheduling algorithm called DFCS based on the concept of dynamic frequency clocking. In dynamic frequency clocking scheme, all functional units in the datapath are driven by a single clock line that switches frequency dynamically at run time. The algorithm schedules lower frequency operators at earlier steps and delays higher frequency operators to later steps. Next, it regroups some of the higher frequency operators with low frequency operators so as to meet the time constraint. During this phase, DFCS assigns the frequency for each cycle and the functional unit with the corresponding voltage. The algorithm has been applied to various high level synthesis benchmark circuits under different time constraints. The experimental results show that using three supply voltage levels (5.0 V, 3.3 V, 2.4 V) and time constraints ({1.5, 1.75 and 2.01} * the critical path delay), average energy savings in the range of 46% to 68% is obtained with respect to using a single-frequency and single-voltage scheme.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115484364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving structural FSM traversal by constraint-satisfying logic simulation","authors":"Markus Wedler, D. Stoffel, W. Kunz","doi":"10.1109/ISVLSI.2002.1016889","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016889","url":null,"abstract":"We increase the reasoning power of the Record & Play algorithm for structural FSM traversal (Stoffel and Kunz, 1997), by incorporating a constraint-satisfying simulation technique. Combinational verification tools often use simulation to identify candidates for internally equivalent functions. This can significantly reduce the computational costs of proving the equivalence of two circuits. The key idea to improve Record & Play is to perform a random simulation in every time frame that satisfies stored equivalences and constants which are needed to represent the state set. Our experimental results show the benefit of the proposed approach.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115056188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A network on chip architecture and design methodology","authors":"Shashi Kumar, A. Jantsch, Mikael Millberg, Johnny Öberg, J. Soininen, M. Forsell, Kari Tiensyrjä, A. Hemani","doi":"10.1109/ISVLSI.2002.1016885","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016885","url":null,"abstract":"We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is a m×n mesh of switches and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources providing physical- and architectural-level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block, which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the onchip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multi-processors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122324159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous optimization of driving buffer and routing switch sizes in an FPGA using an iso-area approach","authors":"V. Chandra, H. Schmit","doi":"10.1109/ISVLSI.2002.1016870","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016870","url":null,"abstract":"In this paper, we analyze the gain from simultaneous sizing of driving buffers and routing switches on an FPGA interconnect performance. We show that it is not area feasible to build FPGAs with optimally sized interconnects. However, with constrained interconnect area, it is possible to significantly improve the speed of interconnects by simultaneously sizing the driving buffers and routing switches. Our experiments suggest that by simultaneously optimizing the routing resources, delay can be improved by 15-20%. We introduce the idea of iso-area optimization in which we find optimal sizing of routing resources within an overall area constraint. We also show that by making the routing architecture heterogeneous, in terms of routing switch size, we can further improve the performance of an FPGA by 1-12%.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133110685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-output timed Shannon circuits","authors":"M. Thornton, R. Drechsler, D. M. Miller","doi":"10.1109/ISVLSI.2002.1016873","DOIUrl":"https://doi.org/10.1109/ISVLSI.2002.1016873","url":null,"abstract":"Timed Shannon circuits have been proposed as a synthesis approach for a low power optimization technique at the logic level since overall circuit switching probabilities may be reduced. An improvement in the application of this principle for multi-output circuits is presented. Techniques that trade area for power reduction and a method for minimizing the overall circuit switching probability are also included. Experimental results are given and analyzed for these techniques.","PeriodicalId":177982,"journal":{"name":"Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121394296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}