{"title":"An efficient external-memory implementation of region query with application to area routing","authors":"S. Liao, Narendra V. Shenoy, W. Nicholls","doi":"10.1109/ICCD.2002.1106744","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106744","url":null,"abstract":"We present the tile-cached kd-tree, an efficient external-memory (disk) implementation of two-dimensional region query for use in a detailed area router. Most researchers have heretofore focused on in-memory algorithms. However, as the need to tackle very large problems increases, conventional in-memory algorithms suffer from unpredictable caching and paging behavior and their performance may degrade considerably. In addition, since the region-query data structure is only part of the overall system, its consumption of large memory resources affects other parts of the system as well. Our implementation takes advantage of spatial locality in the detailed-routing process. We partition the routing space into tiles, each storing the data of objects (rectangles) that lie strictly within it. Objects that cross tile boundaries are separately stored. The data within a tile are then written out to disk, and a configurable cache is used to hold in memory the most recently visited tiles. Experimental results on large real-life routing problems show that this scheme significantly reduces memory usage with a tolerable performance penalty.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132898856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Legacy SystemC co-simulation of multi-processor systems-on-chip","authors":"L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, M. Poncino","doi":"10.1109/ICCD.2002.1106819","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106819","url":null,"abstract":"We present a co-simulation environment for multiprocessor architectures, that is based on SystemC and allows a transparent integration of instruction set simulators (ISSs) within the SystemC simulation framework. The integration is based on the well-known concept of bus wrapper, that realizes the interface between the ISS and the simulator. The proposed solution uses an ISS-wrapper interface based on the standard gdb remote debugging interface, and implements two alternative schemes that differ in the amount of communication they require. The two approaches provide different degrees of tradeoff between simulation granularity and speed, and show significant speedup with respect to a micro-architectural, full SystemC simulation of the system description.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114338149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Models of IP's for automotive virtual integration platforms","authors":"P. Giusto, J. Brunel, A. Ferrari, E. Fourgeau, L. Lavagno, B. O'Rourke, A. Sangiovanni-Vincentelli, Emanuele Guasto","doi":"10.1109/ICCD.2002.1106797","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106797","url":null,"abstract":"Summary form only given. The concept of virtual integration platform plays a key role in any novel methodology that is trying to address earlier validation of distributed applications in regular and faulty conditions. The methodology must rely upon libraries that model the most important features of the commonly used IP's in the automotive segment such as FlexRay, the emerging bus protocol for safety critical applications supported by BMW, Daimler-Chrysler, Philips, Bosch, and Motorola, OSEK compliant RTOSes and protocol stacks, microprocessors such as Motorola/IBM PowerPC, Infineon 167, NEC v850, Tricore, ST 10, and Janus. We believe that tools must support the easy plug and play of the IP models in a seamless way to the user. For example, it must be possible to run a fast simulation at the token level (frames) to provide insights about the best network protocol configuration within a reasonable accuracy for the estimated frame latency. Next, it must be possible to export such a configuration to (semi)-automatically configure the downstream and more refined bus protocol models for the finer grain validation step. Both steps must rely upon interchangeable IP's with clear interfaces and trade-offs between simulation speed and accuracy of the timing estimates. In this paper, we present two examples of models of IP's that can be used at two different steps in the design exploration, the token-level/cycle approximate transaction based level and the cycle accurate level. The first example is the Universal Communication Model (UCM) that captures the main common features of the most relevant bus protocols such as topology, redundancy, arbitration, etc. The model enables quick token-level simulations. The user is able to determine the communication cycle layout and bus scheduling, k-matrix, and then export it for the configuration of downstream more refined models such as the Motorola FlexRay cycle accurate transaction based model. Bus delays are as important as task execution delays and RTOS switching overheads. In the second example we introduce Janus, a multi-processor micro-controller for power train applications. The cycle approximate transaction based model of Janus can be used to assess the ECU HW/SW partitioning, in particular to quickly explore different task scheduling and allocation. Then, this model is refined and exported to configure a HW/SW co-verification tool for the cycle accurate validation of the ECU HW/SW architecture. In an example scenario, an engine control ECU is providing information about the engine (e.g. engine revolution speed) to a gear control ECU over a CAN bus (the latter typically requires precise revolution speed to operate and could also require to set the engine operation condition). In this scenario, car and subsystem makers play different roles in order to provide a virtual model of the system to validate the functionality and the performance before going to implementation. The same models can then be used to march tow","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127951916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive balanced computing (ABC) microprocessor using reconfigurable functional caches (RFCs)","authors":"Huesung Kim, Arun Kumar Somani, A. Tyagi","doi":"10.1109/ICCD.2002.1106761","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106761","url":null,"abstract":"A general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies like multimedia and digital signal processing applications demand ever more computing power. If the computing resources are variable to the needs of an application, a better performance can be achieved. Adaptive Balanced Computing (ABC) performs a dynamic resource configuration of on-chip cache memory by converting the cache into a specialized computing unit. With a small amount of additional logic and slightly modified microarchitecture, a part of the cache memory can be configured to perform specialized computations in a conventional processor. In this paper, we evaluate the ABC using RFCs in various cache organizations to see the impact of resource reconfiguration. The simulations with multimedia and DSP applications show that the resource configuration achieves speedups ranging from 1.04X to 3.94X in overall applications and from 2.61X to 27.4X in the core computations.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129224983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing an asynchronous microcontroller using Pipefitter","authors":"I. Blunno, L. Lavagno","doi":"10.1109/ICCD.2002.1106818","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106818","url":null,"abstract":"This paper discusses how Pipefitter, a tool chain that implements a fully automated synthesis flow for asynchronous circuits, can be used to design a simple asynchronous microcontroller. The use of RTL-like Verilog HDL as the input format makes the first steps of the design flow (i.e. specification and simulation) very easy for the designer. Pipefitter directly synthesizes the control unit as a hazard-free standard cell netlist, uses a genetic algorithm to perform binding and multiplexer optimization for the data path, allows the user to manually specify the binding, and can automatically pipeline a sequential specification. It also produces a synthesizable Verilog specification for the Data Path, as well as a set of scripts driving both its synthesis and timing analysis by state-of-the-art commercial synchronous RTL and logic synthesis tools. The automated insertion of matched delays completes the logic design, and hands off the netlist to the standard cell-based layout tools. The example presented in this paper shows how Pipefitter can be effectively used for the design of asynchronous application specific integrated circuits.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126249251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A stream processor development platform","authors":"B. Serebrin, John Douglas Owens, Chen H. Chen, S. Crago, U. Kapasi, P. Mattson, Jinyung Namkoong, S. Rixner, W. Dally","doi":"10.1109/ICCD.2002.1106786","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106786","url":null,"abstract":"We describe a hardware and software platform for developing streaming applications. Programmers write stream programs in high-level languages, and a set of software tools maps these programs to code that runs on a streaming hardware system. The hardware platform includes two Imagine stream processors, together providing 32 GFLOPS peak performance, and a high-speed onboard network to carry video and other data between peripherals and the Imagine processors.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121625585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the impact of technology scaling on mixed PTL/static circuits","authors":"G. Cho, Tom Chen","doi":"10.1109/ICCD.2002.1106789","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106789","url":null,"abstract":"We present the impact of technology scaling on mixed PTL/static circuits and compare the results with that of domino and conventional static CMOS. The state-of-the-art technologies of 0.18 μm, 0.13 μm, and 0.1 μm were used in the study with Vdd being scaled accordingly. The benchmark suite consists of 10 circuits of varying complexities and they are actual circuits used in a state-of-the-art 64-bit microprocessor in the form of either dynamic or static CMOS circuits. The objective of this work is to determine how performance and power consumption scale with technology scaling. Our experimental results show that the mixed PTL/static circuit style is a promising alternative in power and power-delay product while achieving comparable delay to the dynamic circuit style.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115323947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timing window applications in UltraSPARC-IIIi™ microprocessor design","authors":"Rita Yu Chen, P. Yip, G. Konstadinidis, and J. N. Demas, F. Klass, Robert E. Mains, M. Schmitt, D. Bistry","doi":"10.1109/ICCD.2002.1106764","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106764","url":null,"abstract":"This paper presents two timing window methodologies used in UltraSPARC-IIIi™ microprocessor design. They have improved the accuracy of timing and noise analysis. In timing analysis, timing windows are applied to calculate effective Miller factors of coupling nets; in noise analysis, they are applied to waive false noise violations. Results show that by using timing windows in timing analysis, 72% of the CPU-level nets have more accurate Miller factors. Thus, it reduces the number of false timing paths. During the development of this application, a simple and practical convergence rule is defined to stop the iteration. Also, the timing window application on noise analysis has identified 42% of the CPU-level noise violations which can be waived in UltraSPARC-IIIi™ chip. This significantly improved the productivity of the design.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122487286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional verification of the IBM zSeries eServer z900 system","authors":"Joerg Walter","doi":"10.1109/ICCD.2002.1106741","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106741","url":null,"abstract":"This paper presents an overview on how the zSeries eServer z900 system has been functionally verified. It describes the hierarchical structure of verification, starting with designer simulation, unit-simulation, chip-simulation up to system simulation. For each step, the tools, methods and goals of verification are described. It also presents a description of the IT environment used at the different levels of verification, especially of dedicated simulation hardware like accelerator and emulator machines used for system simulation and hardware/software co-verification.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131503829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing the interconnect topology for arrays of processors between cost and power","authors":"Esther Y. Cheng, Feng Zhou, B. Yao, Chung-Kuan Cheng, R. Graham","doi":"10.1109/ICCD.2002.1106767","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106767","url":null,"abstract":"High performance SoC requires nonblocking interconnections between an array of processors built on one chip. With the advent of deep sub-micron technologies, switches are becoming much cheaper while wires are still expensive. Therefore, optimization efforts should focus on the wire resources. In this paper, we devise an objective function to balance the interconnect topology between routing area and power dissipation. Based on the objective function, we find the best one-dimensional and two-dimensional nonblocking interconnect architectures. Furthermore, we define a derivative benefit and devise a strategy for improving the performance of hierarchical nonblocking interconnect architectures and derive optimized results.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122663769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}