{"title":"Adaptive circuit block model for power supply noise analysis of low power system-on-chip","authors":"M. Eireiner, D. Schmitt-Landsiedel, P. Wallner, Andreas Schöne, S. Henzler, U. Fiedler","doi":"10.1109/SOCC.2009.5335686","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335686","url":null,"abstract":"A circuit block model and methodology for accurate power supply noise analysis, taking the impact of power supply noise on the current consumption into account, is presented. This enables high transient accuracy even at excessive power supply noise. Further improvement is obtained by an adaptive model for the capacitance of switching gates. Simulations for various power grids and test circuits are compared between a state of the art and the improved modelling. Simulation error of power supply noise was reduced by 4.7X - 20X at a simulation run time penalty of roughly 20%. This makes it especially helpful for low power SoC designs, with high transient IR-drop and multi-frequency domains, where transient accuracy is of concern.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131014470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pathfinding: A design methodology for fast exploration and optimisation of 3D-stacked integrated circuits","authors":"D. Milojevic, R. Radojcic, Roger Carpenter, P. Marchal","doi":"10.1109/SOCC.2009.5335663","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335663","url":null,"abstract":"This paper introduces a new design methodology and the corresponding EDA tool chain, enabling fast design space exploration and high fidelity of results for emerging heterogeneous 3D-Stacked Integrated Circuits. The proposed framework allows designers to easily trade off between different system level design choices (e.g. functional partitioning), physical design options (e.g. packaging strategies) and/or technology options (e.g. different technology nodes) and understand their impact on typical design parameters such as cost, performance and power. We demonstrate the proposed framework using an existing MPSoC for video coding applications. The system is virtually prototyped as a traditional 2D design and then as a 3D design. For the 3D version we place the off-chip DRAM memory on top of the processing die and consider different packaging options. For different implementation scenarios we quantify typical design parameters, showing the benefits of 3D integration.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123151601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction merging to increase parallelism in VLIW architectures","authors":"G. P. Vayá, J. Martín-Langerwerf, F. Giesemann, H. Blume, P. Pirsch","doi":"10.1109/SOCC.2009.5335660","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335660","url":null,"abstract":"This paper describes a new mechanism for concurrent use of more functional units, without increasing the control path of a generic VLIW architecture. The proposed approach only requires small modifications in the architecture and a new code selection function in the instruction scheduler. The key idea of this approach is to search for similar independent operations inside a basic assembler code block and merge them in a single instruction, which executes the same operation with even and odd operand registers in two different functional units. A comprehensive evaluation of this mechanism with two multimedia tasks shows an improvement of the dynamic instructions-per-cycle, exceeding the theoretical maximum of the reference architecture.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"91 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132679183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance modeling of parallel applications on MPSoCs","authors":"M. Lattuada, C. Pilato, Antonino Tumeo, Fabrizio Ferrandi","doi":"10.1109/SOCC.2009.5335675","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335675","url":null,"abstract":"In this paper we present a new technique for automatically measuring the performance of tasks, functions or arbitrary parts of a program on a multiprocessor embedded system. The technique instruments the tasks described by OpenMP, used to represent the task parallelism, while ad hoc pragmas in the source indicate other pieces of code to profile. The annotations and the instrumentation are completely target-independent, so the same code can be measured on different target architectures, on simulators or on prototypes. We validate the approach on a single and on a dual LEON 3 platform synthesized on FPGA, demonstrating a low instrumentation overhead. We show how the information obtained with this technique can be easily exploited in a Hardware/Software design space exploration tool, by estimating, with good accuracy, the speed-up of a parallel application given the profiling on the single processor prototype.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126226242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two phase clocked adiabatic static CMOS logic","authors":"Nazrul Anuar, Yasuhiro Takahashi, T. Sekine","doi":"10.1109/SOCC.2009.5335671","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335671","url":null,"abstract":"This paper demonstrates the low-energy operation of a two-phase clocked adiabatic static CMOS logic (2PASCL) on the basis of the results obtained in the simulation of a 4-bit ripple-carry adder (RCA) and D-flipflop employing 2PASCL circuit technology. Two-phase unsymmetrical power supply clocks are introduced to increase the logic transition level. Energy dissipation in the unsymmetrical clocked 2PASCL RCA and D-flipflop are 77.2% and 55.5% less than that in a static CMOS at transition frequencies of 10–100 MHz respectively.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116514442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerant communication over Micronmesh NOC with Micron Message-Passing protocol","authors":"H. Kariniemi, J. Nurmi","doi":"10.1109/SOCC.2009.5335685","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335685","url":null,"abstract":"Future Multi-Processor System-on-Chip (MPSoC) platforms are becoming more vulnerable to transient and intermittent faults due to physical-level problems of VLSI technologies. This sets new requirements on the fault tolerance of the messaging layer software that applications use for communication, because the faults make the operation of the Network-on-Chip (NoC) hardware of the MPSoCs less reliable. This paper presents the Micron Message-Passing (MMP) Protocol, a lightweight protocol designed for improving the fault tolerance of the messaging layer of MPSoCs where the Micronmesh NoC is used. Its fault tolerance is implemented by watchdog timers and Cyclic Redundancy Checks (CRC), which are usable for detecting packet losses, communication deadlocks, and bit errors. These three functionalities are necessary because without them the software executed on the MPSoCs is not able to detect the faults and recover from them. This paper also presents how the MMP Protocol can be used for implementing applications that are able to recover from communication faults.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129508470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling framework for real-time dependable NoC-based systems","authors":"Mihkel Tagel, P. Ellervee, G. Jervan","doi":"10.1109/SOCC.2009.5335670","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335670","url":null,"abstract":"Technology scaling into the subnanometer range will create process variations that impact the overall manufacturing yield and quality. At the same time, System-on-Chip (SoC) complexity and communication requirements are increasing, which will make the SoC designer's goal of designing a fault-free system very difficult. Dependability will be an important measure of the System-on-Chip design process. As a result we see a shift from bus-based systems to networked systems and from the traditional Register Transfer Level (RTL) design paradigm to higher abstraction levels: High Level Synthesis (HLS) and system-level design. In real-time networked systems, dependability cannot be reached effectively without predictable, contention-free communication synthesis. In this paper, an approach that takes into account flow control unit(s) transmission latencies over actual links is extended to cover, in addition to virtual cut-through, wormhole switching and wormhole switching with virtual channels. The communication synthesis results are used in our proposed system-level design methodology for dependable real-time Systems-on-Chip.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124394374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient software cache for H.264 motion compensation","authors":"A. Azevedo, B. Juurlink","doi":"10.1109/SOCC.2009.5335657","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335657","url":null,"abstract":"This paper presents an efficient software cache implementation for H.264 Motion Compensation on scratchpad memory based systems. For a wide range of applications, especially multimedia applications, the data set is predictable, making it possible to transfer the necessary data before the computation. Some kernels, however, depend on data that are known just before they are needed, such as the H.264 Motion Compensation (MC). MC has to stall while the data is transferred from the main memory. To overcome this problem and increase the performance, we analyze the data locality for the MC. Based on this analysis, we propose a 2D Software Cache (2DSC) implementation. The 2DSC exploits the application characteristics to reduce overheads, providing on average a 65% improvement over the hand-programmed DMAs.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115489269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of device variability in the communication structures for future synchronous SoC designs","authors":"F. Hassan, B. Cheng, W. Vanderbauwhede, F. Salazar","doi":"10.1109/SOCC.2009.5335676","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335676","url":null,"abstract":"In this paper we take a first step towards studying the impact that random dopant fluctuation (RDF) in the devices will have on on-chip synchronous communication structures, such as line drivers, repeaters and latches. The study is based on Monte Carlo simulation of the circuits at the 25, 18 and 13 nm technology generations using predictive device models. It has been found that variability has a significant impact on the performance of communication structures designed using small devices. Therefore, as a design methodology, it is proposed to use larger sized devices in critical parts of the circuits at the cost of larger area and power. Surprisingly, this work also points out that tapered buffers with a larger tapering factor are more prone to delay variability, which might lead to reconsidering the optimal sizing of these structures. It may well be possible to tackle such variability with active approaches, which are beyond the scope of this text.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125485526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-core signal processor for heterogeneous reconfigurable computing","authors":"D. Rossi, F. Campi, A. Deledda, C. Mucci, Stefano Pucillo, Sean Whitty, R. Ernst, S. Chevobbe, Stéphane Guyetant, M. Kühnle, M. Hübner, J. Becker, W. Putzke-Röming","doi":"10.1109/SOCC.2009.5335668","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335668","url":null,"abstract":"Reconfigurable computing holds the promise of delivering ASIC-like performance while preserving run-time flexibility of processors. In many application domains, the use of FPGAs is limited by area, power, and timing overheads. Coarse-grained reconfigurable architectures offer higher computation density, but at the price of rather being domain specific. Programmability is also a major issue related to all of the described solutions. This paper describes a heterogeneous multi-core system-on-chip that exploits different flavours of reconfigurable computing, merged together in a high parallel on-chip and off-chip interconnect utilized for both data and configuration. The aim of this work is to deliver a single monolithic engine that capitalizes on the strong points of different reconfigurable fabrics, while providing a friendly programming interface. The user is ultimately able to manage a broad spectrum of different applications, exploiting the most efficient means of computation through utilization of each kernel, while retaining a software-oriented development environment as much as possible.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126711060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}