{"title":"A New Look at Reversible Logic Implementation of Decimal Adder","authors":"Rekha K. James, T. Shahana, K. Poulose Jacob, S. Sasi","doi":"10.1109/ISSOC.2007.4427442","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427442","url":null,"abstract":"Reversibility plays a fundamental role when computations with minimal energy dissipation are considered. In recent years, reversible logic has emerged as one of the most important approaches for power optimization with its application in low power CMOS, quantum computing and nanotechnology. This research proposes a new implementation of Binary Coded Decimal (BCD) adder in reversible logic. The design reduces the number of gates and garbage outputs compared to the existing BCD adder reversible logic implementations. So, this design gives rise to an implementation with a reduced area and delay.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124380864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A System-level Design Method for Cognitive Radio on a Reconfigurable Multi-processor Architecture","authors":"Qiwei Zhang, A. Kokkeler, G. Smit","doi":"10.1109/ISSOC.2007.4427419","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427419","url":null,"abstract":"The future trend of software defined radio (SDR) platforms moves toward reconfigurable Multiprocessor System-on-Chips (MPSoCs). However, there is a gap between the modelling of the dynamic radio applications and the optimized implementation of the application on reconfigurable multiprocessor architectures. We aim to close this gap by applying a system level design method for the modelling and implementation of the applications on an MPSoC. The state-of-the-art radio technology based on SDR, Cognitive Radio, is considered as a design case to demonstrate the effectiveness of this method.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133846277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing the conjugate gradient algorithm on multi-core systems","authors":"Wouter Wiggers, V. Bakker, A. Kokkeler, G. Smit","doi":"10.1109/ISSOC.2007.4427436","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427436","url":null,"abstract":"In linear solvers such as the conjugate gradient algorithm, sparse matrix-vector multiplication is an important kernel. Due to the sparseness of the matrices, the solver runs relatively slowly. For digital optical tomography (DOT), a large set of linear equations has to be solved, which currently takes on the order of hours on desktop computers. Our goal was to speed up the conjugate gradient solver. In this paper we present the results of applying multiple optimization techniques and exploiting multi-core solutions offered by two recently introduced architectures: Intel's Woodcrest general purpose processor and NVIDIA's G80 graphical processing unit. Using these techniques for these architectures, a speedup of a factor of three has been achieved.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128477075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensor Network-On-Chip","authors":"G. Varatkar, S. Narayanan, Naresh R Shanbhag, Douglas L. Jones","doi":"10.1109/ISSOC.2007.4427447","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427447","url":null,"abstract":"In this paper, we present the sensor network-on-a-chip (SNOC) paradigm for designing robust and energy-efficient systems-on-a-chip (SOC). In this paradigm, computation in the presence of nanometer non-idealities such as process variations, leakage and noise is viewed as an estimation problem. Robust statistical signal processing theory is then employed to recover the performance of the system in the presence of errors, especially timing errors. We apply this framework to design an energy-efficient and robust PN-code acquisition system for the wireless CDMA2000 standard. Simulations in IBM's 130 nm CMOS process technology demonstrate up to 30% power savings compared to the conventional architecture for a detection probability of PD = 0.5.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127313606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-Time Scheduled Hardware Acceleration of MPEG-4 Video Decoding","authors":"J. Boutellier, P. Jääskeläinen, O. Silvén","doi":"10.1109/ISSOC.2007.4427425","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427425","url":null,"abstract":"In this paper we present a hardware-accelerated system-on-chip implementation of an MPEG-4 simple profile video decoder with a novel hardware accelerator interfacing methodology. The system consists of a general purpose master processor and several slave hardware accelerators. The communication between the master processor and the hardware accelerators is performed without interrupts by using piecewise-static run-time scheduling. After the data content of each macroblock has been discovered, the master processor computes a short static schedule for the accelerators. This removes the need for the accelerators to interrupt the master processor when the assigned task is finished. Therefore, context save overheads in the master processor are avoided and energy efficiency improves. The accelerators execute functions that perform block-level decoding operations (IDCT, inverse quantization, etc.), which have deterministic execution times and can be scheduled statically. The task scheduling algorithm executed by the master processor is able to take into account the costs and restrictions of a shared memory with limited access capabilities and marks memory accesses separately to the schedule. The possible heterogeneity of the processing units is also taken care of. Tests show that the proposed scheme is feasible and can be used as an alternative to traditional synchronization methods.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128444773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Feature-Based Optical Flow Processor Architecture Featuring Single-Motion-Vector/Cycle Generation","authors":"Kazuhide Fujita, Kiyoto Ito, T. Shibata","doi":"10.1109/ISSOC.2007.4427444","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427444","url":null,"abstract":"A feature-based optical flow processor architecture has been developed. It has a special data allocation scheme in on-chip SRAM banks and a parallel shift and matching unit using compact absolute difference circuits. As a result, single-motion-vector/cycle generation at arbitrary locations in the scene has been achieved. The core circuitries were designed in a 0.18-μm 5-metal CMOS technology and sent to fabrication, and the operation was confirmed by Nanosim simulation. Although the simulation results are limited to only the core circuitries, it is expected that the chip can generate optical flow about 10,000 times faster than software processing.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124655397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The NoCRay Graphic Accelerator: a Case-study for MP-SoC Network-on-Chip Design Methodology","authors":"S. Tota, M. Casu, P. Ros, M. R. Roch, M. Zamboni","doi":"10.1109/ISSOC.2007.4427429","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427429","url":null,"abstract":"The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability of next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this work a complete design methodology is proposed, tackling at once the aspects of hardware architecture, programming model and design automation. The proposed design flow has been used in the implementation of a multiprocessor Network-on-Chip based system, the NoCRay graphic accelerator. The system uses 8 Tensilica LX processors and has been physically implemented on a Xilinx Virtex-4 LX-160 FPGA reporting a 17.3M equivalent gate-count. Performance is compared with a commercial general purpose processor and shows good results considering the low frequency of the prototype.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129921340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Configuration Locking Technique to Reduce the Configuration Overhead of Run-Time Reconfigurable Devices","authors":"Yang Qu, J. Soininen, J. Nurmi","doi":"10.1109/ISSOC.2007.4427423","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427423","url":null,"abstract":"Run-time reconfigurable logic is an interesting design alternative in SoC design because it can simultaneously provide high performance and flexibility. However, its configuration overhead can substantially decrease the system performance. In this work, we present a novel configuration locking technique to reduce the overhead. The idea is to lock at run-time a number of frequently used tasks on the configuration memory so that they cannot be evicted by other tasks. A number of real applications were used to validate the approach. The results show that using a proper amount of resources to lock frequently used tasks can significantly improve the performance.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130493997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using a Linear Sectioned Bus And a Communication Processor to Reduce Energy Costs in Synchronous On-Chip Communication","authors":"Kris Heyrman, A. Papanikolaou, F. Catthoor, P. Veelaert, W. Philips","doi":"10.1109/ISSOC.2007.4427432","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427432","url":null,"abstract":"The sectioned bus is an energy-optimal architecture for system-on-chip (SoC) communication, where we save energy by consequently switching off unused bus sections on a cycle-by-cycle basis. The communication processor is a paradigm for the control of such a bus by means of software. Synchronous communication takes place within the tiles of a SoC in the deep sub-micron technology domain. We explore design alternatives for a linear software-controlled sectioned bus while building the hardware model of a single-instruction-issue processor, and run, in simulation, a media benchmark on it. We determine the energy cost of controlling this bus, compare it with the energy gain obtained from the sectioning, and find it favorable. The control cost is only 5% of the bus transport energy, leaving us with a gain by segmentation of 81%. We demonstrate the feasibility of the control of a low-power synchronous communication system by the processor. Starting out from this case study at the low-end to medium range of network complexity, we consider the implications of growing complexity that will arise from using multiple sectioned buses on multiple-issue computers (VLIWs). We find that control of linear bus topologies of medium-level complexity is now well understood. Further work is needed at the high-end of non-linear topologies.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"18 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131094612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduce SOC Energy Consumption through Processor ISA Extension","authors":"S. Leibson","doi":"10.1109/ISSOC.2007.4427427","DOIUrl":"https://doi.org/10.1109/ISSOC.2007.4427427","url":null,"abstract":"The combination of reduced core operating voltage and reduced clock frequency achieved through processor core ISA extension greatly reduces the energy required to execute the task, often by one to two orders of magnitude.","PeriodicalId":244119,"journal":{"name":"2007 International Symposium on System-on-Chip","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127198789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}