{"title":"Two-Dimensional Dynamic Multigrained Reconfigurable Hardware","authors":"L. Braun, J. Becker","doi":"10.1109/ISVLSI.2010.9","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.9","url":null,"abstract":"Partial dynamic reconfigurable (PDR) systems designed with state-of-the-art tool chains, like the Early Access Partial Reconfiguration (EAPR) Flow from Xilinx [UG208], don't exploit the flexibility provided by the dynamic and partial reconfiguration features that a state-of-the-art FPGA chip offers. For example, the utilized chip area and the position of a dynamic area on the chip are traditionally fixed at design time. The shape and size of the area are thereby given by the largest module. If a smaller module is placed in the region of a bigger one, chip area stays unused. These restrictions are only some examples of the current state of development and run-time tool support for reconfigurable hardware architectures. A new approach is shown for exploiting the capability of reconfigurable hardware architectures more efficiently than previously introduced solutions. This is achieved through a novel concept of using micro blocks for the communication infrastructure as well as for the functional elements on the FPGA. The granularity of the micro blocks for building up more complex structures on the FPGA is discussed in this paper.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127524470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BLAKE HASH Function Family on FPGA: From the Fastest to the Smallest","authors":"N. Sklavos, P. Kitsos","doi":"10.1109/ISVLSI.2010.115","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.115","url":null,"abstract":"Hash functions form an important category of cryptography, which is widely used in a great number of protocols and security mechanisms. SHA-2 is the up-to-date NIST standard, but is going to be substituted in the near future with a new, modern one. NIST has selected the Second Round Candidates of the SHA-3 Competition. A year is allocated for the public review of these algorithms, and the Second SHA-3 Candidate Conference is being planned for August 23-24, 2010, after Crypto 2010. This paper deals with FPGA implementations of the BLAKE hash function family, which is one of the finalists. In this work, a VLSI architecture for the BLAKE family is proposed. For every hash function of BLAKE (-28, -32, -48, & -64), a hardware implementation is presented. The introduced integrations are examined and compared in terms of hardware implementation. Computational efficiency of SHA-3 finalists in silicon is one of the evaluation criteria of SHA-3.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"23 Suppl 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122477606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Grained Fault Tolerance for Process Variation-Aware Caches","authors":"Tayyeb Mahmood, Soontae Kim","doi":"10.1109/ISVLSI.2010.57","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.57","url":null,"abstract":"Continuous scaling in CMOS fabrication process makes circuits more vulnerable to process variations, which results in variable delay, malfunctioning, and/or leaky circuits. Caches are one of the biggest victims of process variations due to their large sizes and minimal cell features. To mitigate the impacts of process variations on caches, we propose to localize the effects of process variations at a word level, not at the conventional cache set, cache way, or cache line level. Faulty words are disabled or shut down completely and accesses to those words are bypassed to a small set of word-length buffers. This technique is shown to be effective in reducing performance penalty due to process variations and in increasing the parametric yield up to 90% when subjected to the performance constraints.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117299041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Freeze Linear Decompressors for Low Power Testing","authors":"V. Tenentes, X. Kavousianos","doi":"10.1109/ISVLSI.2010.37","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.37","url":null,"abstract":"Even though linear decompressors constitute a very effective solution for compressing test data, they cause increased shift power dissipation during scan testing. Recently, a new linear decompression architecture was proposed which offers reduced shift power at the expense however of increased test data volume and test sequence length. In this paper we present a new linear encoding method which offers both high compression and low shift power dissipation at the same time. A new low-cost, test-set-independent scheme is also proposed which can be combined with any linear decompressor for reducing the shift power during testing. Extensive experiments show that the proposed method offers reduced test power dissipation, test sequence length and test data volume at the same time, with very small area requirements.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128742063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and Perspectives of Computer Architecture at the Nano Scale","authors":"C. Gamrat","doi":"10.1109/ISVLSI.2010.118","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.118","url":null,"abstract":"Advances in nanotechnology and research into new materials have led to the elaboration of nano-components with novel properties and functions. Exploring how those novel components could be used to devise future computer architectures, complementing rather than supplementing CMOS technology, is a new research subject known as Nanocomputing. In this talk we will present the major challenges of such research, the potential benefits of using technologies other than CMOS and some perspectives in terms of design and applications. We will illustrate the topic by presenting some of the most advanced results in the field, focusing on results from the European FP7 project NABAB, which explores how to build neuro-inspired computing structures with a variety of nanotechnologies.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130404649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Homogeneous MPSoC with Dynamic Task Mapping for Software Defined Radio","authors":"Camille Jalier, D. Lattard, G. Sassatelli, P. Benoit, L. Torres","doi":"10.1109/ISVLSI.2010.110","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.110","url":null,"abstract":"In this paper, we present a flexible and distributed homogeneous Software Defined Radio (SDR) platform. This platform is an array of processing elements, called Smart ModEm Processors (SMEP), interconnected by a Network-on-Chip. Implemented in ST65nm, each processing element performs 3.2 GMAC/s with 77 GBits/s internal bandwidth while dissipating 110mW. Each SMEP unit contains a MIPS processor for task management including dynamic mapping. This distributed management solves the MPSoC scalability and programmability problem, and improves resource allocation and energy efficiency. This homogeneous approach with high management flexibility is new in the domain of mobile terminals. With real-time constraints, the challenge is to enable flexibility with a reduced overhead in terms of performance, power consumption and silicon area.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129176034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-Based Runtime Adaptive Multiprocessor Approach for Embedded High Performance Computing Applications","authors":"D. Göhringer, J. Becker","doi":"10.1109/ISVLSI.2010.30","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.30","url":null,"abstract":"Embedded high performance computing applications, for example image processing in surveillance systems, are very compute intensive due to the complexity of the algorithms. In addition to the compute-intensive data processing, the power consumption of such systems needs to be minimized in order to keep them lightweight and suitable for mobile operation. One solution for achieving these goals is to exploit hardware parallelism for acceleration purposes on reconfigurable hardware, like Field Programmable Gate Arrays (FPGA). Because of the performance increase, the clock speed can be reduced, which leads to reduced power consumption in comparison to traditional processor-based approaches. A challenging task to this day is the programming of these devices with standardized tools or languages such as C. There exist C-to-FPGA tools that ease the programming of these systems, but they do not handle the communication with the environment, e.g. camera interfaces, PCI interfaces, etc. These still have to be designed by hand, which is time-consuming. Also, the aforementioned tools still have some restrictions on the input language. The novel approach in the presented work is to combine processors in a multiprocessor architecture on FPGA for high performance computing applications. This solution combines the flexibility of FPGAs and the high-level programming paradigms of multiprocessor systems and can be seen as a meet-in-the-middle solution. This holistic approach is called RAMPSoC (Runtime Adaptive MPSoC) and combines a novel hardware architecture, consisting of heterogeneous processing elements connected over a novel heterogeneous Network-on-Chip, with a user-guided design methodology and a new runtime resource management system.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"458 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124338884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Embedded Applications on MPSoCs: The MNEMEE Approach","authors":"Christos Baloukas, Lazaros Papadopoulos, D. Soudris, S. Stuijk, Olivera Jovanovic, F. Schmoll, D. Cordes, R. Pyka, A. Mallik, S. Mamagkakis, F. Capman, S. Collet, N. Mitas, D. Kritharidis","doi":"10.1109/ISVLSI.2010.96","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.96","url":null,"abstract":"As embedded systems are becoming the center of our digital life, system design becomes progressively harder. The integration of multiple features on devices with limited resources requires careful and exhaustive exploration of the design search space in order to efficiently map modern applications to an embedded multi-processor platform. The MNEMEE project addresses this challenge by offering a unique integrated tool flow that performs source-to-source transformations to automatically optimize the original source code and map it on the target platform. The optimizations aim at reducing the number of memory accesses and the required memory storage of both dynamically and statically allocated data. Furthermore, the MNEMEE tool flow performs optimal assignment of all data on the memory hierarchy of the target platform. Designers can use the whole flow or a part of it and integrate it into their own design flow. This paper gives an overview of the MNEMEE tool flow along. It also presents two industrial case studies that demonstrate how the techniques and tools developed in the MNEMEE project can be integrated into industrial design flows.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128196110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Numerical Linear Algebra Kernels on a Scalable Run Time Reconfigurable Platform","authors":"P. Biswas, Pramod Udupa, Rajdeep Mondal, Keshavan Varadarajan, M. Alle, S. Nandy, R. Narayan","doi":"10.1109/ISVLSI.2010.65","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.65","url":null,"abstract":"Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA Solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon, although they exhibit similar computational properties. While ASIC solutions for NLA Solvers can deliver high performance, they are not scalable, and hence are not commercially viable. In this paper, we show how NLA kernels can be accelerated on REDEFINE, a scalable runtime reconfigurable hardware platform. Compared to a software implementation, the Direct Solver (Modified Faddeev's algorithm) on REDEFINE shows a 29X improvement on average and the Iterative Solver (Conjugate Gradient algorithm) shows a 15-20% improvement. We further show that the solution on REDEFINE is scalable over larger problem sizes without any notable degradation in performance.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134590677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel 1.8 V, 1066 Mbps, DDR2, DFI-Compatible, Memory Interface","authors":"A. Alexandropoulos, E. Davrazos, F. Plessas, M. Birbas","doi":"10.1109/ISVLSI.2010.49","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.49","url":null,"abstract":"An innovative design of a 533 MHz DDR2 SDRAM PHY, based on a common standard bus interface (DFI) and implemented in a 90 nm standard CMOS process, is presented in this paper. An off-chip driver with calibrated strength, slew rate control, and an on-die termination mechanism are utilized to provide improved signal integrity. Furthermore, a DDR3-like I/O architecture and an appropriate calibration mechanism have been employed in order to reduce input capacitance. A Register-Controlled Delay Locked Loop (RCDLL) is included that measures the period of the external DFI clock to generate two stable clock phases (0°, 90°) and aligns it with the internal PHY clock. A novel Dynamic Strobe Masking System (DSMS) has also been employed which, in contrast to traditional techniques, dynamically adjusts the length of the masking signal in real time, based on the incoming strobe. Finally, the PHY provides the necessary hooks for data capture training by an external calibration engine. Post-layout simulation results demonstrate its robustness over process, voltage, and temperature variations.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130726163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}