{"title":"A formal specification of fault-tolerance in prospecting asteroid mission with Reactive Autonomic Systems Framework","authors":"Heng Kuang, O. Ormandjieva, S. Klasa, J. Bentahar","doi":"10.1109/ASAP.2010.5540769","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540769","url":null,"abstract":"NASA's Autonomous Nano Technology Swarm (ANTS) is a generic mission architecture consisting of miniaturized, autonomous, self-similar, reconfigurable, and addressable components forming structures. The Prospecting Asteroid Mission (PAM) is one of the ANTS applications, intended for surveying large dynamic populations. In this paper, we propose a formal approach based on Category Theory to specify the fault-tolerance property in PAM using the Reactive Autonomic Systems Framework.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High parallel variation Banyan network based permutation network for reconfigurable LDPC decoder","authors":"Xiao Peng, Zhixiang Chen, Xiongxin Zhao, F. Maehara, S. Goto","doi":"10.1109/ASAP.2010.5540964","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540964","url":null,"abstract":"The permutation network plays an important role in the reconfigurable QC-LDPC decoders used by most modern wireless communication systems with multiple code rates and various code lengths. In this paper, we propose a variation Banyan network (VBN) based permutation network architecture for reconfigurable QC-LDPC decoders and give the control-signal generation algorithm for cyclic shifts. By introducing a bypass network, we obtain a nonblocking scheme for any number of inputs and any shift amount. In addition, an optimized VBN is proposed for the WiMAX and WiFi standards, which can shift at most 4 groups of input data and greatly reduces the hardware complexity. Synthesis results in 90 nm technology demonstrate that the proposed permutation network can be implemented with a gate count of 18.3k at a frequency of 600 MHz.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125176905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elliptic Curve point multiplication on GPUs","authors":"S. Antão, J. Bajard, L. Sousa","doi":"10.1109/ASAP.2010.5541000","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541000","url":null,"abstract":"Acceleration of cryptographic applications on Graphics Processing Unit (GPU) platforms is a research topic of practical interest, because these platforms provide huge computational power for this type of application. In this paper, we propose a parallel algorithm for Elliptic Curve (EC) point multiplication in order to compute EC cryptography on GPUs. The proposed approach relies on the Residue Number System (RNS) to extract parallelism from high-precision integer arithmetic. Results suggest a maximum throughput of 9990 EC multiplications per second and a minimum latency of 24.3 ms for a 224-bit underlying field on an Nvidia GTX 285 GPU. This is up to an order of magnitude better in latency and 122% better in throughput than other approaches reported in the related art.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122870822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Newton-Raphson algorithms for floating-point division using an FMA","authors":"N. Louvet, J. Muller, A. Panhaleux","doi":"10.1109/ASAP.2010.5540948","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540948","url":null,"abstract":"Since the introduction of the Fused Multiply and Add (FMA) in the IEEE-754-2008 standard [6] for floating-point arithmetic, division based on Newton-Raphson iterations has become a viable alternative to SRT-based division. Newton-Raphson iterations were already used in some architectures prior to the revision of the IEEE-754 standard; for example, the Itanium architecture already used this kind of iteration [8]. Unfortunately, the proofs of correctness of the binary algorithms do not extend to the case of decimal floating-point arithmetic. In this paper, we present general methods to prove the correct rounding of division algorithms using Newton-Raphson iterations in software, for radix 2 and radix 10 floating-point arithmetic.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129136498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of binary Edwards curves for very-constrained devices","authors":"Ünal Koçabas, Junfeng Fan, I. Verbauwhede","doi":"10.1109/ASAP.2010.5541003","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541003","url":null,"abstract":"Elliptic Curve Cryptography (ECC) is considered the best candidate for Public-Key Cryptosystems (PKC) for ubiquitous security. Recently, ECC based on Binary Edwards Curves (BEC) has been proposed; it shows several interesting properties, e.g., completeness and security against certain exceptional-points attacks. In this paper, we propose a hardware implementation of BEC for extremely constrained devices. We use w-coordinates and the Montgomery powering ladder. We also give techniques to reduce the size of the register file, which is the largest component of the embedded core. Finally, we apply gated clocking to reduce the overall power consumption. The implementation has a size of 13,427 Gate Equivalents (GE), and one point multiplication requires 149.5 ms. To the best of our knowledge, this is the first hardware implementation of binary Edwards curves.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117252882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-based lossless compressors of floating-point data streams to enhance memory bandwidth","authors":"Kazuya Katahira, K. Sano, S. Yamamoto","doi":"10.1109/ASAP.2010.5540973","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540973","url":null,"abstract":"This paper presents an FPGA-based lossless compressor that directly compresses floating-point data streams to enhance the effective memory bandwidth of lattice Boltzmann method (LBM) accelerators. We show that compression algorithms based on 1D polynomial prediction are suitable for high-throughput hardware design. Moreover, we show that integer operations provide prediction performance comparable to a floating-point predictor, while an integer predictor is expected to require smaller circuits than a floating-point one. We evaluate the compression ratio, operating frequency, and resource consumption of the compressors with integer-based predictors through a prototype implementation on an ALTERA Stratix III FPGA. We demonstrate that the implemented compressors occupy only 0.15 to 0.23% of the entire logic resources and operate at 95 to 174 MHz, providing a compression ratio of up to 3.5, which means that we can enhance the memory bandwidth by a factor of 3.5 on average.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127210713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On energy efficiency of reconfigurable systems with run-time partial reconfiguration","authors":"Shaoshan Liu, Richard Neil Pittman, Alessandro Forin, J. Gaudiot","doi":"10.1109/ASAP.2010.5540985","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540985","url":null,"abstract":"In this paper we study whether partial reconfiguration can be used to reduce FPGA energy consumption. In an ideal scenario, a hardware accelerator assists with certain parts of program execution. When the accelerator is not active, we use partial reconfiguration to unload it, reducing both static and dynamic power. However, the reconfiguration process may introduce a high energy overhead, so it is unclear whether this approach is feasible. To address this problem, we identify the conditions under which partial reconfiguration can be used to reduce energy consumption, and we propose solutions to minimize the configuration energy overhead. The results of our study show that by using partial reconfiguration to reduce the power consumption of the accelerator when it is inactive, we can accelerate program execution and at the same time halve the overall energy consumption.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126396421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customizing controller instruction sets for application-specific architectures","authors":"Jian Li, David Dickin, Lesley Shannon","doi":"10.1109/ASAP.2010.5540965","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540965","url":null,"abstract":"Previous work has proposed the \"Systems Integrating Modules with Predefined Physical Links\" (SIMPPL) architectural framework as one possible method to shorten the design cycle by using a lightweight programmable controller (SIMPPL Controller) as the system-level interface. This paper presents a study of how much improvement in area, power, and performance can be achieved by customizing the SIMPPL Controller's instruction set. Furthermore, we have created a tool that automatically generates the HDL for SIMPPL Controllers with a user-specified instruction set. Our study on an FPGA platform has shown that using a customized SIMPPL Controller with a minimal instruction set results in an area reduction of 42%, a performance increase of 16%, and a power reduction of 10%.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121139423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA-specific algorithm for direct generation of multi-variate Gaussian random numbers","authors":"David B. Thomas, W. Luk","doi":"10.1109/ASAP.2010.5541005","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541005","url":null,"abstract":"The multi-variate Gaussian distribution is used to model random processes with distinct pair-wise correlations, such as stock prices that tend to rise and fall together. Multi-variate Gaussian vectors of length n are usually produced by first generating a vector of n independent Gaussian samples, then multiplying it by a correlation-inducing matrix, which requires O(n^2) multiplications. This paper presents a method of generating vectors directly from the uniform distribution, removing the need for an expensive scalar Gaussian generator and eliminating the need for any multipliers. The method relies only on small ROMs and adders, and so can be implemented using just logic resources (LUTs and FFs), saving DSP and block-RAM resources for the numerical simulation that the multi-variate generator is driving. The new method provides a tenfold increase in raw performance over the fastest existing FPGA generation method, and a fivefold improvement in performance per resource over the most efficient existing method. Using this method, a single 400 MHz Virtex-5 FPGA can generate vectors ten times faster than an optimised CUDA implementation on a 1.2 GHz GPU, and a hundred times faster than SIMD-optimised software on a quad-core 2.2 GHz CPU.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115083340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Completeness of automatically generated instruction selectors","authors":"F. Brandner","doi":"10.1109/ASAP.2010.5540994","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540994","url":null,"abstract":"The use of tree pattern matching for instruction selection has proven very successful in modern compilers. This can be attributed to the declarative nature of tree grammar specifications, which greatly simplifies the development of fast, high-quality code generators. The approach has also been widely adopted by generator tools that aim to automatically extract the instruction selector, as well as other compiler components, for application-specific instruction processors from generic processor models. A major advantage of tree pattern matching is that it is amenable to static analysis, which makes it possible to verify properties of a given specification. Completeness is an important example of such a property, in particular for automatically generated compilers. Tree automata can be used to prove that a given instruction selector specification is complete, i.e., that it can actually generate machine code for all possible input programs. Traditional approaches to completeness testing cannot represent dynamic checks that may disable certain matching rules during code generation. However, these dynamic checks occur very frequently in compilers targeting application-specific processors. The dynamic checks arise from hidden properties that are not captured by the terminal symbols of the tree grammar notation. We apply terminal splitting to the instruction selector specifications that are automatically derived from structural processor models to make these properties explicit. The transformed specification is then verified using a traditional completeness test. If the test fails, counterexamples are presented that allow the compiler to be adapted or the processor model to be extended accordingly.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129864461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}