ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors: Latest Publications

Code generation for hardware accelerated AES
Raymond Manley, Paul Magrath, David Gregg
{"title":"Code generation for hardware accelerated AES","authors":"Raymond Manley, Paul Magrath, David Gregg","doi":"10.1109/ASAP.2010.5540955","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540955","url":null,"abstract":"Data must be encrypted if it is to remain confidential when sent over computer networks. Encryption solves many problems involving invasion of privacy, identity theft, fraud, and data theft. However, for encryption to be widely used, it must be fast. The problem is so important that new Intel processors provide hardware support for encryption. These instructions implement key stages of the Advanced Encryption Standard (AES), allowing encryption to be completed more quickly and using less power. The AES algorithm consists of several 'rounds' of encryption, each of which involves a relatively complicated computation. This new hardware support allows an entire round to be implemented with just a single instruction. An implementation of the AES algorithm using these instructions contains several code sections that can be fine-tuned for optimal performance. However, these optimizations are usually done by hand, which can be a lengthy, labour-intensive process. We present a system that can generate billions of variants of the AES encryption code to find the best solution for a particular microarchitecture. We apply both common loop optimizations and ones specific to AES. We evaluate the generated code on hardware with built-in AES support using both selective brute-force and guided searches. Our generator achieves significant speedups over a straightforward implementation of the code.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130022198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Implementing decimal floating-point arithmetic through binary: Some suggestions
N. Brisebarre, N. Louvet, Érik Martin-Dorel, J. Muller, A. Panhaleux, M. Ercegovac
{"title":"Implementing decimal floating-point arithmetic through binary: Some suggestions","authors":"N. Brisebarre, N. Louvet, Érik Martin-Dorel, J. Muller, A. Panhaleux, M. Ercegovac","doi":"10.1109/ASAP.2010.5540969","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540969","url":null,"abstract":"We propose algorithms and provide some related results that make it possible to implement decimal floating-point arithmetic on a processor that does not have decimal operators, using the available binary floating-point functions. In this preliminary study, we focus on round-to-nearest mode only. We show that several functions in decimal32 and decimal64 arithmetic can be implemented using binary64 and binary128 floating-point arithmetic, respectively. We discuss the decimal square root and some transcendental functions. We also consider radix conversion algorithms.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133671565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Design of throughput-optimized arrays from recurrence abstractions
A. Jacob, J. Buhler, R. Chamberlain
{"title":"Design of throughput-optimized arrays from recurrence abstractions","authors":"A. Jacob, J. Buhler, R. Chamberlain","doi":"10.1109/ASAP.2010.5540753","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540753","url":null,"abstract":"Many compute-bound applications have seen order-of-magnitude speedups using special-purpose accelerators. FPGAs in particular are good at implementing recurrence equations realized as arrays. Existing high-level synthesis approaches for recurrence equations produce an array that is latency-space optimal. We target applications that operate on a large collection of small inputs, e.g. a database of biological sequences, where overall throughput is the most important measure of performance. In this work, we introduce a new design-space exploration procedure within the polyhedral framework to optimize throughput of a systolic array subject to area and bandwidth constraints of an FPGA device. Our approach is to exploit additional parallelism by pipelining multiple inputs on an array and multiple iteration vectors in a processing element. We prove that the throughput of an array is given by the inverse of the maximum number of iteration vectors executed by any processor in the array, which is determined solely by the array's projection vector. We have applied this observation to discover novel arrays for Nussinov RNA folding. Our throughput-optimized array is 2× faster than the standard latency-space optimal array, yet it uses 15% fewer LUT resources. We achieve a further 2× speedup by processor pipelining, with only a 37% increase in resources. Our tool suggests additional arrays that trade area for throughput and are 4–5× faster than the currently used latency-optimized array. These novel arrays are 70–172× faster than a software baseline.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128503649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
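The throughput result stated in the abstract above (throughput equals the inverse of the maximum number of iteration vectors executed by any processor, which depends only on the projection vector) can be illustrated with a small sketch. This is not the authors' tool: it assumes a toy 2-D triangular iteration domain rather than the full Nussinov recurrence, a primitive integer projection vector, and one iteration vector per processor per cycle. The function names `triangular_domain`, `max_load`, and `throughput` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def triangular_domain(n):
    """Toy iteration domain {(i, j) : 0 <= i <= j < n} (an assumption, not the paper's)."""
    return [(i, j) for i in range(n) for j in range(i, n)]


def max_load(points, u):
    """Largest number of iteration vectors mapped to a single processor.

    Points that differ by a multiple of the projection vector u = (a, b)
    land on the same processor. For a primitive 2-D vector, the cross
    product a*j - b*i is constant along each such line, so it serves as
    a processor identifier.
    """
    a, b = u
    loads = Counter(a * j - b * i for i, j in points)
    return max(loads.values())


def throughput(points, u):
    """Inputs completed per cycle, assuming one iteration vector per processor per cycle."""
    return Fraction(1, max_load(points, u))


if __name__ == "__main__":
    pts = triangular_domain(4)
    for u in [(1, 0), (1, -1)]:
        print(u, max_load(pts, u), throughput(pts, u))
```

On this toy domain with n = 4, projecting along (1, 0) puts four iteration vectors on the busiest processor, while projecting along (1, -1) puts only two, so the second choice doubles the idealized throughput. Area and bandwidth constraints, which the paper also considers, are ignored here.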
A fully-overlapped multi-mode QC-LDPC decoder architecture for mobile WiMAX applications
Bo Xiang, Dan Bao, Shuangqu Huang, Xiaoyang Zeng
{"title":"A fully-overlapped multi-mode QC-LDPC decoder architecture for mobile WiMAX applications","authors":"Bo Xiang, Dan Bao, Shuangqu Huang, Xiaoyang Zeng","doi":"10.1109/ASAP.2010.5540958","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540958","url":null,"abstract":"A fully-overlapped multi-mode QC-LDPC decoder architecture, adopting an improved TDMP algorithm, is presented in this paper. With symmetrical four-stage pipelining, block column and row permutations, nonzero sub-matrix reordering, sum memory odd-even partition, and read-write bypass, two phases are fully overlapped and each phase scans nonzero sub-matrices one by one in block row-wise order without access conflicts to sum memories. The sum memories store not only variable node sums but also prior messages. In this case, it saves an additional FIFO of 13 440 bits. The decoder attains 248-287 Mb/s at 150 MHz and 15 iterations.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131184070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
A New approach in on-line task scheduling for reconfigurable computing systems
M. M. Bassiri, H. Shahhoseini
{"title":"A New approach in on-line task scheduling for reconfigurable computing systems","authors":"M. M. Bassiri, H. Shahhoseini","doi":"10.1109/ASAP.2010.5540975","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540975","url":null,"abstract":"Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications. In this paper, we present a new approach for on-line scheduling and placement in which configuration reusing is considered as a main characteristic in order to reduce reconfiguration overhead and decrease total execution time of the tasks. A large variety of experiments have been conducted on the proposed algorithm. Obtained results show considerable improvement in overall execution time of the tasks.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132647240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
A GALS FFT processor with clock modulation for low-EMI applications
Xin Fan, M. Krstic, C. Wolf, E. Grass
{"title":"A GALS FFT processor with clock modulation for low-EMI applications","authors":"Xin Fan, M. Krstic, C. Wolf, E. Grass","doi":"10.1109/ASAP.2010.5541014","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541014","url":null,"abstract":"With the growth in complexity of digital CMOS circuits, the steep current fluctuations introduced by numerous transistors switching with clock signals are proven to be a significant source of electromagnetic interference (EMI). In recent years the reduction in EMI noise from high speed digital ICs has already gained intensive research attention. In this paper the pausible clocking based globally asynchronous locally synchronous (GALS) design with phase and frequency modulation on the locally generated clocks is proposed as a systematic solution to EMI reduction. As a practical example, a 64-point Radix-2^3 pipelined GALS FFT processor was implemented using the IHP 130 nm CMOS technology for low-EMI applications. The on-chip measurements demonstrate 13 dB attenuation at the clock fundamental frequency and more than 20 dB attenuation at higher clock harmonics, in comparison with the synchronous design.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133745808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
A forwarding-sensitive instruction scheduling approach to reduce register file constraints in VLIW architectures
G. P. Vayá, J. Martín-Langerwerf, H. Blume, P. Pirsch
{"title":"A forwarding-sensitive instruction scheduling approach to reduce register file constraints in VLIW architectures","authors":"G. P. Vayá, J. Martín-Langerwerf, H. Blume, P. Pirsch","doi":"10.1109/ASAP.2010.5541015","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5541015","url":null,"abstract":"This paper presents a forwarding-based approach to increase the code compaction and consequently the processing performance of VLIW media-processors that implement monolithic or partitioned register file (RF) organizations with a reduced number of read/write ports. This approach exploits the forwarding mechanism implemented in common pipelined VLIW architectures to reduce the number of RF accesses, which is one of the main limiting factors of the code compaction process. This RF access reduction enables a higher instruction scheduling efficiency and eventually decreases the power consumption, without requiring extra hardware. A forwarding-sensitive code generation algorithm based on an enhanced list scheduling algorithm is described in detail. In addition, three case studies are presented, where the proposed scheduling algorithm leads to performance improvements of up to 8.4% when running common image and video codec tasks on a generic VLIW architecture. This is attractively close to the maximum performance improvement (11.4%) that can be achieved when investing in hardware by using an RF with twice the number of ports.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131996449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Memoryless RNS-to-binary converters for the {2^{n+1} - 1, 2^n, 2^n - 1} moduli set
K. Gbolagade, G. Voicu, S. Cotofana
{"title":"Memoryless RNS-to-binary converters for the {2^{n+1} - 1, 2^n, 2^n - 1} moduli set","authors":"K. Gbolagade, G. Voicu, S. Cotofana","doi":"10.1109/ASAP.2010.5540979","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540979","url":null,"abstract":"In this paper, we propose two novel memoryless reverse converters for the moduli set {2^{n+1} - 1, 2^n, 2^n - 1}. The first proposed converter does not entirely cover the dynamic range while the second proposed converter covers the entire dynamic range. First, we simplify the Chinese Remainder Theorem in order to obtain a reverse converter that utilizes mod-(2^{n+1} - 1) operation. Second, we further reduce the resulting architecture to obtain a reverse converter that uses only carry save adders and carry propagate adders. FPGA implementation results indicate that, on average, the proposed limited dynamic range converter achieves about 42% area reduction. However, the second proposed converter provides only 29.48% area reduction when compared with the most effective equivalent state-of-the-art converter. Both of the proposed converters also exhibit a small speed improvement over the state-of-the-art equivalent converter.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117311407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
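As background for the entry above, reverse (RNS-to-binary) conversion for the moduli set {2^(n+1) - 1, 2^n, 2^n - 1} can be sketched in software with the textbook Chinese Remainder Theorem. This is only a reference model under stated assumptions: it is not the simplified, adder-only converter the paper proposes, and the helper names `moduli_set`, `to_rns`, and `crt_reverse` are hypothetical. It requires Python 3.8 or later for `math.prod` and for modular inverses via `pow`.

```python
from math import gcd, prod


def moduli_set(n):
    """Moduli {2^(n+1) - 1, 2^n, 2^n - 1}; assumes n >= 2 so every modulus exceeds 1."""
    return [2 ** (n + 1) - 1, 2 ** n, 2 ** n - 1]


def to_rns(x, moduli):
    """Forward conversion: the residue of x with respect to each modulus."""
    return [x % m for m in moduli]


def crt_reverse(residues, moduli):
    """Textbook CRT reconstruction of x in [0, M), where M is the product of the moduli."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m          # product of the other moduli
        inv = pow(m_i, -1, m)     # multiplicative inverse of m_i modulo m
        total += r * m_i * inv
    return total % big_m


if __name__ == "__main__":
    mods = moduli_set(4)                      # [31, 16, 15]
    # The three moduli are pairwise coprime, which CRT requires.
    assert all(gcd(a, b) == 1 for i, a in enumerate(mods) for b in mods[i + 1:])
    residues = to_rns(1000, mods)             # [8, 8, 10]
    print(residues, crt_reverse(residues, mods))
```

For n = 4 the dynamic range is 31 × 16 × 15 = 7440, so every integer from 0 to 7439 has a unique residue triple. A hardware converter would replace the general multiplications and modular inverses here with shifts and additions that exploit the power-of-two structure of the moduli.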
Dynamic code mapping for limited local memory systems
S. Jung, Aviral Shrivastava, Ke Bai
{"title":"Dynamic code mapping for limited local memory systems","authors":"S. Jung, Aviral Shrivastava, Ke Bai","doi":"10.1109/ASAP.2010.5540773","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540773","url":null,"abstract":"This paper presents heuristics for dynamic management of application code on limited local memories present in high-performance multi-core processors. Previous techniques formulate the problem using call graphs, which do not capture the temporal ordering of functions. In addition, they only use a conservative estimate of the interference cost between functions to obtain a mapping. As a result, previous techniques are unable to achieve efficient code mapping. Techniques proposed in this paper overcome both these limitations and achieve superior code mapping. Experimental results from executing benchmarks from MiBench onto the Cell processor in the Sony PlayStation 3 demonstrate up to 29% and average 12% performance improvement, at tolerable compile-time overhead.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123334452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
A formal specification of fault-tolerance in prospecting asteroid mission with Reactive Autonomic Systems Framework
Heng Kuang, O. Ormandjieva, S. Klasa, J. Bentahar
{"title":"A formal specification of fault-tolerance in prospecting asteroid mission with Reactive Autonomic Systems Framework","authors":"Heng Kuang, O. Ormandjieva, S. Klasa, J. Bentahar","doi":"10.1109/ASAP.2010.5540769","DOIUrl":"https://doi.org/10.1109/ASAP.2010.5540769","url":null,"abstract":"NASA's Autonomous Nano Technology Swarm (ANTS) is a generic mission architecture consisting of miniaturized, autonomous, self-similar, reconfigurable, and addressable components forming structures. The Prospecting Asteroid Mission (PAM) is one of the ANTS applications for surveying large dynamic populations. In this paper, we propose a formal approach based on Category Theory to specify the fault-tolerance property in PAM by Reactive Autonomic Systems Framework.","PeriodicalId":175846,"journal":{"name":"ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123612733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4