2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Publications

Buffer allocation for advanced packet segmentation in Network Processors
Daniel Llorente, Kimon Karras, Thomas Wild, A. Herkersdorf
DOI: 10.1109/ASAP.2008.4580182 (https://doi.org/10.1109/ASAP.2008.4580182). Published 2008-07-02.
Abstract: In current network processors, incoming variable-length packets are sliced using only one small segment size and then stored in the buffer. Unfortunately, short data bursts are inefficient for accessing SDRAM, which is commonly used for packet buffers, because of its high activation and precharge latencies. Using large segment sizes is not optimal either: although it increases memory bandwidth, the benefit comes at the price of a heavy reduction in storage efficiency. A good way to achieve high performance and high memory utilization simultaneously is to store a single packet segmented using multiple segment sizes. In this paper, we study how to allocate memory for these different-sized segments efficiently. First, we analyze the appropriate segment pool size for a multitude of traffic scenarios. Our experiments show that simple static buffer allocation does not always suffice, because different segment pools may be exhausted depending on the traffic. Hence, we introduce a method for handling multiple segment pools not only statically but also dynamically, using a new set of control structures based on a combination of bitmaps and linked lists. We demonstrate that our method achieves a large reduction in control buffer size requirements compared with state-of-the-art control structures, while also decreasing the average number of accesses to control data.
Citations: 3
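The two ingredients described in this abstract, slicing one packet into a mix of segment sizes and tracking free segments per pool, can be illustrated with a minimal sketch. The names SegmentPool, split_packet and store_packet are hypothetical, as are the greedy "large segments first, small segments for the tail" split and the one-bitmap-per-pool layout; the paper's actual control structures (bitmaps combined with linked lists) and its dynamic pool handling are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


class SegmentPool:
    """Hypothetical fixed-size segment pool; a set bit in free_bits marks a free segment."""

    def __init__(self, seg_size: int, count: int) -> None:
        self.seg_size = seg_size
        self.free_bits = (1 << count) - 1  # all segments start free

    def alloc(self) -> Optional[int]:
        if self.free_bits == 0:
            return None  # pool exhausted
        lowest = self.free_bits & -self.free_bits  # isolate the lowest free segment
        self.free_bits ^= lowest                   # mark it as used
        return lowest.bit_length() - 1             # its index in the pool

    def free(self, index: int) -> None:
        self.free_bits |= 1 << index


def split_packet(length: int, large: int, small: int) -> List[int]:
    """Greedy split: as many large segments as fit, then small ones for the tail.

    Wasted space is always less than one small segment.
    """
    n_large, rem = divmod(length, large)
    n_small = -(-rem // small)  # ceiling division
    return [large] * n_large + [small] * n_small


def store_packet(length: int, pools: Dict[int, SegmentPool]) -> Optional[List[Tuple[int, int]]]:
    """Allocate (segment_size, index) pairs for one packet, or None if a pool runs out."""
    large, small = max(pools), min(pools)
    allocated: List[Tuple[int, int]] = []
    for size in split_packet(length, large, small):
        idx = pools[size].alloc()
        if idx is None:
            for s, i in allocated:  # roll back the partial allocation
                pools[s].free(i)
            return None
        allocated.append((size, idx))
    return allocated
```

The rollback on failure is one simple policy for pool exhaustion; the abstract's point is precisely that a static split like this can exhaust one pool while another still has room, which motivates its dynamic scheme.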
An efficient method for evaluating polynomial and rational function approximations
N. Brisebarre, S. Chevillard, M. Ercegovac, J. Muller, S. Torres
DOI: 10.1109/ASAP.2008.4580185 (https://doi.org/10.1109/ASAP.2008.4580185). Published 2008-07-02.
Abstract: In this paper we extend the domain of applicability of the E-method [7, 8], a hardware-oriented method for evaluating elementary functions using polynomial and rational function approximations. The polynomials and rational functions are computed by solving a system of linear equations using digit-serial iterations on simple and highly regular hardware. For convergence, these systems must be diagonally dominant. The E-method offers an efficient way to evaluate polynomials and rational functions in fixed point if their coefficients satisfy the diagonal dominance condition. Until now, there was no systematic approach to obtaining good approximations to a function f over an interval [a, b] by rational functions that satisfy the constraints required by the E-method. In this paper, we present such an approach, based on linear programming and lattice basis reduction. We also discuss the design and performance characteristics of a corresponding implementation.
Citations: 8
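The linear-system view of polynomial evaluation that the E-method builds on can be sketched in a few lines. For p(x) = p_0 + p_1 x + ... + p_n x^n, the system y_i - x*y_{i+1} = p_i (with y_n = p_n) has the solution y_0 = p(x); each row has a unit diagonal and a single off-diagonal entry -x, so it is strictly diagonally dominant when |x| < 1. As an assumption for illustration, the sketch solves the system with plain floating-point Jacobi iterations, where every component is updated in parallel from the previous iterate; the real E-method instead uses digit-serial, redundant-digit fixed-point recurrences and has its own, stricter convergence bounds. The function name e_method_poly is hypothetical.

```python
def e_method_poly(coeffs, x, iterations=None):
    """Evaluate sum(coeffs[i] * x**i) by iterating on the E-method style linear system.

    Row i of the system reads y[i] - x * y[i + 1] = coeffs[i]; the last row is y[n] = coeffs[n].
    """
    n = len(coeffs) - 1
    if abs(x) >= 1:
        raise ValueError("system is not strictly diagonally dominant for |x| >= 1")
    if iterations is None:
        iterations = n + 1  # this bidiagonal system is solved exactly after n + 1 sweeps
    y = [0.0] * (n + 1)
    for _ in range(iterations):
        # Jacobi update: every y[i] uses the previous iterate, as parallel hardware would.
        y = [coeffs[i] + (x * y[i + 1] if i < n else 0.0) for i in range(n + 1)]
    return y[0]
```

After k sweeps the last k components are exact, which is why n + 1 sweeps suffice here; a digit-serial implementation instead produces one digit of every y_i per iteration.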
FPGA-based hardware accelerator of the heat equation with applications on infrared thermography
F. Pardo, Paula López Martínez, D. Cabello
DOI: 10.1109/ASAP.2008.4580175 (https://doi.org/10.1109/ASAP.2008.4580175). Published 2008-07-02.
Abstract: Modelling physical phenomena often involves complex systems of equations whose numerical solution demands substantial memory and computing power. Among the different techniques proposed, the Finite-Difference Time-Domain (FD-TD) method has the advantage of a feasible hardware implementation that can significantly speed up the computations. This technique is widely used to solve partial differential equations in areas such as antenna design, medical studies, circuit packaging and non-destructive evaluation. In this paper, we present a hardware accelerator for a 3D FD-TD heat equation solver that forms the basis of a thermal model of the soil for the non-destructive evaluation of minefields using infrared thermography. To be usable in the field during mine removal activities, the system must be portable and computationally efficient. To this end, we mapped the 3D FD-TD soil model onto an FPGA platform using Handel-C and VHDL. A speedup factor of 34 over a single-precision C++ implementation on a PC is achieved.
Citations: 5
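The explicit FD-TD update that such an accelerator pipelines can be sketched for the 3D heat equation. Assumptions not taken from the paper: a uniform grid, constant diffusivity, fixed (Dirichlet) boundary cells, and the standard forward-Euler stencil u_new = u + r * (sum of the six neighbours - 6u) with r = alpha*dt/dx^2, which is stable only for r <= 1/6. The soil model, its boundary conditions and the FPGA number formats are not represented, and heat_step is a hypothetical name.

```python
def heat_step(u, nx, ny, nz, r):
    """One explicit FD-TD time step of the 3D heat equation.

    u is a flat list indexed as (i * ny + j) * nz + k; boundary cells are held fixed.
    """
    if r > 1.0 / 6.0:
        raise ValueError("explicit 3D scheme is unstable for r > 1/6")

    def idx(i, j, k):
        return (i * ny + j) * nz + k

    new = list(u)  # boundary cells keep their (fixed) values
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                c = u[idx(i, j, k)]
                neighbours = (u[idx(i - 1, j, k)] + u[idx(i + 1, j, k)]
                              + u[idx(i, j - 1, k)] + u[idx(i, j + 1, k)]
                              + u[idx(i, j, k - 1)] + u[idx(i, j, k + 1)])
                # 7-point Laplacian stencil scaled by r = alpha * dt / dx**2
                new[idx(i, j, k)] = c + r * (neighbours - 6.0 * c)
    return new
```

Every interior cell depends only on the previous time step, so all cells of one step can be computed in parallel; that independence is what makes the method attractive for FPGA pipelines.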
A parallel hardware architecture for connected component labeling based on fast label merging
Holger Flatt, Steffen Blume, Sebastian Hesselbarth, Torsten Schünemann, P. Pirsch
DOI: 10.1109/ASAP.2008.4580169 (https://doi.org/10.1109/ASAP.2008.4580169). Published 2008-07-02.
Abstract: This paper presents a dedicated parallel hardware architecture for fast connected component labeling. Both label generation and the merging of equivalent labels are accelerated. Label generation is performed for four pixels in parallel. A special linked-list-based approach for fast label merging is proposed. This results in a compact implementation and shorter processing times than previously published implementations. For prototyping and evaluation, the hardware architecture was integrated into an FPGA-based modular coprocessor architecture. A binary D1 test image is labeled in 1.74 ms on a Virtex-II Pro FPGA running at 140 MHz. Moreover, the architecture can easily be integrated into embedded image processing systems.
Citations: 26
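A software sketch of two-pass connected component labeling with linked-list label merging. Assumptions: 4-connectivity, one pixel per step (the paper processes four pixels in parallel), and a representative table plus per-set linked lists chosen for illustration; the paper's exact merging structure may differ, and label_components is a hypothetical name.

```python
def label_components(image):
    """Two-pass 4-connected labeling of a binary image; returns a label grid (0 = background)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    rep = [0]   # rep[l]: representative (smallest) label of l's equivalence set
    nxt = [0]   # nxt[l]: next label in the same set's linked list (0 = end of list)
    tail = [0]  # tail[r]: last label in the list headed by representative r

    def merge(a, b):
        ra, rb = rep[a], rep[b]
        if ra == rb:
            return ra
        if rb < ra:
            ra, rb = rb, ra
        # Walk rb's list, pointing every member at ra, then append it to ra's list.
        l = rb
        while l:
            rep[l] = ra
            l = nxt[l]
        nxt[tail[ra]] = rb
        tail[ra] = tail[rb]
        return ra

    # First pass: assign provisional labels and record equivalences.
    for y in range(rows):
        for x in range(cols):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up and left:
                labels[y][x] = merge(up, left)
            elif up or left:
                labels[y][x] = rep[up or left]
            else:
                new = len(rep)
                rep.append(new)
                nxt.append(0)
                tail.append(new)
                labels[y][x] = new

    # Second pass: replace each provisional label by its representative.
    for y in range(rows):
        for x in range(cols):
            labels[y][x] = rep[labels[y][x]]
    return labels
```

Keeping each set as a linked list means a merge only touches the members of the absorbed set, and after it every label resolves to its final representative with a single table lookup, which suits the single-access resolution the second pass needs.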
A subsampling pulsed UWB demodulator based on a flexible complex SVD
Y. Vanderperren, W. Dehaene
DOI: 10.1109/ASAP.2008.4580164 (https://doi.org/10.1109/ASAP.2008.4580164). Published 2008-07-02.
Abstract: A flexible digital architecture for a pulsed ultra-wideband (UWB) demodulator sampling below the Nyquist rate is presented. The system is based on a complex Singular Value Decomposition (SVD) implemented on a configurable systolic array of simple processors. Automatic code generation is applied to cut design time and to rapidly assess the implementation cost of several processor architectures.
Citations: 0
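Systolic SVD arrays are commonly built from Jacobi rotations. As an illustration only, the following sketch computes the singular values of a small complex matrix with the one-sided (Hestenes) Jacobi method, in which each step removes the phase of one column inner product and then applies a real plane rotation to orthogonalize that pair of columns. The abstract does not say which SVD variant the authors use; the function name, tolerance and sweep limit are assumptions, and a systolic array would rotate independent column pairs in parallel rather than one after another.

```python
import math


def complex_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values (descending) of a complex m x n matrix, m >= n, via one-sided Jacobi."""
    m, n = len(a), len(a[0])
    cols = [[complex(a[r][c]) for r in range(m)] for c in range(n)]  # work column-wise

    def dot(u, v):  # Hermitian inner product u^H v
        return sum(x.conjugate() * y for x, y in zip(u, v))

    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = dot(cols[i], cols[i]).real
                beta = dot(cols[j], cols[j]).real
                gamma = dot(cols[i], cols[j])
                g = abs(gamma)
                if g <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal
                rotated = True
                phase = gamma / g                 # unit-modulus factor e^{i*phi}
                aj = [x / phase for x in cols[j]]  # now cols[i]^H aj == g is real
                zeta = (beta - alpha) / (2.0 * g)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                ai = cols[i]
                cols[i] = [c * x - s * y for x, y in zip(ai, aj)]
                cols[j] = [s * x + c * y for x, y in zip(ai, aj)]
        if not rotated:
            break
    # Once all columns are mutually orthogonal, their norms are the singular values.
    return sorted((math.sqrt(dot(v, v).real) for v in cols), reverse=True)
```

Each rotation only needs the two columns it touches, which is the locality property that lets Jacobi-style SVDs map onto arrays of simple processors.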
Zodiac: System architecture implementation for a high-performance Network Security Processor
Wang Haixin, Bai Guoqiang, C. Hongyi
DOI: 10.1109/ASAP.2008.4580160 (https://doi.org/10.1109/ASAP.2008.4580160). Published 2008-07-02.
Abstract: The last few years have seen significant progress in the field of application-specific processors. One example is the Network Security Processor (NSP), which performs the various cryptographic operations specified by network security protocols and helps offload this computation-intensive burden from Network Processors (NPs). This paper proposes a high-performance NSP intended to accelerate both the IPSec and SSL protocols. Using a programmable, descriptor-based instruction set architecture, the proposed system architecture yields a Gbps-rate NSP named Zodiac, which is programmable with domain-specific instructions for Gbps-throughput IPSec and SSL applications. Synthesized in a 0.18 µm CMOS technology, Zodiac reaches a peak IPSec ESP tunnel-mode throughput of up to 1.651 Gbps and can complete more than 1000 full SSL handshakes per second.
Citations: 12
Accelerating Nussinov RNA secondary structure prediction with systolic arrays on FPGAs
A. Jacob, J. Buhler, R. Chamberlain
DOI: 10.1109/ASAP.2008.4580177 (https://doi.org/10.1109/ASAP.2008.4580177). Published 2008-07-02.
Abstract: RNA structure prediction, or folding, is a compute-intensive task that lies at the core of several search applications in bioinformatics. We begin to address the need for high-throughput RNA folding by accelerating the Nussinov folding algorithm using a 2D systolic array architecture. We adapt classic results on parallel string parenthesization to produce efficient systolic arrays for the Nussinov algorithm, and elaborate these array designs into fully realized FPGA implementations. Our designs achieve estimated speedups of up to 39× on a Xilinx Virtex-II 6000 FPGA over a modern x86 CPU.
Citations: 40
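The Nussinov recurrence that these systolic arrays accelerate can be sketched sequentially. Assumptions for illustration: only Watson-Crick pairs (A-U, G-C) are scored, there is no minimum hairpin-loop length, and the table is filled one anti-diagonal (span j - i) at a time. That fill order is the dependency structure a 2D systolic array exploits, since all cells on one anti-diagonal are independent of each other.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq):
    """Maximum number of nested base pairs in seq (basic Nussinov recurrence)."""
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]  # N[i][j]: best score for the subsequence seq[i..j]
    for span in range(1, n):          # anti-diagonal index j - i
        for i in range(n - span):     # cells on one anti-diagonal are independent
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)       # pair i with j
            for k in range(i + 1, j):             # bifurcation into [i..k] and [k+1..j]
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

The bifurcation loop is what makes the algorithm cubic and is the part that parallel string-parenthesization results address.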
Floating point multiplication rounding schemes for interval arithmetic
A. Amaricai, M. Vladutiu, M. Udrescu, L. Prodan, O. Boncalo
DOI: 10.1109/ASAP.2008.4580148 (https://doi.org/10.1109/ASAP.2008.4580148). Published 2008-07-02.
Abstract: Floating-point multipliers that produce two differently rounded results for the same operation can be used to increase the performance of interval multiplication. This paper builds on that idea by investigating three existing floating-point multiplication rounding algorithms for such multipliers: the Even-Seidel, Quach and Yu-Zyner algorithms. These three rounding schemes are modified for interval arithmetic; furthermore, a new rounding scheme is proposed. The estimates from our analysis show that the proposed scheme has the best performance/area ratio.
Citations: 7
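What a dual-result multiplier delivers, one product rounded toward minus infinity and one toward plus infinity, can be emulated in software to show how interval multiplication consumes it. The sketch assumes IEEE 754 binary64 floats, finite operands whose product does not overflow, and Python 3.9 or later for math.nextafter. It obtains the exact product with fractions.Fraction and rounds it in both directions, which is not how the hardware rounding schemes compared in the paper work; mul_down_up and interval_mul are hypothetical names.

```python
import math
from fractions import Fraction


def mul_down_up(a, b):
    """Return (a*b rounded toward -inf, a*b rounded toward +inf) for finite floats."""
    exact = Fraction(a) * Fraction(b)
    nearest = float(exact)  # correctly rounded to nearest
    down = nearest if Fraction(nearest) <= exact else math.nextafter(nearest, -math.inf)
    up = nearest if Fraction(nearest) >= exact else math.nextafter(nearest, math.inf)
    return down, up


def interval_mul(x, y):
    """Multiply intervals x = (x_lo, x_hi) and y = (y_lo, y_hi) with outward rounding."""
    lows, highs = [], []
    for a in x:
        for b in y:
            d, u = mul_down_up(a, b)  # both roundings of the same product
            lows.append(d)
            highs.append(u)
    return min(lows), max(highs)
```

Because every endpoint product is needed once rounded down and once rounded up, a multiplier that returns both roundings from a single operation halves the number of multiplications compared with issuing each product twice in different rounding modes.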
Architecture and VLSI realization of a high-speed programmable decoder for LDPC convolutional codes
M. Tavares, S. Kunze, E. Matús, G. Fettweis
DOI: 10.1109/ASAP.2008.4580181 (https://doi.org/10.1109/ASAP.2008.4580181). Published 2008-07-02.
Abstract: In this paper, we present a novel high-speed dual-core programmable decoder architecture for LDPC convolutional codes and their tail-biting versions. This architecture uses a modified Min-Sum algorithm and enables the decoding of a multitude of codes with different node degree distributions, rates and block lengths. We show how the parallelization concepts are derived from the properties of the bipartite graphs underlying the codes. Moreover, the hardware elements composing the architecture are presented and analyzed in detail. The programmability of the decoder is also considered. Finally, we present synthesis results for a prototype ASIC that achieves high decoding throughput while retaining very high flexibility, relatively low power consumption and small area.
Citations: 1
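The check-node update is the core of any Min-Sum LDPC decoder. The abstract does not state how the authors modified Min-Sum, so this sketch uses offset Min-Sum, one common modification, purely as an assumption, with log-likelihood ratios (LLRs) as plain floats and a hypothetical function name. Each outgoing message takes the sign product and the minimum magnitude of all other incoming messages, then subtracts the offset.

```python
def check_node_update(llrs, offset=0.0):
    """Offset Min-Sum check-node update: one extrinsic message per incoming edge."""
    # Track the two smallest magnitudes so each output can exclude its own input.
    min1 = min2 = float("inf")
    min1_idx = -1
    sign_product = 1
    for idx, v in enumerate(llrs):
        mag = abs(v)
        if v < 0:
            sign_product = -sign_product
        if mag < min1:
            min2, min1, min1_idx = min1, mag, idx
        elif mag < min2:
            min2 = mag
    out = []
    for idx, v in enumerate(llrs):
        mag = min2 if idx == min1_idx else min1  # minimum over the other edges
        sign = -sign_product if v < 0 else sign_product  # sign product over the other edges
        out.append(sign * max(mag - offset, 0.0))
    return out
```

Storing only the two minima, the index of the first, and the overall sign is a widely used hardware simplification, since it avoids recomputing a minimum for every edge.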
Operation shuffling over cycle boundaries for low energy L0 clustering
Yuki Kobayashi, M. Jayapala, P. Raghavan, F. Catthoor, M. Imai
DOI: 10.1109/ASAP.2008.4580170 (https://doi.org/10.1109/ASAP.2008.4580170). Published 2008-07-02.
Abstract: To reduce the energy of instruction memory accesses in VLIW ASIPs, an operation shuffling technique has been proposed. Shuffling changes the assignment of an operation to a different slot so that the L0 cluster configuration can be improved. The published technique, however, moves operations only within a cycle, not between cycles. As a result, the potential energy reduction was limited. This paper proposes a shuffling technique that moves operations between cycles as well as within a cycle. Experimental results show that, in the best case, the proposed method is 15.3% more energy efficient than the best-known shuffling method.
Citations: 0