2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Papers

Rapid estimation of instruction cache hit rates using loop profiling
Santanu Kumar Dash, T. Srikanthan
{"title":"Rapid estimation of instruction cache hit rates using loop profiling","authors":"Santanu Kumar Dash, T. Srikanthan","doi":"10.1109/ASAP.2008.4580189","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580189","url":null,"abstract":"Estimation of the hit rate curve for an application is the first step in application specific cache tuning. Several techniques have been proposed to meet this objective; however, most of these have dealt with the data cache, with little attention to the instruction cache. In this paper, we propose a novel, lightweight and highly scalable technique for rapid estimation of the instruction cache hit rate curve for a given application. Our technique works at the basic block level and relies on a one-time loop profiling of the weighted control flow graph of the application followed by estimation of the hit rate for different cache sizes. It accounts for the spatial and temporal locality separately and is sensitive to the cache size as well as block size. The proposed technique is highly accurate: when compared with results from an actual cache simulator, the mean error in estimation ranged from 1.11% to 2.46% for the benchmarks tested.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124900184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Mapping of the AES cryptographic algorithm on a Coarse-Grain reconfigurable array processor
Andres Garcia, Mladen Berekovic, T. Aa
{"title":"Mapping of the AES cryptographic algorithm on a Coarse-Grain reconfigurable array processor","authors":"Andres Garcia, Mladen Berekovic, T. Aa","doi":"10.1109/ASAP.2008.4580186","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580186","url":null,"abstract":"Coarse-Grained reconfigurable architectures are emerging as potential candidates to meet the high performance, power efficiency and flexibility needed by embedded systems. ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) and its DRESC compiler offer a very promising platform for designing embedded systems targeted for different application domains. We present a procedure for mapping the widely used AES cryptographic algorithm on ADRES. A detailed explanation is shown for each of the optimizations performed in order to make better use of instruction and loop parallelism. A new intrinsic function set is proposed for speeding up the processing of the AES algorithm. The obtained simulation results are compared with experiments done on the widely known Texas Instruments DSP: TI C64x, which is considered state-of-the-art for embedded systems. Our results show that ADRES outperforms TI C64x DSP, executing the AES algorithm in one fourth of the cycles.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114501838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Novel approach on lifting-based DWT and IDWT processor with multi-context configuration to support different wavelet filters
A. Guntoro, M. Glesner
{"title":"Novel approach on lifting-based DWT and IDWT processor with multi-context configuration to support different wavelet filters","authors":"A. Guntoro, M. Glesner","doi":"10.1109/ASAP.2008.4580195","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580195","url":null,"abstract":"In this paper, we propose a lifting-based DWT processor that can perform various forward and inverse transforms. Contrary to other lifting-based processors which focus on JPEG2000, our design is based on the fact that the wavelet transformations are not used only in the area of image processing and wavelet filters may not be represented as integer numbers. The proposed architecture is based on NxM processing elements which require only one multiplier and one adder to perform prediction/update on a continuous data stream. The multi-context feature allows the processor to be configured for different types of transformations in a simple manner.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"628 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116341543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
New insights on Ling adders
Álvaro Vázquez, E. Antelo
{"title":"New insights on Ling adders","authors":"Álvaro Vázquez, E. Antelo","doi":"10.1109/ASAP.2008.4580183","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580183","url":null,"abstract":"Adders are critical for microprocessor design. Current designs use variations of parallel prefix schemes. A method introduced by Ling [7] may improve this kind of adders. However, as recent research publications demonstrate, the use of the Ling scheme in prefix adders is not a mature and clear concept. In this work we show how to easily extend any existing prefix adder topology to use the Ling method. Moreover, we use this methodology to implement the Ling scheme in a flagged prefix adder, which is an interesting building block for floating point units.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121524799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Resource efficient generators for the floating-point uniform and exponential distributions
David B. Thomas, W. Luk
{"title":"Resource efficient generators for the floating-point uniform and exponential distributions","authors":"David B. Thomas, W. Luk","doi":"10.1109/ASAP.2008.4580162","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580162","url":null,"abstract":"Monte-Carlo simulations and many other stochastic algorithms are almost ideal applications for FPGAs, as the huge amount of available parallelism allows deep pipelining without loop-carried dependencies and spatial scaling across large devices without shared resource bottlenecks. Another key advantage is that random number generation is very cheap (when compared to software), and can be tailored to meet the performance and quality needs of each application. However, in many cases this advantage is not exploited, either because an inefficient but simple to implement generator is chosen, or because a generator with properties that far exceed the needs of the application is used. This paper describes generators for the floating-point uniform and exponential distributions, which provide efficient resource usage, while remaining sufficiently simple to make them attractive to users.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125626526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Fault-tolerant dynamically reconfigurable NoC-based SoC
Mohammad Hosseinabady, J. Núñez-Yáñez
{"title":"Fault-tolerant dynamically reconfigurable NoC-based SoC","authors":"Mohammad Hosseinabady, J. Núñez-Yáñez","doi":"10.1109/ASAP.2008.4580150","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580150","url":null,"abstract":"This paper proposes a network-on-chip (NoC)-based dynamically reconfigurable platform which can run multiple applications simultaneously. A tile attached to a router in the NoC consists of a core container which can host a core permanently or temporarily. The tile also has a hardwired controller and a cache-like memory to control the hosted cores. A core, which runs a task, may be described by a bitstream (called hardware core) or a programme code (called software core). Because of the dynamic behaviour of the proposed platform, a stochastic dynamic routing algorithm uses the task identifier to find (or map) the task in the platform. Because the routing algorithm uses the task identifier and the tiles are reconfigurable, the proposed platform can tolerate probable faults. The proposed SoC architecture can easily run new protocols and tasks. Our results show that the proposed platform follows user interests, running tasks with higher temporal locality much faster than tasks with lower temporal locality.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"37 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131457369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
RECONNECT: A NoC for polymorphic ASICs using a low overhead single cycle router
Joseph Nimmy, C. R. Reddy, Keshavan Varadarajan, M. Alle, Alexander Fell, S. Nandy, R. Narayan
{"title":"RECONNECT: A NoC for polymorphic ASICs using a low overhead single cycle router","authors":"Joseph Nimmy, C. R. Reddy, Keshavan Varadarajan, M. Alle, Alexander Fell, S. Nandy, R. Narayan","doi":"10.1109/ASAP.2008.4580187","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580187","url":null,"abstract":"A polymorphic ASIC is a runtime reconfigurable hardware substrate comprising compute and communication elements. It is a “future proof” custom hardware solution for multiple applications and their derivatives in a domain. Interoperability between application derivatives at runtime is achieved through hardware reconfiguration. In this paper we present the design of a single cycle Network on Chip (NoC) router that is responsible for effecting runtime reconfiguration of the hardware substrate. The router design is optimized to avoid FIFO buffers at the input port and loop back at output crossbar. It provides virtual channels to emulate a non-blocking network and supports a simple X-Y relative addressing scheme to limit the control overhead to 9 bits per packet. The 8×8 honeycomb NoC (RECONNECT) implemented in 130 nm UMC CMOS standard cell library operates at 500 MHz and has a bisection bandwidth of 28.5 GBps. The network is characterized for random, self-similar and application specific traffic patterns that model the execution of multimedia and DSP kernels with varying network loads and virtual channels. Our implementation with 4 virtual channels has an average network latency of 24 clock cycles and throughput of 62.5% of the network capacity for random traffic. For application specific traffic the latency is 6 clock cycles and throughput is 87% of the network capacity.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127393301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Extending the SIMPPL SoC architectural framework to support application-specific architectures on multi-FPGA platforms
David Dickin, Lesley Shannon
{"title":"Extending the SIMPPL SoC architectural framework to support application-specific architectures on multi-FPGA platforms","authors":"David Dickin, Lesley Shannon","doi":"10.1109/ASAP.2008.4580156","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580156","url":null,"abstract":"Process technology has reduced in size such that it is possible to implement complete application-specific architectures as systems-on-chip (SoCs) using both application-specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs). However, the reconfigurable nature of an FPGA results in lower logic density, such that large, complex applications require multi-FPGA implementation platforms. Although designing SoCs is challenging, SoC models such as systems integrating modules with predefined physical links (SIMPPL) exist to facilitate the design process. SIMPPL leverages defined physical interfaces and communication protocols to enable rapid system-level integration for application-specific architectures. This paper presents a “SIMPPL repeater” that enables the SIMPPL SoC architectural framework to be used for systems spanning multiple FPGAs. The SIMPPL repeater abstracts inter-chip communication, allowing designers to treat a multi-FPGA platform as a single large reconfigurable fabric and focus on their application-specific architecture.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"14 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130059712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Integer and floating-point constant multipliers for FPGAs
N. Brisebarre, F. D. Dinechin, J. Muller
{"title":"Integer and floating-point constant multipliers for FPGAs","authors":"N. Brisebarre, F. D. Dinechin, J. Muller","doi":"10.1109/ASAP.2008.4580184","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580184","url":null,"abstract":"Reconfigurable circuits now have a capacity that allows them to be used as floating-point accelerators. They offer massive parallelism, but also the opportunity to design optimised floating-point hardware operators not available in microprocessors. Multiplication by a constant is an important example of such an operator. This article presents an architecture generator for the correctly rounded multiplication of a floating-point number by a constant. This constant can be a floating-point value, but also an arbitrary irrational number. The multiplication of the significands is an instance of the well-studied problem of constant integer multiplication, for which improvements to existing algorithms are also proposed and evaluated.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125591744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
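The abstract above refers to constant integer multiplication without naming an algorithm. As a rough illustration of the problem only, here is a minimal sketch of the textbook shift-and-add approach using canonical signed-digit (CSD) recoding. This is an assumed baseline, not the improved algorithm the paper proposes, and the function names `csd_terms` and `multiply_by_constant` are hypothetical.

```python
def csd_terms(c):
    """Recode a non-negative constant c as signed powers of two.

    Returns a list of (shift, sign) pairs with sign in {+1, -1} such that
    c == sum(sign << shift). No two non-zero digits are adjacent, which is
    the defining property of the canonical signed-digit form.
    """
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # Digit is +1 when c mod 4 == 1 and -1 when c mod 4 == 3.
            digit = 2 - (c & 3)
            terms.append((shift, digit))
            c -= digit
        c >>= 1
        shift += 1
    return terms


def multiply_by_constant(x, terms):
    """Multiply x by the recoded constant using only shifts and adds/subtracts."""
    acc = 0
    for shift, sign in terms:
        acc += sign * (x << shift)
    return acc


# Example: 7 is recoded as 8 - 1, so one subtraction replaces two additions.
terms_for_7 = csd_terms(7)
product = multiply_by_constant(13, terms_for_7)
```

In hardware, each `(shift, sign)` pair corresponds to wiring plus one adder or subtractor, so fewer non-zero digits means fewer adders. That is the cost a constant-multiplier generator tries to reduce.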
Fully-pipelined efficient architectures for FPGA realization of discrete Hadamard transform
P. Meher, J. Patra
{"title":"Fully-pipelined efficient architectures for FPGA realization of discrete Hadamard transform","authors":"P. Meher, J. Patra","doi":"10.1109/ASAP.2008.4580152","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580152","url":null,"abstract":"Fully-pipelined simple modular structures are presented in this paper for efficient hardware realization of discrete Hadamard transform (HT). From the kernel matrix of HT, we have derived four different pipelined modular designs for transform length N = 4. It is shown further that the HT of transform-length N = 8 can be obtained from two 4-point HT modules, and similarly, the HT of transform-length N = 16 can be obtained from four 4-point HT modules. Longer transforms can, however, be computed from these short-length modules, since an N-point transform can be computed from 2M M-point HT modules, where M = N^(1/2). The proposed architectures are coded in VHDL, simulated with the Xilinx ISE tool for validation and testing, and then synthesized for implementation in different FPGA devices, e.g., Virtex-E, Virtex-II Pro and Virtex-4. From the synthesis results, it is found that the proposed designs involve considerably fewer slices and provide significantly higher best-achievable frequency compared with the existing architectures for FPGA implementation of HT.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126076475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
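The decomposition mentioned in the abstract above (an N-point transform built from 2M M-point transforms, with M equal to the square root of N) can be checked numerically. The sketch below is a software illustration under stated assumptions, not the paper's pipelined architecture: it assumes the Sylvester (natural-ordered) Hadamard matrix, requires N to be a perfect square, and uses the hypothetical helper names `hadamard` and `ht_from_short_modules`.

```python
import math

import numpy as np


def hadamard(n):
    """Sylvester-ordered Hadamard matrix of size n (n must be a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def ht_from_short_modules(x):
    """N-point Hadamard transform computed from 2M M-point transforms, M = sqrt(N).

    The input is arranged as an M x M grid in row-major order. One M-point
    transform is applied to each of the M rows, then one to each of the M
    columns, for 2M short transforms in total.
    """
    x = np.asarray(x)
    n = x.shape[0]
    m = math.isqrt(n)
    if m * m != n:
        raise ValueError("this sketch assumes N is a perfect square")
    hm = hadamard(m)
    grid = x.reshape(m, m)
    grid = grid @ hm  # M row transforms (hm is symmetric)
    grid = hm @ grid  # M column transforms
    return grid.reshape(n)


# Check the 16-point case against the direct matrix-vector product.
sample = np.arange(16)
direct = hadamard(16) @ sample
composed = ht_from_short_modules(sample)
```

The check works because the Sylvester matrix satisfies H_N = H_M ⊗ H_M when N = M², so the Kronecker product acts as a row pass followed by a column pass on the M x M grid. Note that the sketch counts transforms performed, not hardware modules instantiated.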