2008 International Conference on Application-Specific Systems, Architectures and Processors: Latest Papers

Loop-oriented metrics for exploring an application-specific architecture design-space
M. Mbaye, N. Bélanger, Y. Savaria, S. Pierre
{"title":"Loop-oriented metrics for exploring an application-specific architecture design-space","authors":"M. Mbaye, N. Bélanger, Y. Savaria, S. Pierre","doi":"10.1109/ASAP.2008.4580188","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580188","url":null,"abstract":"Since ASIPs were introduced in the HW/SW architecture design space, application partitioning has become more complex. Designers have more ways to accelerate applications: with ASIPs of various kinds or with dedicated hardware modules. In this paper, we present loop-oriented metrics that will be used during design-space exploration for the partitioning process of C-based designs. These metrics help designers determine which aspect of loop iterations, between data memory accesses and ALU/Control operations, offers more acceleration potential. We implemented a profiler-scheduler LOOPPROF that gathers the metrics. Our tool also helps determine which optimization techniques such as data reuse are suitable for the considered code segments. We demonstrate the use of our tool by exploring the acceleration possibilities of the ELA Deinterlacer, a video processing algorithm.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129605016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Reconfigurable acceleration of microphone array algorithms for speech enhancement
K. Yiu, C. H. Ho, N. Grbic, Yao Lu, Xiaoxiang Shi, W. Luk
{"title":"Reconfigurable acceleration of microphone array algorithms for speech enhancement","authors":"K. Yiu, C. H. Ho, N. Grbic, Yao Lu, Xiaoxiang Shi, W. Luk","doi":"10.1109/ASAP.2008.4580179","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580179","url":null,"abstract":"Microphone arrays play an important role in noise reduction and speech enhancement. Their algorithms are based on beamforming, which reduces the level of localized and ambient noise signals while minimizing distortion to speech from the desired direction via spatial filtering. This paper describes a class of subband beamforming algorithms. The similarity between different algorithms is discussed. To enhance computational efficiency, the algorithms are implemented in frequency domain. A hardware architecture, with bitwidth optimization, is proposed to support the algorithms. An implementation with 7 instances on a Xilinx XC4VSX55 FPGA at 175 MHz can run 41.7 times faster than the corresponding pure software implementation on a 3.2 GHz Pentium 4 PC.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116111623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Architecture of a polymorphic ASIC for interoperability across multi-mode H.264 decoders
Adarsha Rao, M. Alle, S. Nandy, R. Narayan
{"title":"Architecture of a polymorphic ASIC for interoperability across multi-mode H.264 decoders","authors":"Adarsha Rao, M. Alle, S. Nandy, R. Narayan","doi":"10.1109/ASAP.2008.4580193","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580193","url":null,"abstract":"Run-time interoperability between different applications based on H.264/AVC is an emerging need in networked infotainment, where media delivery must match the desired resolution and quality of the end terminals. In this paper, we describe the architecture and design of a polymorphic ASIC to support this. The H.264 decoding flow is partitioned into modules, such that the polymorphic ASIC meets the design goals of low-power, low-area, high flexibility, high throughput and fast interoperability between different profiles and levels of H.264. We demonstrate the idea with a multi-mode decoder that can decode baseline, main and high profile H.264 streams and can interoperate at run-time across these profiles. The decoder is capable of processing frame sizes of up to 1024 × 768 at 30 fps. The design, synthesized with UMC 0.13 µm technology, occupies 250 k gates and runs at 100 MHz.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133000840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Dynamic holographic reconfiguration on a four-context ODRGA
M. Nakajima, Minoru Watanabe
{"title":"Dynamic holographic reconfiguration on a four-context ODRGA","authors":"M. Nakajima, Minoru Watanabe","doi":"10.1109/ASAP.2008.4580174","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580174","url":null,"abstract":"Reconfiguration applications based on reconfigurable devices present new computational paradigms because, by increasing the reconfiguration frequency of reconfigurable devices, their activity and performance can be improved dramatically. Recently, optically reconfigurable gate arrays (ORGAs) with a holographic memory have been developed to realize rapid reconfigurations and numerous reconfiguration contexts. In addition, optically differential reconfigurable gate arrays (ODRGAs) have been developed to accelerate optical reconfigurations of conventional ORGAs. However, fast configuration experiments under multiple contexts exploiting the ODRGA architecture have never been reported. Therefore, this paper presents a four-context ODRGA system and experimental results that elucidate aspects of fast reconfiguration. The advantage of the ODRGA architecture is discussed based on those results.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133106294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Dynamically reconfigurable regular expression matching architecture
J. Divyasree, H. Rajashekar, Kuruvilla Varghese
{"title":"Dynamically reconfigurable regular expression matching architecture","authors":"J. Divyasree, H. Rajashekar, Kuruvilla Varghese","doi":"10.1109/ASAP.2008.4580165","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580165","url":null,"abstract":"Regular Expressions are generic representations for a string or a collection of strings. This paper focuses on implementation of a regular expression matching architecture on reconfigurable fabric like FPGA. We present a Non-deterministic Finite Automata based implementation with extended regular expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular expression syntax. This enables formation of the regular expression hardware by a simple cascade of generic blocks as well as a possibility for reconfiguring the generic blocks to change the regular expression being matched. Further, we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular expression set. Our optimized regular expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable regular expression engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on Xilinx Virtex2Pro FPGA.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128430900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
PERMAP: A performance-aware mapping for application-specific SoCs
A. E. Kiasari, S. Hessabi, H. Sarbazi-Azad
{"title":"PERMAP: A performance-aware mapping for application-specific SoCs","authors":"A. E. Kiasari, S. Hessabi, H. Sarbazi-Azad","doi":"10.1109/ASAP.2008.4580157","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580157","url":null,"abstract":"Future system-on-chip (SoC) designs will need efficient on-chip communication architectures that can provide efficient and scalable data transport among the intellectual properties (IPs). Designing and optimizing SoCs is an increasingly difficult task due to the size and complexity of the SoC design space, high cost of detailed simulation, and several constraints that the design must satisfy. For efficient design of SoCs, an efficient mapping of IPs onto networks-on-chip (NoCs) is highly desirable. Towards this end, we have presented PERMAP, a performance-aware mapping algorithm which maps the IPs onto a generic NoC architecture such that the average communication delay is minimized. This is accomplished by a performance analytical model which can be used for any arbitrary network topology with wormhole routing. The algorithm is used for mapping a video application onto a tile-based NoC and experimental results show that PERMAP is fast and robust.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"440 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125777870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
An FPGA architecture for CABAC decoding in manycore systems
R. Osorio, J. Bruguera
{"title":"An FPGA architecture for CABAC decoding in manycore systems","authors":"R. Osorio, J. Bruguera","doi":"10.1109/ASAP.2008.4580194","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580194","url":null,"abstract":"Arithmetic coding is an efficient entropy compression method that achieves results close to the entropy limit and it is used in modern standards such as JPEG-2000 and H.264. Arithmetic decoding (AD) in H.264 video coding standard is a sequential task that takes a significant part of computing time. In present and future multicore and manycore systems, AD becomes a bottleneck as it cannot be parallelized, limiting the concurrent execution of other tasks. In this paper, an FPGA-based accelerator is proposed to speed-up AD in H.264 and enable parallel decoding at macroblock and frame levels scaling up to tens or hundreds of cores.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131995450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Design space exploration of a cooperative MIMO receiver for reconfigurable architectures
Shahnam Mirzaei, A. Irturk, R. Kastner, B. T. Weals, R. Cagley
{"title":"Design space exploration of a cooperative MIMO receiver for reconfigurable architectures","authors":"Shahnam Mirzaei, A. Irturk, R. Kastner, B. T. Weals, R. Cagley","doi":"10.1109/ASAP.2008.4580173","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580173","url":null,"abstract":"Cooperative MIMO is a new technique that allows disjoint wireless communication nodes (e.g. wireless sensors) to form a virtual antenna array to increase bandwidth, reliability and/or transmission distance. It differs fundamentally from other MIMO communication since the signals received from each node have a relative timing and frequency offset due to the distributed nature of their transmitting antennas. Therefore, the receiver must estimate the timing and frequency for each transmitting node, in addition to the MIMO channel. In this paper, we design and implement a receiver for the cooperative MIMO problem using reconfigurable hardware. We discuss the computation required for each stage of the receiver and perform an experimental study of the tradeoffs between area, power, performance and quality of results. The end result is an efficient, parameterizable, cooperative MIMO receiver implemented on several different state-of-the-art FPGA devices.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116149708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Bit matrix multiplication in commodity processors
Y. Hilewitz, C. Lauradoux, R. Lee
{"title":"Bit matrix multiplication in commodity processors","authors":"Y. Hilewitz, C. Lauradoux, R. Lee","doi":"10.1109/ASAP.2008.4580146","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580146","url":null,"abstract":"Registers in processors generally contain words or, with the addition of multimedia extensions, short vectors of subwords of bytes or 16-bit elements. In this paper, we view the contents of registers as vectors or matrices of individual bits. However, the facility to operate efficiently on the bit-level is generally lacking. A commodity processor usually only has logical and shift instructions and occasionally population count instructions. Perhaps the most powerful primitive bit-level operation is the bit matrix multiply (BMM) instruction, currently found only in supercomputers like Cray. This instruction multiplies two n × n bit matrices. In this paper, we show the power of BMM. We propose and analyze new processor instructions that implement simpler BMM primitive operations more suitable for a commodity processor. We show the impact of BMM on the performance of critical application kernels and discuss its hardware cost.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127595321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
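Note on the entry above: the abstract describes BMM as multiplying two n × n bit matrices. A minimal software sketch of that operation follows, assuming arithmetic over GF(2) (AND as multiply, XOR as add), which is the usual convention but is not stated in the abstract. The function name `bit_matrix_multiply` and the row-as-integer layout (bit j of a row is column j) are illustrative choices, not the instructions proposed in the paper.

```python
def bit_matrix_multiply(a_rows, b_rows):
    """Multiply two n x n bit matrices over GF(2).

    Each matrix is a list of n integers; bit j of row i is element (i, j).
    Assumption: products are ANDed and sums are XORed (GF(2) arithmetic).
    """
    n = len(a_rows)
    if len(b_rows) != n:
        raise ValueError("both matrices must have the same number of rows")
    result = []
    for a in a_rows:
        acc = 0
        for j in range(n):
            # If element (i, j) of A is 1, add (XOR) row j of B into row i of C.
            if (a >> j) & 1:
                acc ^= b_rows[j]
        result.append(acc)
    return result


if __name__ == "__main__":
    # A permutation matrix on the left reorders the rows of the right operand.
    swap = [0b10, 0b01]
    b = [0b11, 0b01]
    print([bin(row) for row in bit_matrix_multiply(swap, b)])
```

Because a permutation matrix on the left only reorders rows, this one primitive can express bit permutations as well as parity-style computations.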
Low discrepancy sequences for Monte Carlo simulations on reconfigurable platforms
I. Dalal, D. Stefan, J. Harwayne-Gidansky
{"title":"Low discrepancy sequences for Monte Carlo simulations on reconfigurable platforms","authors":"I. Dalal, D. Stefan, J. Harwayne-Gidansky","doi":"10.1109/ASAP.2008.4580163","DOIUrl":"https://doi.org/10.1109/ASAP.2008.4580163","url":null,"abstract":"Low-discrepancy sequences, also known as “quasi-random” sequences, are numbers that are better equidistributed in a given volume than pseudo-random numbers. Evaluation of high-dimensional integrals is commonly required in scientific fields as well as other areas (such as finance), and is performed by stochastic Monte Carlo simulations. Simulations which use quasi-random numbers can achieve faster convergence and better accuracy than simulations using conventional pseudo-random numbers. Such simulations are called Quasi-Monte Carlo. Conventional Monte Carlo simulations are increasingly implemented on reconfigurable devices such as FPGAs due to their inherently parallel nature. This has not been possible for Quasi-Monte Carlo simulations because, to our knowledge, no low-discrepancy sequences have been generated in hardware before. We present FPGA-optimized scalable designs to generate three different common low-discrepancy sequences: Sobol, Niederreiter and Halton. We implement these three generators on Virtex-4 FPGAs with varying degrees of fine-grained parallelization, although our ideas can be applied to a far broader class of sequences. We conclude with results from the implementation of an actual Quasi-Monte Carlo simulation for extracting partial inductances from integrated circuits.","PeriodicalId":246715,"journal":{"name":"2008 International Conference on Application-Specific Systems, Architectures and Processors","volume":"587 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113982284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
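Note on the entry above: of the three sequences the abstract names, Halton is the simplest to state. The sketch below is a plain software version using the standard radical-inverse construction, with a small Quasi-Monte Carlo integral as a usage example. It is not the FPGA design from the paper; the function names, the bases (2, 3), and the test integrand are assumptions chosen for illustration.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(count, bases=(2, 3)):
    """Return the first `count` Halton points, one coordinate per base.

    Indexing starts at 1 so that the all-zero point is skipped.
    """
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, count + 1)
    ]


def qmc_mean(func, count):
    """Estimate the integral of `func` over the unit square by averaging."""
    points = halton_points(count)
    return sum(func(x, y) for x, y in points) / count


if __name__ == "__main__":
    # The exact integral of x * y over [0, 1]^2 is 0.25.
    print(qmc_mean(lambda x, y: x * y, 1000))
```

Using coprime bases for each dimension is what keeps the coordinates from lining up with one another; the estimate should land close to 0.25, though the exact digits depend on the point count.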