2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors: Latest Publications

Accelerating a Virtual Ecology Model with FPGAs
J. Lamoureux, T. Field, W. Luk
{"title":"Accelerating a Virtual Ecology Model with FPGAs","authors":"J. Lamoureux, T. Field, W. Luk","doi":"10.1109/ASAP.2009.27","DOIUrl":"https://doi.org/10.1109/ASAP.2009.27","url":null,"abstract":"This paper describes the acceleration of virtual ecology models using field-programmable gate arrays (FPGAs). Our approach targets models generated by the Virtual Ecology Workbench (VEW); an existing tool used by biological oceanographers to build and analyze models of the plankton ecosystem in the upper ocean. Depending on the plankton study and required level of detail, the logic, memory, and data transfer requirements of the generated models can vary significantly. Using FPGAs, hardware implementations can be customized to the specific requirements of the ecological system under study and provide significant speed-ups compared to software implementations. This paper describes a framework for maximizing the speedup of VEW generated models implemented on FPGA-based acceleration platforms and then describes the implementation of a typical VEW generated model to validate the framework and demonstrate that significant speedups are possible. Based on timing and area estimates from a commercial synthesis tool, the example model implemented on a Celoxica RCHTX acceleration board featuring a Xilinx Virtex-4 FPGA performs 39 times faster at 150 MHz than the software implementation on an AMD Opteron 2200 series CPU at 1.0 GHz.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128600422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A 16-context Optically Reconfigurable Gate Array
M. Nakajima, Minoru Watanabe
{"title":"A 16-context Optically Reconfigurable Gate Array","authors":"M. Nakajima, Minoru Watanabe","doi":"10.1109/ASAP.2009.41","DOIUrl":"https://doi.org/10.1109/ASAP.2009.41","url":null,"abstract":"Demand for fast dynamic reconfiguration has increased since dynamic reconfiguration can accelerate the performance of processors. Dynamic reconfiguration has two important prerequisites: fast reconfiguration and numerous reconfiguration contexts. Unfortunately, fast reconfigurations and numerous contexts share a tradeoff relation on current VLSIs. Therefore, optically reconfigurable gate arrays were developed to resolve this dilemma. Optically reconfigurable gate arrays can realize a large virtual gate count that is much larger than those of current VLSI chips by exploiting the large storage capacity of a holographic memory. Furthermore, optically reconfigurable gate arrays can realize rapid reconfiguration using large bandwidth optical connections between a holographic memory and a programmable gate array VLSI. This paper presents the fastest 317–657 ns reconfiguration demonstration of a 16-context optically reconfigurable gate array architecture.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129686333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Reconfigurable SWP Operator for Multimedia Processing
Shafqat Khan, E. Casseau, D. Ménard
{"title":"Reconfigurable SWP Operator for Multimedia Processing","authors":"Shafqat Khan, E. Casseau, D. Ménard","doi":"10.1109/ASAP.2009.13","DOIUrl":"https://doi.org/10.1109/ASAP.2009.13","url":null,"abstract":"For performance enhancement, reconfigurable processors have to overcome the overheads of reconfigurations such as the complexity of the interconnection network and reconfiguration time. In processors dealing with multimedia applications these overheads can be reduced by providing the reconfigurability inside the processing units rather than at interconnection level. Due to the low precision data nature of multimedia applications, reconfiguration at operator level also provides additional speedup through parallel execution of low precision data. In this paper a pipelined architecture of a reconfigurable coarse grain subword parallel (SWP) operator is presented for multimedia applications. This operator not only eliminates the need of reconfiguration time but also provides the reconfigurability at both data size level (different pixel data sizes) and at operation level (different multimedia oriented operations). This ensures a better utilization of the processor resources and reduces the reconfiguration overheads significantly.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127367751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
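The subword-parallel (SWP) idea in the abstract above, running several low-precision operations inside one wider datapath, can be illustrated in software. The following is only a sketch under assumptions: it packs four 8-bit lanes into a 32-bit word and adds them lane-wise with a standard carry-masking trick. The names `pack8`, `unpack8`, and `swp_add8` are hypothetical, and this is not the pipelined hardware operator the paper describes.

```python
HIGH_BITS = 0x80808080  # top bit of each 8-bit lane
LOW_BITS = 0x7F7F7F7F   # low 7 bits of each 8-bit lane


def pack8(lanes):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def swp_add8(a, b):
    """Add four packed 8-bit lanes at once; each lane wraps modulo 256."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is a XOR b XOR carry-in, and the carry-in
    # is already sitting in the top bit of `low`.
    return low ^ ((a ^ b) & HIGH_BITS)


pixels_a = pack8([1, 200, 255, 16])
pixels_b = pack8([2, 100, 1, 16])
print(unpack8(swp_add8(pixels_a, pixels_b)))  # [3, 44, 0, 32]
```

A single 32-bit addition plus two masks replaces four separate 8-bit additions, which is the kind of gain the abstract attributes to operator-level parallelism on low-precision data.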
Parallelized Architecture of Multiple Classifiers for Face Detection
Junguk Cho, Bridget Benson, Shahnam Mirzaei, R. Kastner
{"title":"Parallelized Architecture of Multiple Classifiers for Face Detection","authors":"Junguk Cho, Bridget Benson, Shahnam Mirzaei, R. Kastner","doi":"10.1109/ASAP.2009.38","DOIUrl":"https://doi.org/10.1109/ASAP.2009.38","url":null,"abstract":"This paper presents a parallelized architecture of multiple classifiers for face detection based on the Viola and Jones object detection method. This method makes use of the AdaBoost algorithm which identifies a sequence of Haar classifiers that indicate the presence of a face. We describe the hardware design techniques including image scaling, integral image generation, pipelined processing of classifiers, and parallel processing of multiple classifiers to accelerate the processing speed of the face detection system. Also we discuss the parallelized architecture which can be scalable for configurable device with variable resources. We implement the proposed architecture in Verilog HDL on a Xilinx Virtex-5 FPGA and show the parallelized architecture of multiple classifiers can have 3.3× performance gain over the architecture of a single classifier and an 84× performance gain over an equivalent software solution.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126464663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
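The abstract above lists integral image generation among its hardware design techniques. As a minimal software sketch of that one step (not the paper's Verilog design), the code below builds an integral image and uses it to sum any rectangle with four lookups, which is what makes Haar-like features cheap to evaluate in the Viola and Jones method. The function names `integral_image` and `rect_sum` are hypothetical.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))  # 28, i.e. 5 + 6 + 8 + 9
```

The extra row and column of zeros avoid special cases at the image border.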
Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture
C. Liang, Xinming Huang
{"title":"Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture","authors":"C. Liang, Xinming Huang","doi":"10.1109/ASAP.2009.33","DOIUrl":"https://doi.org/10.1109/ASAP.2009.33","url":null,"abstract":"This paper presents the implementation of a novel parallel FFT algorithm on SmartCell, a coarse-grained reconfigurable architecture, which is targeted on data streaming applications. The proposed FFT algorithm achieves balanced workload and memory requirement among the computational units, while maintaining optimized data flow at low configuration and communication cost. The proposed parallel FFT algorithm is then mapped onto the SmartCell prototype device with 64 processing elements. Results show that the parallel FFT implementation on SmartCell is about 14.9 and 2.7 times faster than network-on-chip (NoC) and Morphosys, respectively. The implementation also shows about 3.6 times better energy efficiency when comparing with the pipelined FFT implementations on FPGA.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129876285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification
Weirong Jiang, V. Prasanna
{"title":"A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification","authors":"Weirong Jiang, V. Prasanna","doi":"10.1109/ASAP.2009.17","DOIUrl":"https://doi.org/10.1109/ASAP.2009.17","url":null,"abstract":"Multi-field packet classification is a critical function that enables network routers to support a variety of applications such as firewall processing, Quality of Service differentiation, traffic billing, and other value added services. Explosive growth of Internet traffic requires the future packet classifiers be implemented in hardware. However, most of the existing packet classification algorithms need large amount of memory, which inhibits efficient hardware implementations. This paper exploits the modern FPGA technology and presents a partitioning-based parallel architecture for scalable and high-speed packet classification. We propose a coarse-grained independent sets algorithm and then combine it seamlessly with the cross-producting scheme. After partitioning the original rule set into several coarse-grained independent sets and applying the cross-producting scheme for the remaining rules, the memory requirement is dramatically reduced. Our FPGA implementation results show that our architecture can store 10K real-life rules in a single state-of-the-art FPGA while consuming a small amount of on-chip resources. Post place and route results show that the design sustains 90 Gbps throughput for minimum size (40 bytes) packets, which is more than twice the current backbone network link rate.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132380024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
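The throughput figure quoted in the abstract above (90 Gbps for minimum-size 40-byte packets) implies a classification rate that is easy to work out. The helper below is a back-of-the-envelope sketch only; the name `packets_per_second` is hypothetical, and framing overhead is ignored as an assumption.

```python
def packets_per_second(throughput_bps, packet_bytes):
    """Packets classified per second at a given line rate and packet size."""
    return throughput_bps / (packet_bytes * 8)


rate = packets_per_second(90e9, 40)
print(rate)  # 281250000.0 packets per second
print(1e9 / rate)  # time budget per packet in nanoseconds, roughly 3.6
```

At about 281 million packets per second, each lookup has only a few nanoseconds, which is consistent with the abstract's argument that future packet classifiers need to be implemented in hardware.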
Integral Parallel Architecture & Berkeley's Motifs
M. Malita, G. Stefan
{"title":"Integral Parallel Architecture & Berkeley's Motifs","authors":"M. Malita, G. Stefan","doi":"10.1109/ASAP.2009.40","DOIUrl":"https://doi.org/10.1109/ASAP.2009.40","url":null,"abstract":"The Integral Parallel Architecture (IPA) developed and actually implemented by BrightScale is a low-power (133 GOPS/Watt) & low-area (8 GOPS/mm^2) one-chip solution to solve intense computational problems using data-parallel, time-parallel and speculative-parallel mechanisms. BrightScale technology is presented from the point of view of each of the 13 motifs proposed in The Berkeley's View. IPA emerges from Kleene's computational model of the partial recursive functions as the simplest parallel architecture, a good starting point for a true science of parallel computation. We briefly investigate how such an elementary parallel architecture performs, for the main computational motifs, in solving the problems of programmability, portability, flexibility, data movement between computational cells, and between cells and the main memory.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116298651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Application Specific Transistor Sizing for Low Power Full Adders
F. Eslami, A. Baniasadi, Mostafa Farahani
{"title":"Application Specific Transistor Sizing for Low Power Full Adders","authors":"F. Eslami, A. Baniasadi, Mostafa Farahani","doi":"10.1109/ASAP.2009.23","DOIUrl":"https://doi.org/10.1109/ASAP.2009.23","url":null,"abstract":"Previously suggested transistor sizing algorithms assume that all input transitions are equally important. In this work we show that this is not an accurate assumption as input transitions appear in different frequencies. We take advantage of this phenomenon and introduce Application Specific Transistor Sizing. In Application Specific Transistor Sizing higher priority is given to more frequent transitions. We apply our technique to two modern and low-power full adders (i.e., hybrid-CMOS and TFA) and show that it is possible to further reduce power dissipation and PDP. By using our technique we improve average PDP by 6% and 9% for TFA and hybrid-CMOS adders respectively. We reduce ALU energy consumption for ALU designs using TFA and hybrid-CMOS FAs by 2.7% and 4% respectively.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121562096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Design and Implementation of a Radix-4 Complex Division Unit with Prescaling
Pouya Dormiani, M. Ercegovac, J. Muller
{"title":"Design and Implementation of a Radix-4 Complex Division Unit with Prescaling","authors":"Pouya Dormiani, M. Ercegovac, J. Muller","doi":"10.1109/ASAP.2009.32","DOIUrl":"https://doi.org/10.1109/ASAP.2009.32","url":null,"abstract":"We present a design and implementation of a radix-4 complex division unit with prescaling of the operands. Specifically, we extend the treatment of the residual bound and errors due to the use of truncated redundant representation. The requirements for prescaling tables are simplified and a detailed specification of the table design is given. All principal components used in the design are described and the proposed optimizations are explained. The target platform for implementation was an Altera Stratix II FPGA for which we report timing and area requirements. For a precision of 36 bits, the implementation uses 1185 ALUTs, achieving a latency of 157 ns. The maximum clock frequency is 173.49 MHz.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121654341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
P3FSM: Portable Predictive Pattern Matching Finite State Machine
L. Vespa, Minimol Mathew, N. Weng
{"title":"P3FSM: Portable Predictive Pattern Matching Finite State Machine","authors":"L. Vespa, Minimol Mathew, N. Weng","doi":"10.1109/ASAP.2009.16","DOIUrl":"https://doi.org/10.1109/ASAP.2009.16","url":null,"abstract":"Signature-based network intrusion detection requires fast and reconfigurable pattern matching for deep packet inspection. In our previous work we address this problem with a hardware based pattern matching engine that utilizes a novel state encoding scheme to allow memory efficient use of Deterministic Finite Automata. In this work we expand on these concepts to create a completely software based system, P3FSM, which combines the properties of hardware based systems with the portability and programmability of software. Specifically we introduce two methods, Character Aware and SDFA, for encoding predictive state codes which can forecast the next states of our FSM. The result is software based pattern matching which is fast, reconfigurable, memory-efficient and portable.","PeriodicalId":202421,"journal":{"name":"2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors","volume":"387 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116485025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
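The abstract above builds on automaton-based multi-pattern matching for deep packet inspection. For orientation only, here is a sketch of a plain Aho-Corasick matcher, a standard automaton technique for this task. It is an assumption-laden stand-in: it does not implement the Character Aware or SDFA predictive state encodings that P3FSM introduces, and the names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state match sets."""
    goto = [{}]    # goto[state][char] -> next state
    out = [set()]  # patterns that end at each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches from the suffix state
            queue.append(nxt)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match in one pass over text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(sorted(find_all("ushers", automaton)))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The dictionary-based transition tables here trade memory for simplicity; reducing that memory cost is the kind of problem the state encoding schemes in the abstract are aimed at.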