2014 24th International Conference on Field Programmable Logic and Applications (FPL): Latest Publications

A scalable, high-performance customized priority queue

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927413

Muhuan Huang, Kevin T. Lim, J. Cong

Abstract: Priority queues are abstract data structures where each element is associated with a priority, and the highest priority element is always retrieved first from the queue. The data structure is widely used within databases, including the last stage of a merge-sort, forecasting read-ahead I/O to stream data for the merge-sort, and replacement selection sort. Typical software implementations use a balanced binary tree-based structure, providing O(log N) time for both enqueue and dequeue operations. To improve the performance, we propose several scalable and high-speed FPGA-based implementations of a priority queue. Our insight is that the above listed applications primarily use priority queues through “replace” operations, which remove the highest priority element and place a new element into the queue. Thus, our designs are customized for this operation, allowing for a simple and scalable architecture. We implement three priority queue designs, including use of a register-based array, register-based tree, and BRAM-based tree, which have different benefits and trade-offs of throughput, frequency, and maximum size. More importantly, all designs achieve O(1) time between replace operations. To incorporate the best aspects of our designs, we propose a Hybrid Priority Queue (H-PQ), which combines a register-based array with multiple BRAM-based trees. This design provides, on average, very fast access times to the top items in the queue (through the register-based array), while scaling to large priority queue sizes (through the BRAM-based trees). In our evaluations, we find that H-PQ achieves 4.3x speedup and 21.5x energy efficiency, compared with the Xeon CPU implementations.
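The "replace" operation named in the abstract can be illustrated in software with a k-way merge, the last stage of a merge-sort. The sketch below is only an illustration and not the authors' FPGA design: it uses Python's standard `heapq` module, treats the smallest value as the highest priority, and the function name `k_way_merge` is a hypothetical choice. Note that `heapq.heapreplace` costs O(log N) per call, whereas the paper reports O(1) time between replace operations in hardware.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def k_way_merge(runs: List[Iterable[int]]) -> Iterator[int]:
    """Merge sorted runs, issuing one 'replace' per output element.

    Hypothetical helper for illustration; smallest value = highest priority.
    """
    iters = [iter(run) for run in runs]
    heap: List[Tuple[int, int]] = []

    # Seed the queue with the first element of every non-empty run.
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)

    while heap:
        value, idx = heap[0]  # peek at the highest-priority element
        yield value
        nxt = next(iters[idx], None)
        if nxt is None:
            # The source run is exhausted: plain dequeue, the queue shrinks.
            heapq.heappop(heap)
        else:
            # "Replace": remove the top element and insert the successor
            # from the same run in a single operation.
            heapq.heapreplace(heap, (nxt, idx))


merged = list(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
```

In this sketch almost every step of the merge is a replace, and a plain dequeue happens only once per run, when that run is exhausted. This matches the access pattern the abstract describes for the listed database applications.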

Citations: 20

Robust and flexible FPGA-based digital PUF

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927449

T. Xu, M. Potkonjak

Citations: 51

A secure and unclonable embedded system using instruction-level PUF authentication

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927428

J. Zheng, Dongfang Li, M. Potkonjak

Citations: 13

Accelerate NDN name lookup using FPGA: Challenges and a scalable approach

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927403

Yanbiao Li, Dafang Zhang, Xian Yu, W. Liang, Jing Long, Hong Qiao

Citations: 10

Compiling text analytics queries to FPGAs

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927500

R. Polig, K. Atasu, Heiner Giefers, Laura Chiticariu

Abstract: Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data. Therefore we discuss the use of FPGAs to perform large scale text analytics. We present a framework consisting of a compiler and an operator library capable of generating a Verilog processing pipeline from a text analytics query specified in the annotation query language AQL. The operator library comprises a set of configurable modules capable of performing relational and extraction tasks which can be assembled by the compiler to represent a full annotation operator graph. Leveraging the nature of text processing we show that most tasks can be performed in an efficient streaming fashion. We evaluate the performance, power consumption and hardware utilization of our approach for a set of different queries compiled to a Stratix IV FPGA. Measurements show an up to 79 times improvement of document-throughput over a 64 threaded software implementation on a POWER7 server. Moreover the accelerated system's energy efficiency is up to 85 times better.

Citations: 14

Improving FPGA accelerated tracking with multiple online trained classifiers

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927505

Matthew Jacobsen, Siddarth Sampangi, Y. Freund, R. Kastner

Citations: 3

HPC-gSpan: An FPGA-based parallel system for frequent subgraph mining

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927441

Athanasios Stratikopoulos, Grigorios Chrysos, I. Papaefstathiou, A. Dollas

Citations: 11

Adaptive Dynamic On-chip Memory Management for FPGA-based reconfigurable architectures

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927471

Ghada Dessouky, M. Klaiber, D. Bailey, S. Simon

Abstract: In this paper, an adaptive architecture for dynamic management and allocation of on-chip FPGA Block Random Access Memory (BRAM) resources is presented. This facilitates the dynamic sharing of valuable and scarce on-chip memory among several processing elements (PEs), according to their dynamic run-time memory requirements. Different real-time applications are becoming increasingly dynamic which leads to unexpected and variable memory footprints, and static allocation of the worst-case memory requirements would result in costly overheads and inefficient memory utilization. The proposed scalable BRAM memory management architecture adaptively manages these dynamic memory requirements and balances the buffer memory over several PEs to reduce the total memory required, compared to the worst-case memory footprint for all PEs. The run-time adaptive system allocates BRAM to each PE sufficiently fast enough as required and utilized. In a case study, a significant improvement in BRAM utilization with limited overhead has been achieved due to the adaptive memory management architecture. The proposed system supports different BRAM types and configurations, and automated dynamic allocation and deallocation of BRAM resources, and is therefore well suited for the dynamic memory footprints of FPGA-based reconfigurable architectures.
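The sizing argument in the abstract, that sharing memory among PEs needs less than allocating each PE its worst case, can be shown with a small arithmetic sketch. Everything below is an assumption made for illustration: the demand traces are invented numbers, not results from the paper, and the helper names `static_blocks` and `shared_blocks` are hypothetical. The sketch also ignores allocation latency and management overhead, which the paper's architecture has to handle.

```python
from typing import Dict, List


def static_blocks(demand: Dict[str, List[int]]) -> int:
    """Blocks needed when every PE statically owns its own worst case."""
    return sum(max(trace) for trace in demand.values())


def shared_blocks(demand: Dict[str, List[int]]) -> int:
    """Blocks needed for one shared pool: the peak of the summed demand."""
    return max(sum(step) for step in zip(*demand.values()))


# Hypothetical per-time-step demand in BRAM blocks (illustrative only).
demand = {
    "pe0": [2, 6, 1, 1],
    "pe1": [1, 1, 5, 2],
    "pe2": [3, 1, 1, 4],
}

static_total = static_blocks(demand)  # 6 + 5 + 4
shared_total = shared_blocks(demand)  # max(6, 8, 7, 7)
```

With these made-up traces the static scheme reserves 15 blocks while a shared pool peaks at 8, because the PEs do not reach their individual maxima at the same time step. If all PEs did peak together, the two figures would be equal and sharing would bring no saving.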

Citations: 17

Hardware system synthesis from Domain-Specific Languages

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927454

N. George, HyoukJoong Lee, D. Novo, Tiark Rompf, Kevin J. Brown, Arvind K. Sujeeth, Martin Odersky, K. Olukotun, P. Ienne

Abstract: Field Programmable Gate Arrays (FPGAs) are very versatile devices, but their complicated programming model has stymied their widespread usage. While modern High-Level Synthesis (HLS) tools provide better programming models, the interface they offer is still too low-level. In order to produce good quality hardware designs with these tools, the users are forced to manually perform optimizations that demand detailed knowledge of both the application and the implementation platform. Additionally, many HLS tools only generate isolated hardware modules that the user still needs to integrate into a system design before generating the FPGA bitstream. These problems make HLS tools difficult to use for application developers who have little hardware design knowledge. To address these problems, we propose an automated methodology to generate FPGA bitstreams from high-level programs written in Domain-Specific Languages (DSLs). We leverage the domain-knowledge conveyed by the DSL and its domain-specific semantics to extract application parallelism, perform optimizations and also identify a suitable system-architecture for the implementation, thereby, relieving the user from most of the hardware-level details. We demonstrate the high productivity and high design quality this approach offers by automatically generating hardware systems from applications written in OptiML, a machine-learning DSL. To evaluate our methodology, we use four OptiML applications and show that we can easily generate different solutions which achieve different trade-offs between performance and area. More importantly, the results reveal that our generated hardware achieves much better performance compared to the one obtained from using the HLS tool without platform-specific optimizations.

Citations: 62

flipSyrup: Cycle-accurate hardware simulation framework on abstract FPGA platforms

2014 24th International Conference on Field Programmable Logic and Applications (FPL) Pub Date : 2014-10-20 DOI: 10.1109/FPL.2014.6927436

Shinya Takamaeda-Yamazaki, Kenji Kise

Citations: 4