{"title":"A scalable, high-performance customized priority queue","authors":"Muhuan Huang, Kevin T. Lim, J. Cong","doi":"10.1109/FPL.2014.6927413","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927413","url":null,"abstract":"Priority queues are abstract data structures where each element is associated with a priority, and the highest priority element is always retrieved first from the queue. The data structure is widely used within databases, including the last stage of a merge-sort, forecasting read-ahead I/O to stream data for the merge-sort, and replacement selection sort. Typical software implementations use a balanced binary tree-based structure, providing O(log N) time for both enqueue and dequeue operations. To improve the performance, we propose several scalable and high-speed FPGA-based implementations of a priority queue. Our insight is that the above listed applications primarily use priority queues through “replace” operations, which remove the highest priority element and place a new element into the queue. Thus, our designs are customized for this operation, allowing for a simple and scalable architecture. We implement three priority queue designs, including use of a register-based array, register-based tree, and BRAM-based tree, which have different benefits and trade-offs of throughput, frequency, and maximum size. More importantly, all designs achieve O(1) time between replace operations. To incorporate the best aspects of our designs, we propose a Hybrid Priority Queue (H-PQ), which combines a register-based array with multiple BRAM-based trees. This design provides, on average, very fast access times to the top items in the queue (through the register-based array), while scaling to large priority queue sizes (through the BRAM-based trees). In our evaluations, we find that H-PQ achieves 4.3x speedup and 21.5x energy efficiency, compared with the Xeon CPU implementations.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125433827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and flexible FPGA-based digital PUF","authors":"T. Xu, M. Potkonjak","doi":"10.1109/FPL.2014.6927449","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927449","url":null,"abstract":"We have developed the first FPGA-based digital physical unclonable function (PUF) by leveraging the reconfigurability of an FPGA and introducing a new way of using the standard analog delay PUF. The key observation is that for any analog delay PUF, there is a subset of challenge inputs for which the PUF output is stable regardless of operation and environmental conditions. We use only such stable inputs to initialize the look-up tables (LUTs) that are configured in such a way that the digital PUF is formed. We demonstrate the effectiveness of the new security primitive using extensive simulation and experimental results. For example, we show that the new PUF is resistant against a wide spectrum of security attacks and its output stream passes all the NIST randomness tests.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115063911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A secure and unclonable embedded system using instruction-level PUF authentication","authors":"J. Zheng, Dongfang Li, M. Potkonjak","doi":"10.1109/FPL.2014.6927428","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927428","url":null,"abstract":"In this paper we present a secure and unclonable embedded system design that can target either an FPGA or an ASIC technology. The premise of the security is that the executed machine code and the executing environment (the embedded processor) will authenticate each other at a per-instruction basis using Physical Unclonable Functions (PUFs) that are built into the processor. The PUFs ensure that the execution of the binary code may only proceed if the binary is compiled with the correct intrinsic knowledge of the PUFs, and that such intrinsic knowledge is virtually unique to each processor and therefore unclonable. We will explain how to implement and integrate the PUFs into the processor's execution environment such that each instruction is authenticated and de-obfuscated on-demand and how to transform an ordinary binary executable into PUF-aware, obfuscated binaries. We will also present a prototype system on a Xilinx Spartan6-based FPGA board.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114434237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerate NDN name lookup using FPGA: Challenges and a scalable approach","authors":"Yanbiao Li, Dafang Zhang, Xian Yu, W. Liang, Jing Long, Hong Qiao","doi":"10.1109/FPL.2014.6927403","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927403","url":null,"abstract":"Recently, Graphic Processing Units (GPUs) have been shown to be of value in supporting wire-speed name lookup in Named Data Networking (NDN). However, due to the computing model on GPU, the lookup latency is not so encouraging. In this paper, we shift the focus from GPU to Field-Programmable Gate Arrays (FPGA). We highlight three key challenges in accelerating name lookup using FPGA, and then present a scalable approach to address them. In our approach, a hierarchical and compact data structure is proposed to represent the name trie, which achieves not only effective pipeline mapping but also high memory efficiency. Further, it is finally implemented as a linear pipeline on the FPGA platform, enabling both fast lookup speed and low lookup latency. The experimental results show that our approach gains a reduction of memory cost over 90% compared with the referred GPU-based solution. Besides, the lookup throughput of our approach is almost 2.4 times higher, and the latency is up to 3 orders of magnitude lower.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129264720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiling text analytics queries to FPGAs","authors":"R. Polig, K. Atasu, Heiner Giefers, Laura Chiticariu","doi":"10.1109/FPL.2014.6927500","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927500","url":null,"abstract":"Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data. Therefore we discuss the use of FPGAs to perform large scale text analytics. We present a framework consisting of a compiler and an operator library capable of generating a Verilog processing pipeline from a text analytics query specified in the annotation query language AQL. The operator library comprises a set of configurable modules capable of performing relational and extraction tasks which can be assembled by the compiler to represent a full annotation operator graph. Leveraging the nature of text processing we show that most tasks can be performed in an efficient streaming fashion. We evaluate the performance, power consumption and hardware utilization of our approach for a set of different queries compiled to a Stratix IV FPGA. Measurements show an up to 79 times improvement of document-throughput over a 64 threaded software implementation on a POWER7 server. Moreover the accelerated system's energy efficiency is up to 85 times better.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129892739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving FPGA accelerated tracking with multiple online trained classifiers","authors":"Matthew Jacobsen, Siddarth Sampangi, Y. Freund, R. Kastner","doi":"10.1109/FPL.2014.6927505","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927505","url":null,"abstract":"Robust real time tracking is a requirement for many emerging applications. Many of these applications must track objects even as their appearance changes. Training classifiers online has become an effective approach for dealing with variability in object appearance. Classifiers can learn and adapt to changes online at the cost of additional runtime computation. In this paper, we propose a FPGA accelerated design of an online boosting algorithm that uses multiple classifiers to track and recover objects in real time. Our algorithm uses a novel method for training and comparing pose-specific classifiers along with adaptive tracking classifiers. Our FPGA accelerated design is able to track at 60 frames per second while concurrently evaluating 11 classifiers. This represents a 30× speed up over a CPU based software implementation. It also demonstrates tracking accuracy at state of the art levels on a standard set of videos.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPC-gSpan: An FPGA-based parallel system for frequent subgraph mining","authors":"Athanasios Stratikopoulos, Grigorios Chrysos, I. Papaefstathiou, A. Dollas","doi":"10.1109/FPL.2014.6927441","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927441","url":null,"abstract":"Graph mining is an important research area within the domain of data mining. One of the most challenging tasks of graph mining is frequent subgraph mining. This work presents the first FPGA-based implementation, to the best of our knowledge, of the most efficient and well-known algorithm for the Frequent Subgraph Mining (FSM) problem, i.e. gSpan. The proposed system, named High Performance Computing-gSpan (HPC-gSpan), achieves manyfold speedup vs. the official software solution of the gboost library when executed on a high-end CPU for various real-world datasets.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124064940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Dynamic On-chip Memory Management for FPGA-based reconfigurable architectures","authors":"Ghada Dessouky, M. Klaiber, D. Bailey, S. Simon","doi":"10.1109/FPL.2014.6927471","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927471","url":null,"abstract":"In this paper, an adaptive architecture for dynamic management and allocation of on-chip FPGA Block Random Access Memory (BRAM) resources is presented. This facilitates the dynamic sharing of valuable and scarce on-chip memory among several processing elements (PEs), according to their dynamic run-time memory requirements. Different real-time applications are becoming increasingly dynamic which leads to unexpected and variable memory footprints, and static allocation of the worst-case memory requirements would result in costly overheads and inefficient memory utilization. The proposed scalable BRAM memory management architecture adaptively manages these dynamic memory requirements and balances the buffer memory over several PEs to reduce the total memory required, compared to the worst-case memory footprint for all PEs. The run-time adaptive system allocates BRAM to each PE sufficiently fast enough as required and utilized. In a case study, a significant improvement in BRAM utilization with limited overhead has been achieved due to the adaptive memory management architecture. The proposed system supports different BRAM types and configurations, and automated dynamic allocation and deallocation of BRAM resources, and is therefore well suited for the dynamic memory footprints of FPGA-based reconfigurable architectures.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124093483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware system synthesis from Domain-Specific Languages","authors":"N. George, HyoukJoong Lee, D. Novo, Tiark Rompf, Kevin J. Brown, Arvind K. Sujeeth, Martin Odersky, K. Olukotun, P. Ienne","doi":"10.1109/FPL.2014.6927454","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927454","url":null,"abstract":"Field Programmable Gate Arrays (FPGAs) are very versatile devices, but their complicated programming model has stymied their widespread usage. While modern High-Level Synthesis (HLS) tools provide better programming models, the interface they offer is still too low-level. In order to produce good quality hardware designs with these tools, the users are forced to manually perform optimizations that demand detailed knowledge of both the application and the implementation platform. Additionally, many HLS tools only generate isolated hardware modules that the user still needs to integrate into a system design before generating the FPGA bitstream. These problems make HLS tools difficult to use for application developers who have little hardware design knowledge. To address these problems, we propose an automated methodology to generate FPGA bitstreams from high-level programs written in Domain-Specific Languages (DSLs). We leverage the domain-knowledge conveyed by the DSL and its domain-specific semantics to extract application parallelism, perform optimizations and also identify a suitable system-architecture for the implementation, thereby, relieving the user from most of the hardware-level details. We demonstrate the high productivity and high design quality this approach offers by automatically generating hardware systems from applications written in OptiML, a machine-learning DSL. To evaluate our methodology, we use four OptiML applications and show that we can easily generate different solutions which achieve different trade-offs between performance and area. More importantly, the results reveal that our generated hardware achieves much better performance compared to the one obtained from using the HLS tool without platform-specific optimizations.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128190151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"flipSyrup: Cycle-accurate hardware simulation framework on abstract FPGA platforms","authors":"Shinya Takamaeda-Yamazaki, Kenji Kise","doi":"10.1109/FPL.2014.6927436","DOIUrl":"https://doi.org/10.1109/FPL.2014.6927436","url":null,"abstract":"FPGA-based rapid prototyping is widely applied for fast simulations of hardware structure verifications. In this paper, we propose flipSyrup, a prototyping framework for cycle-accurate hardware simulations on abstract FPGA platforms. In order to mitigate the development complexity of FPGA-based simulators, the framework provides two abstractions of resources on FPGA platforms: Memory systems and inter-FPGA interconnections on multi-FPGA platforms. The framework enables designers to draw up a target hardware using abstract interfaces as ideal memory systems and interconnections on FPGA platforms. Our evaluation result shows that the slowdowns in simulation speed under the abstractions by using the framework are not critical.","PeriodicalId":172795,"journal":{"name":"2014 24th International Conference on Field Programmable Logic and Applications (FPL)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134056135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}