Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles, Page 2

High Level Programming of Document Classification Systems for Heterogeneous Environments using OpenCL (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689136

Nasibeh Nasiri, Oren Segal, M. Margala, W. Vanderbauwhede, S. R. Chalamalasetti

Abstract: Document classification is at the heart of several of the applications that have been driving the proliferation of the internet in our daily lives. The ever-growing amounts of data and the need for higher-throughput, more energy-efficient document classification solutions motivated us to investigate alternatives to the traditional homogeneous CPU-based implementations. We investigate a heterogeneous system where CPUs are combined with FPGAs as system accelerators. Incorporating FPGAs as accelerators in a heterogeneous computing environment allows for the creation of flexible custom hardware solutions that can potentially offer increased power efficiency and performance gains. One of the main issues delaying widespread adoption of FPGAs as standard heterogeneous system accelerators is the difficulty in programming them. The OpenCL standard offers a unified C programming model for any device that adheres to its standards. An Altera OpenCL FPGA-based implementation of a document classification system is investigated in which a stream of HTML documents is scored according to a profile on a document-by-document basis. The results show that the throughput of the document classification application with and without Bloom filters is 312 MB/s and 343 MB/s respectively when running on a CPU, and 354 MB/s and 452 MB/s respectively when running on an FPGA. Our results also show up to 32% power efficiency improvement for the FPGA implementation over the CPU implementation.
We would like to thank Davor Capalija from Altera for his invaluable advice during our work on the FPGA version of the algorithm.
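
The abstract describes scoring each document against a profile, optionally with Bloom filters. The following is only a minimal Python sketch of that general idea, not the authors' OpenCL kernel: the filter size, the SHA-256 based hashing, the whitespace tokenizer, and the names `BloomFilter`, `build_filter`, and `score_document` are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, term):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(term))


def build_filter(profile):
    """Insert every profile term into a fresh Bloom filter."""
    bloom = BloomFilter()
    for term in profile:
        bloom.add(term)
    return bloom


def score_document(text, profile, bloom):
    """Sum profile weights of the document's terms.

    The Bloom filter screens out most terms that are not in the profile,
    so the exact profile lookup only runs for likely matches.
    """
    score = 0.0
    for term in text.lower().split():
        if bloom.might_contain(term):
            score += profile.get(term, 0.0)
    return score


profile = {"fpga": 2.0, "opencl": 1.5}  # hypothetical term weights
bloom = build_filter(profile)
doc_score = score_document("OpenCL on FPGA and FPGA", profile, bloom)
```

A Bloom filter false positive only costs one extra profile lookup here, because a term missing from the profile contributes a weight of zero.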

Citations: 1

An Efficient and Flexible FPGA Implementation of a Face Detection System (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689095

Hichem Ben Fekih, A. Elhossini, B. Juurlink

Abstract: Robust and rapid face detection systems are constantly gaining more interest, since they represent the foundation for many challenging tasks in the field of computer vision. In this paper a software-hardware co-design approach is presented that enables the detection of frontal faces in real time. A complete hardware implementation of all components taking part in the face detection is introduced. This work is based on the object detection framework of Viola and Jones, which makes use of a cascade of classifiers to reduce the computation time. The proposed architecture is flexible, as it allows the use of multiple instances of the face detector. This leaves developers free to choose the speed range and the resources reserved for this task. The current implementation runs on the Zynq SoC and receives images over an IP network, which allows exposing the face detection task as a remote service that can be consumed from any device connected to the network. We performed several measurements for the final detector and the software equivalent. Using three Evaluator cores, the ZedBoard system achieves a maximal average frame rate of 13.4 FPS when analysing a 640x480-pixel image. This is an improvement of 5.25 times compared to the software solution and represents acceptable results for most real-time systems. On the ZC706 system, a higher frame rate of 16.58 FPS is achieved. The proposed hardware solution achieved 92% accuracy, which is lower than the software solution (97%) due to a different scaling algorithm. The proposed solution achieved a higher frame rate than other solutions found in the literature.
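
The abstract relies on a cascade of classifiers to cut computation time. As a rough illustration of why a cascade helps, the Python sketch below rejects a window at the first failing stage, so most non-face windows cost only one or two stages. The stage layout, thresholds, and the name `evaluate_cascade` are hypothetical and do not come from the paper or from any trained Viola-Jones model.

```python
def evaluate_cascade(window_features, stages):
    """Run a window through a classifier cascade with early rejection.

    window_features: list of precomputed feature values for one window.
    stages: list of (weak_classifiers, stage_threshold), where each weak
            classifier is (feature_index, feature_threshold, vote).
    Returns (accepted, stages_evaluated).
    """
    for stage_number, (weak_classifiers, stage_threshold) in enumerate(stages, start=1):
        stage_sum = 0.0
        for feature_index, feature_threshold, vote in weak_classifiers:
            if window_features[feature_index] >= feature_threshold:
                stage_sum += vote
        if stage_sum < stage_threshold:
            # Early exit: later, more expensive stages are never evaluated.
            return False, stage_number
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap first stage, a stricter second stage.
example_stages = [
    ([(0, 0.5, 1.0)], 1.0),
    ([(1, 0.2, 0.6), (2, 0.7, 0.6)], 1.0),
]
face_like = evaluate_cascade([0.9, 0.3, 0.8], example_stages)
background = evaluate_cascade([0.1, 0.3, 0.8], example_stages)
```

In this toy setup the background window is discarded after a single stage, which is the effect the cascade is meant to achieve on the many non-face windows in an image.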

Citations: 0

Delay-Bounded Routing for Shadow Registers

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689075

Eddie Hung, Joshua M. Levine, Edward A. Stott, G. Constantinides, W. Luk

Citations: 3

Platform-Independent Gigabit Communication for Low-Cost FPGAs (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689150

R. Salomon, R. Joost, Matthias Hinkfoth

Abstract: Among other things, field-programmable gate arrays (FPGAs) available today contain numerous bit-serial transceivers for communication purposes. Unlike analog modulation schemes, such as quadrature amplitude modulation, bit-serial communication is relatively easy to implement in digital hardware, and is thus usually used for inter-FPGA communication. In this view, only the data rate and frequency limit the bandwidth of the circuit. In order to overcome the bandwidth limit, this research proposes a pulse-width modulation (PWM) scheme for data transmission. The information is coded by modulating the length of the high and low voltage parts of the pulse. Although this approach is not new, existing PWM modulators have unsatisfactory data rates due to their synchronous implementation. Therefore, this research implements both the modulator and demodulator by using asynchronous logic. The result is a proof-of-concept comprising two Terasic DE2-70 development boards and a 1 m coaxial cable. Both the PWM modulator and demodulator run at 333 MHz, and pulses are transmitted every 3 ns. Each pulse carries 3 to 4 bits of data. The experimental results indicate an achievable data rate of one gigabit per second, which is about 50% higher than the FPGA's handbook states.
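
The quoted figures can be checked with simple arithmetic: one pulse every 3 ns carrying 3 to 4 bits. The Python sketch below does that calculation. The function name is an assumption, and the "implied handbook rate" is only what the abstract's "about 50%" would suggest; the abstract itself does not state that number.

```python
def data_rate_bps(pulse_period_s, bits_per_pulse):
    """Raw data rate in bits per second for one pulse per period."""
    return bits_per_pulse / pulse_period_s


PULSE_PERIOD_S = 3e-9  # one pulse every 3 ns, i.e. about 333 MHz

low_rate_bps = data_rate_bps(PULSE_PERIOD_S, 3)   # roughly 1.0 Gbit/s
high_rate_bps = data_rate_bps(PULSE_PERIOD_S, 4)  # roughly 1.33 Gbit/s

# Derived assumption: if 1 Gbit/s is about 50% above the handbook value,
# the handbook value would be near 1 Gbit/s / 1.5, i.e. about 667 Mbit/s.
implied_handbook_bps = low_rate_bps / 1.5
```

At 3 bits per pulse the raw rate already matches the one gigabit per second reported in the abstract.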

Citations: 0

System-level Linking of Synthesised Hardware and Compiled Software Using a Higher-order Type System

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689089

Shane T. Fleming, David B. Thomas, G. Constantinides, D. Ghica

Citations: 2

Real-Time Obstacle Avoidance for Mobile Robots via Stereoscopic Vision Using Reconfigurable Hardware (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689099

Martinianos Papadopoulos, Christos Ttofis, C. Kyrkou, T. Theocharides

Abstract: An embedded, real-time, and low-power obstacle avoidance system is a critical component towards fully autonomous robots that can be used in safety missions, space exploration, and transportation systems, among others. In this paper a complete prototyping platform for the evaluation of obstacle avoidance systems and autonomous robots is realized on reconfigurable hardware. An efficient stereo vision algorithm for producing the necessary 3D information and an obstacle avoidance subsystem were both implemented on an ATLYS Spartan-6 FPGA board equipped with a VmodCam stereo camera module. A modified FDX Vantage 1/10 electric car platform was used for testing the proposed architecture in indoor and outdoor real-world scenes. The system receives stereo image data from the VmodCam module, and a decision-making algorithm is applied to a specified Region of Interest (RoI) on the produced disparity map. The algorithm outputs the direction in which the robot should move in order to avoid any obstacles present. Experimental evaluation results indicate that the FPGA-based robotic platform can avoid obstacles in real time (i.e. it can process and identify obstacles within the 1/30th of a second that a stereo image takes to be processed) in both indoor and outdoor environments, with 91.7% accuracy, equivalent to software implementations. The overall power consumption of the proposed architecture, excluding the electric car platform, is 6 W, making it well suited for use on mobile robots without becoming a significant drain on battery life.

Citations: 3

300 Thousand Gates Single Event Effect Hardened SRAM-based FPGA for Space Application (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689120

Lei Chen, Yuanfu Zhao, Zhiping Wen, Jing Zhou, Xuewu Li, Yanlong Zhang, Huabo Sun

Abstract: SRAM-based FPGAs have been widely used in space engineering. However, the configuration memory in an SRAM-based FPGA is susceptible to single event effects (SEE), which can disrupt the communication or control functions of the spacecraft. To mitigate SEE in SRAM-based FPGAs used in space radiation environments, Beijing Microelectronics Technology Institute (BMTI) developed a 300 thousand gate SEE-hardened SRAM-based FPGA, the BQVR300RH. The BQVR300RH employs the Radiation Hardened by Design (RHBD) technique. A hardened standard cell library based on an Adaptive SRAM (ASRAM) structure is established. For especially sensitive and important resources, other auxiliary techniques are also adopted. The experimental results show that the BQVR300RH substantially improves resistance to single event upsets (SEU) compared with the Xilinx 300 thousand gate space-grade SRAM-based FPGA (XQVR300). The SEU threshold of the BQVR300RH is 19.06 MeV·cm²/mg, and its anti-SEU characteristic is three orders of magnitude better than that of the XQVR300. The improvement in anti-SEU behavior expands the usage of SRAM-based FPGAs in aerospace applications. The BQVR300RH is currently in use in space applications in China.

Citations: 2

Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689068

Ren Chen, Sruja Siriyal, V. Prasanna

Abstract: Parallel sorting networks are widely employed in hardware implementations for sorting due to their high data parallelism and low control overhead. In this paper, we propose an energy and memory efficient mapping methodology for implementing a bitonic sorting network on FPGA. Using this methodology, the proposed sorting architecture can be built for a given data parallelism while supporting continuous data streams. We propose a streaming permutation network (SPN) by "folding" the classic Clos network. We prove that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network. A low-cost design for sorting with minimal resource usage is obtained by reusing one SPN. We also demonstrate a high-throughput design by trading off area for performance. With a data parallelism of p (2 ≤ p ≤ N/log₂ N), the high-throughput design sorts an N-key sequence with latency O(N/p) and throughput (number of keys sorted per cycle) O(p), and uses O(N) memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) of O(p/N). Another noteworthy feature of the high-throughput design is that only single-port memory rather than dual-port memory is required for processing continuous data streams. This results in a 50% reduction in memory consumption. Post place-and-route results show that our architecture demonstrates 1.3x to 1.6x improvement in energy efficiency and 1.5x to 5.3x better memory efficiency compared with the state-of-the-art designs.
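
For readers unfamiliar with the underlying network, here is a minimal Python sketch of the classic bitonic sorting network that the paper maps onto hardware. It is not the paper's streaming permutation network or its folded Clos design; it only shows the fixed compare-exchange pattern, and the function name `bitonic_sort` is an assumption. Every inner pass over `i` corresponds to one stage whose compare-exchange operations are independent and could run in parallel.

```python
def bitonic_sort(keys):
    """Sort a power-of-two-length sequence with the bitonic network.

    Returns (sorted_keys, number_of_compare_exchange_stages).
    """
    n = len(keys)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(keys)
    stages = 0
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate direction so merged runs stay bitonic.
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            stages += 1
            j //= 2
        k *= 2
    return data, stages


sorted_keys, stage_count = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
```

For N keys the network has log₂ N · (log₂ N + 1) / 2 stages, which is 6 for N = 8. Because the comparison pattern does not depend on the data, the control overhead is low, as the abstract notes.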

Citations: 67

Sequence-based In-Circuit Breakpoints for Post-Silicon Debug (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689102

Y. Tamiya, Yoshinori Tomita, Toshiyuki Ichiba, Kaoru Kawamura

Abstract: Simulation and formal verification in pre-silicon verification can no longer accomplish whole system-level verification with exhaustive input data and run time, because they lack sufficient speed and logic capacity. Consequently, post-silicon validation, such as in-circuit debugging, becomes increasingly important. In this paper we propose a novel breakpoint mechanism, which improves the controllability of in-circuit debugging. Our contributions are summarized as follows: (1) a basic concept of a new breakpoint method is proposed, which stops the target hardware by detecting a data sequence of arbitrary length; (2) the breakpoint is shown to be implementable in efficient pipelined hardware, which works "at-speed", in real time, and with small area overheads using CRC (Cyclic Redundancy Check); and (3) our experimental results of detecting a data sequence in pseudo-random stream data show that false positives can be suppressed by the CRC width and the number of sub-sequences. Since changing breakpoint conditions does not require re-implementation of the hardware, the method is expected to reduce debugging effort in post-silicon validation considerably.
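
The core idea, matching a long data sequence by comparing checksums of its sub-sequences instead of the raw data, can be modelled in software. The Python sketch below is only a behavioural illustration under stated assumptions: it uses the standard library's `zlib.crc32` and recomputes CRCs over a sliding window, whereas the paper describes a pipelined at-speed hardware implementation. The names `crc_signature` and `find_breakpoints` and the way the target is split are hypothetical.

```python
import zlib
from collections import deque


def crc_signature(data, num_subsequences):
    """Split data into sub-sequences and return the CRC-32 of each one."""
    chunk_len = -(-len(data) // num_subsequences)  # ceiling division
    chunks = [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]
    return [zlib.crc32(chunk) for chunk in chunks]


def find_breakpoints(stream, target, num_subsequences=2):
    """Return indices of the last byte of every window whose CRC signature
    equals that of the target sequence."""
    if not target:
        raise ValueError("target sequence must not be empty")
    expected = crc_signature(target, num_subsequences)
    window = deque(maxlen=len(target))  # holds the most recent bytes
    hits = []
    for index, byte in enumerate(stream):
        window.append(byte)
        if len(window) == len(target):
            if crc_signature(bytes(window), num_subsequences) == expected:
                hits.append(index)  # the breakpoint would fire here
    return hits


breakpoint_hits = find_breakpoints(b"xxABCDxxABCD", b"ABCD")
```

Only the stored CRC values define the breakpoint condition, so changing the target sequence means loading new values rather than rebuilding the comparator. A match on all sub-sequence CRCs can still be a false positive for long targets, which is why the abstract relates the false-positive rate to the CRC width and the number of sub-sequences.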

Citations: 0

Session details: Technical Session 8: Applications

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251658

K. Bazargan

Citations: 0